Sorry, looks like you had already mentioned the Hive version.
Hive 2.1.1 and Hive 1.1.0-cdh5.15.1
On Monday, March 4, 2019, 10:33:27 PM PST, [email protected]
<[email protected]> wrote:
Hmmm, looks like the query does filter on the partition field.
Not sure why Hive is passing the partition_columns.
BTW, which version of Hive are you using?
Also, while the query succeeds, does the count (and the result in general) look
correct?
Anyway, AbstractRealtimeRecordReader comes into the picture only when reading
records from a split. The partition projection/pruning happens outside of this
class (in Hive). So, this should be OK.
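
As a rough illustration of the lookup discussed in this thread: the reader asks the job configuration for partition_columns and falls back to an empty string when Hive has not set it. The sketch below mimics that with java.util.Properties standing in for Hadoop's JobConf (an assumption, so that it runs without Hadoop on the classpath). The class name, the helper, and the delimiter passed in are hypothetical and are not Hudi's actual code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;
import java.util.regex.Pattern;

class PartitionColumnsSketch {
    // Hypothetical helper. Properties stands in for JobConf here, and the
    // delimiter is a parameter because the real one is not stated in the thread.
    static List<String> partitionFields(Properties conf, String delimiter) {
        // Same shape as jobConf.get("partition_columns", ""): default to "".
        String raw = conf.getProperty("partition_columns", "");
        List<String> fields = new ArrayList<>();
        if (raw.trim().isEmpty()) {
            return fields; // unset or blank: no partition columns reported
        }
        for (String name : raw.split(Pattern.quote(delimiter))) {
            String trimmed = name.trim();
            if (!trimmed.isEmpty()) {
                fields.add(trimmed);
            }
        }
        return fields;
    }

    public static void main(String[] args) {
        Properties unset = new Properties();
        Properties withPar = new Properties();
        withPar.setProperty("partition_columns", "par");
        System.out.println("unset: " + partitionFields(unset, ","));
        System.out.println("set:   " + partitionFields(withPar, ","));
    }
}
```

If the property never reaches the job configuration, a reader written this way sees no partition columns at all, regardless of how the table is declared in the metastore.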
Balaji.V

On Sunday, March 3, 2019, 7:59:12 PM PST, kaka chen
<[email protected]> wrote:
Hi Balaji,
Sorry for the late response.
I used this query to test:
select count(*), par, dashboard_id from
dev.statistics_dashboard_visitor_hudi_rt where id = 20190 and par = '20190226'
group by dashboard_id;
Thanks,
Frank
On Fri, Mar 1, 2019 at 10:27 AM [email protected] <[email protected]> wrote:
Hi Kaka,
I see. The formatting of the "describe formatted <table>" output got messed up
in the email, and I missed the partitioning column.
Curious what the query looks like. Is this a Hive query? Can you paste it?
Balaji.V

On Wednesday, February 27, 2019, 6:18:38 PM PST, kaka chen
<[email protected]> wrote:
Hi Balaji,
Thanks. But the table I used is partitioned by par. It cannot be
distinguished.
Thanks,
Kaka
On Thu, Feb 28, 2019 at 1:26 AM Balaji Varadarajan <[email protected]> wrote:
> Hi Kaka,
> Yes, this is expected as the table is non-partitioned. The 0.4.5 release
> which happened yesterday has the fix you referenced.
> Thanks,
> Balaji.V
>
>
> On Tuesday, February 26, 2019, 6:37:59 PM PST, kaka chen <
> [email protected]> wrote:
>
> BTW, since it cannot get the partition field, I merged
> https://github.com/uber/hudi/pull/569/files, and after that the job runs
> successfully.
>
> Thanks,
> Frank
>
> On Wed, Feb 27, 2019 at 10:34 AM kaka chen <[email protected]> wrote:
>
> >
> > I have tried two environments (Hive 2.1.1 and Hive 1.1.0-cdh5.15.1); in
> > both, the partition field cannot be retrieved.
> >
> > And I added simple logs to show the result:
> >
> > LOG.info("schema: " + schema + " partitioningFields: " + partitioningFields);
> >
> > 2019-02-26 19:53:47,855 INFO [main] com.uber.hoodie.hadoop.realtime.AbstractRealtimeRecordReader: schema: {"type":"record","name":"test_record","namespace":"hoodie.test","fields":[{"name":"_hoodie_commit_time","type":["null","string"],"default":null},{"name":"_hoodie_commit_seqno","type":["null","string"],"default":null},{"name":"_hoodie_record_key","type":["null","string"],"default":null},{"name":"_hoodie_partition_path","type":["null","string"],"default":null},{"name":"_hoodie_file_name","type":["null","string"],"default":null},{"name":"id","type":["null","int"],"default":null},{"name":"user_id","type":["null","int"],"default":null},{"name":"dashboard_id","type":["null","int"],"default":null},{"name":"created_at","type":["null","string"],"default":null},{"name":"updated_at","type":["null","string"],"default":null},{"name":"timestamp","type":["null","long"],"default":null},{"name":"eventType","type":["null","string"],"default":null},{"name":"par","type":["null","string"],"default":null}]} partitioningFields: []
> >
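
A note on reading the log line above: with Java's default List.toString(), an empty list and a list holding one empty string both print as [], so "partitioningFields: []" alone does not show which of the two the reader ended up with. The sketch below demonstrates this with plain JDK calls. The describe helper and the class name are hypothetical, offered only as one way to make such a log line unambiguous.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

class PartitionLogSketch {
    // Hypothetical helper: prints the size and quotes each element so that
    // an empty string inside the list becomes visible.
    static String describe(List<String> fields) {
        StringBuilder sb = new StringBuilder("size=" + fields.size() + " [");
        for (int i = 0; i < fields.size(); i++) {
            if (i > 0) {
                sb.append(", ");
            }
            sb.append('"').append(fields.get(i)).append('"');
        }
        return sb.append(']').toString();
    }

    public static void main(String[] args) {
        List<String> empty = Collections.emptyList();
        // Splitting an empty string yields a one-element array holding "".
        List<String> oneBlank = Arrays.asList("".split(","));

        System.out.println("default toString: " + empty + " vs " + oneBlank);
        System.out.println("described:        " + describe(empty) + " vs " + describe(oneBlank));
    }
}
```

Logging the size alongside the list is enough to tell a missing partition_columns value apart from a value that was split into a single blank entry.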
> >
> >
> >
> > desc formatted dev.statistics_dashboard_visitor_hudi_rt
> >
> >
> > Hive 2.1.1:
> >
> >
> > col_name, data_type, comment
> > # col_name, data_type, comment
> > , ,
> > _hoodie_commit_time, string,
> > _hoodie_commit_seqno, string,
> > _hoodie_record_key, string,
> > _hoodie_partition_path, string,
> > _hoodie_file_name, string,
> > id, int,
> > user_id, int,
> > dashboard_id, int,
> > created_at, string,
> > updated_at, string,
> > timestamp, bigint,
> > eventtype, string,
> > , ,
> > # Partition Information, ,
> > # col_name, data_type, comment
> > , ,
> > par, string,
> > , ,
> > # Detailed Table Information, ,
> > Database:, dev,
> > Owner:, app,
> > CreateTime:, Mon Feb 25 12:07:34 CST 2019,
> > LastAccessTime:, UNKNOWN,
> > Retention:, 0,
> > Location:, hdfs://yz-cluster-qa/user/hive/warehouse/dev.db/statistics_dashboard_visitor_hudi,
> > Table Type:, EXTERNAL_TABLE,
> > Table Parameters:, ,
> > , EXTERNAL, TRUE
> > , spark.sql.sources.schema.numPartCols, 1
> > , spark.sql.sources.schema.numParts, 1
> > , spark.sql.sources.schema.part.0,
> {\"type\":\"struct\",\"fields\":[{\"name\":\"_hoodie_commit_time\",\"type\":\"string\",\"nullable\":true,\"metadata\":{\"comment\":\"\",\"HIVE_TYPE_STRING\":\"string\"}},{\"name\":\"_hoodie_commit_seqno\",\"type\":\"string\",\"nullable\":true,\"metadata\":{\"comment\":\"\",\"HIVE_TYPE_STRING\":\"string\"}},{\"name\":\"_hoodie_record_key\",\"type\":\"string\",\"nullable\":true,\"metadata\":{\"comment\":\"\",\"HIVE_TYPE_STRING\":\"string\"}},{\"name\":\"_hoodie_partition_path\",\"type\":\"string\",\"nullable\":true,\"metadata\":{\"comment\":\"\",\"HIVE_TYPE_STRING\":\"string\"}},{\"name\":\"_hoodie_file_name\",\"type\":\"string\",\"nullable\":true,\"metadata\":{\"comment\":\"\",\"HIVE_TYPE_STRING\":\"string\"}},{\"name\":\"id\",\"type\":\"integer\",\"nullable\":true,\"metadata\":{\"comment\":\"\",\"HIVE_TYPE_STRING\":\"int\"}},{\"name\":\"user_id\",\"type\":\"integer\",\"nullable\":true,\"metadata\":{\"comment\":\"\",\"HIVE_TYPE_STRING\":\"int\"}},{\"name\":\"dashboard_id\",\"type\":\"integer\",\"nullable\":true,\"metadata\":{\"comment\":\"\",\"HIVE_TYPE_STRING\":\"int\"}},{\"name\":\"created_at\",\"type\":\"string\",\"nullable\":true,\"metadata\":{\"comment\":\"\",\"HIVE_TYPE_STRING\":\"string\"}},{\"name\":\"updated_at\",\"type\":\"string\",\"nullable\":true,\"metadata\":{\"comment\":\"\",\"HIVE_TYPE_STRING\":\"string\"}},{\"name\":\"timestamp\",\"type\":\"long\",\"nullable\":true,\"metadata\":{\"comment\":\"\",\"HIVE_TYPE_STRING\":\"bigint\"}},{\"name\":\"eventType\",\"type\":\"string\",\"nullable\":true,\"metadata\":{\"comment\":\"\",\"HIVE_TYPE_STRING\":\"string\"}},{\"name\":\"par\",\"type\":\"string\",\"nullable\":true,\"metadata\":{\"HIVE_TYPE_STRING\":\"string\"}}]}
> > , spark.sql.sources.schema.partCol.0, par
> > , transient_lastDdlTime, 1551074880
> > , ,
> > # Storage Information, ,
> > SerDe Library:, com.uber.hoodie.hadoop.realtime.HoodieParquetSerde,
> > InputFormat:, com.uber.hoodie.hadoop.realtime.HoodieRealtimeInputFormat,
> > OutputFormat:, org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat,
> > Compressed:, No,
> > Num Buckets:, -1,
> > Bucket Columns:, [],
> > Sort Columns:, [],
> > Storage Desc Params:, ,
> > , serialization.format, 1
> >
> >
> >
> > Hive 1.1.0-cdh5.15.1
> >
> >
> >
> > +-------------------------------+------------------------------+------------+
> > | col_name                      | data_type                    | comment    |
> > +-------------------------------+------------------------------+------------+
> > | # col_name                    | data_type                    | comment    |
> > |                               | NULL                         | NULL       |
> > | _hoodie_commit_time           | string                       |            |
> > | _hoodie_commit_seqno          | string                       |            |
> > | _hoodie_record_key            | string                       |            |
> > | _hoodie_partition_path        | string                       |            |
> > | _hoodie_file_name             | string                       |            |
> > | id                            | int                          |            |
> > | user_id                       | int                          |            |
> > | dashboard_id                  | int                          |            |
> > | created_at                    | string                       |            |
> > | updated_at                    | string                       |            |
> > | timestamp                     | bigint                       |            |
> > | eventtype                     | string                       |            |
> > |                               | NULL                         | NULL       |
> > | # Partition Information       | NULL                         | NULL       |
> > | # col_name                    | data_type                    | comment    |
> > |                               | NULL                         | NULL       |
> > | par                           | string                       |            |
> > |                               | NULL                         | NULL       |
> > | # Detailed Table Information  | NULL                         | NULL       |
> > | Database:                     | dev                          | NULL       |
> > | Owner:                        | hive                         | NULL       |
> > | CreateTime:                   | Tue Feb 26 00:05:25 CST 2019 | NULL       |
> > | LastAccessTime:               | UNKNOWN                      | NULL       |
> > | Protect Mode:                 | None                         | NULL       |
> > | Retention:                    | 0                            | NULL       |
> > | Location:                     | hdfs://qabb-perf-alluxio-hadoop0:8020/user/hive/warehouse/dev.db/statistics_dashboard_visitor_hudi | NULL       |
> > | Table Type:                   | EXTERNAL_TABLE               | NULL       |
> > | Table Parameters:             | NULL                         | NULL       |
> > |                               | EXTERNAL                     | TRUE       |
> > |                               | numPartitions                | 1          |
> > |                               | transient_lastDdlTime        | 1551110725 |
> > |                               | NULL                         | NULL       |
> > | # Storage Information         | NULL                         | NULL       |
> > | SerDe Library:                | com.uber.hoodie.hadoop.realtime.HoodieParquetSerde | NULL       |
> > | InputFormat:                  | com.uber.hoodie.hadoop.realtime.HoodieRealtimeInputFormat | NULL       |
> > | OutputFormat:                 | org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat | NULL       |
> > | Compressed:                   | No                           | NULL       |
> > | Num Buckets:                  | -1                           | NULL       |
> > | Bucket Columns:               | []                           | NULL       |
> > | Sort Columns:                 | []                           | NULL       |
> > | Storage Desc Params:          | NULL                         | NULL       |
> > |                               | serialization.format         | 1          |
> > +-------------------------------+------------------------------+------------+
> >
> > Thanks,
> > Frank
> >
> > On Wed, Feb 27, 2019 at 6:15 AM [email protected] <[email protected]> wrote:
> >
> >> Hi Frank,
> >> As Vinoth mentioned, can you share your environment (especially the
> >> Hive/Spark version)? Also, can you paste the table definition as seen in
> >> the Hive metastore (desc formatted <table_name>)?
> >>
> >> Balaji.V
> >> On Tuesday, February 26, 2019, 11:10:16 AM PST, Vinoth Chandar <
> >> [email protected]> wrote:
> >>
> >> Hi,
> >>
> >> Can you share more details about your environment and the full stack
> >> trace?
> >>
> >> Thanks
> >> Vinoth
> >>
> >> On Mon, Feb 25, 2019 at 11:10 PM kaka chen <[email protected]> wrote:
> >>
> >> > Hi All,
> >> >
> >> > AbstractRealtimeRecordReader cannot get the partition field from the
> >> > Hive partitioned table via
> >> > String partitionFields = jobConf.get("partition_columns", "");
> >> >
> >> > Thanks,
> >> > Frank
> >> >
> >>
> >
> >