zuoyebushiwo opened a new issue #2377:
URL: https://github.com/apache/hudi/issues/2377


   1. Schema:
   {
     "type":"record",
     "name":"stock_ticks",
     "fields":[{
        "name": "uuid",
        "type": "string"
     }, {
        "name": "ts",
        "type": "long"
     }, {
        "name": "symbol",
        "type": "string"
     },{
        "name": "year",
        "type": "int"
     },{
        "name": "month",
        "type": "int"
     },{
        "name": "high",
        "type": "double"
     },{
        "name": "low",
        "type": "double"
     },{
        "name": "key",
        "type": "string"
     },{
        "name": "close",
        "type": "double"
     }, {
        "name": "open",
        "type": "double"
     }, {
        "name": "day",
        "type":"string"
     }
   ]}
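As a quick sanity check (not part of the original report), the field names in this Avro schema can be compared against the keys of one of the JSON records from the Kafka topic in step 4; they should match exactly for the DeltaStreamer source/target schemas to line up:

```python
import json

# The Avro schema from step 1 (types as declared above)
schema = json.loads("""
{ "type": "record", "name": "stock_ticks",
  "fields": [
    {"name": "uuid", "type": "string"}, {"name": "ts", "type": "long"},
    {"name": "symbol", "type": "string"}, {"name": "year", "type": "int"},
    {"name": "month", "type": "int"}, {"name": "high", "type": "double"},
    {"name": "low", "type": "double"}, {"name": "key", "type": "string"},
    {"name": "close", "type": "double"}, {"name": "open", "type": "double"},
    {"name": "day", "type": "string"}
  ]}
""")

# One record from the Kafka topic (step 4)
record = json.loads('{"uuid":"2", "ts":1608538277000, "symbol":"symbol1", '
                    '"year":2020,"month":12,"high":0.1, "low":0.01, '
                    '"key":"key1","close":0.1,"open":0.2, "day":"21"}')

field_names = {f["name"] for f in schema["fields"]}
# Every schema field appears in the payload, and vice versa
assert field_names == set(record), "schema fields and payload keys differ"
print("fields:", len(field_names))
```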
   
   2. hudi-conf.properties:
   hoodie.datasource.write.recordkey.field=uuid
   hoodie.datasource.write.partitionpath.field=ts
   bootstrap.servers=hd-node-3-41.wakedata.com:6667
   hoodie.deltastreamer.keygen.timebased.timestamp.type=EPOCHMILLISECONDS
   hoodie.deltastreamer.keygen.timebased.output.dateformat=yyyy/MM/dd
   hoodie.datasource.write.keygenerator.class=org.apache.hudi.keygen.TimestampBasedAvroKeyGenerator
   hoodie.embed.timeline.server=false
   hoodie.deltastreamer.schemaprovider.source.schema.file=hdfs://HDFSCluster/hudi/test/config/flink/schema.avsc
   hoodie.deltastreamer.schemaprovider.target.schema.file=hdfs://HDFSCluster/hudi/test/config/flink/schema.avsc
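For reference, with timestamp.type=EPOCHMILLISECONDS and output.dateformat=yyyy/MM/dd, the TimestampBasedAvroKeyGenerator turns the configured ts field into a date-based partition path; a minimal sketch of the expected mapping for one of the topic records (assuming the generator's default GMT timezone — not verified against the Hudi source here):

```python
from datetime import datetime, timezone

ts = 1608538277000  # the "ts" value used in the topic data (step 4)

# Mimic the key generator: epoch milliseconds -> yyyy/MM/dd partition path (GMT assumed)
partition_path = datetime.fromtimestamp(ts / 1000, tz=timezone.utc).strftime("%Y/%m/%d")
print(partition_path)  # 2020/12/21
```

So all four sample records should land in the same partition, 2020/12/21.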
   
   3. HoodieFlinkStreamer program arguments:
   --kafka-bootstrap-servers hd-node-3-41.wakedata.com:6667
   --kafka-group-id hudi
   --kafka-topic hudi_json_topic
   --target-table hudi_on_flink_local_9
   --table-type COPY_ON_WRITE
   --target-base-path hdfs://HDFSCluster/hudi/test/data/hudi_on_flink_local_9
   --props hdfs://HDFSCluster/hudi/test/config/flink/hudi-conf.properties
   --checkpoint-interval 60000
   --flink-checkpoint-path hdfs://HDFSCluster/hudi/hudi_on_flink_local_9
   
   4. Topic data:
   
   {"uuid":"2", "ts":1608538277000, "symbol":"symbol1", "year":2020, "month":12, "high":0.1, "low":0.01, "key":"key1", "close":0.1, "open":0.2, "day":"21"}
   {"uuid":"2", "ts":1608538277000, "symbol":"symbol2", "year":2020, "month":12, "high":0.1, "low":0.01, "key":"key2", "close":0.1, "open":0.2, "day":"21"}
   {"uuid":"2", "ts":1608538277000, "symbol":"symbol3", "year":2020, "month":12, "high":0.1, "low":0.01, "key":"key3", "close":0.1, "open":0.2, "day":"21"}
   {"uuid":"2", "ts":1608538277000, "symbol":"symbol4", "year":2020, "month":12, "high":0.1, "low":0.01, "key":"key4", "close":0.1, "open":0.2, "day":"21"}
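Note that all four records carry the same record key (uuid = "2"). With hoodie.datasource.write.recordkey.field=uuid, upserts should collapse them to a single row per key, which is why a snapshot query returning the full history is surprising. A rough sketch of the expected last-write-wins behaviour (which of the tied-ts records survives depends on the precombine logic, so only the row count is asserted):

```python
import json

# Topic records, abbreviated to the fields relevant for upsert semantics
lines = [
    '{"uuid":"2", "ts":1608538277000, "symbol":"symbol1", "key":"key1"}',
    '{"uuid":"2", "ts":1608538277000, "symbol":"symbol2", "key":"key2"}',
    '{"uuid":"2", "ts":1608538277000, "symbol":"symbol3", "key":"key3"}',
    '{"uuid":"2", "ts":1608538277000, "symbol":"symbol4", "key":"key4"}',
]

# Upsert semantics: one row per record key; each write replaces the previous one
table = {}
for line in lines:
    rec = json.loads(line)
    table[rec["uuid"]] = rec

print(len(table))  # 1 -> a snapshot query should return a single row
```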
   
   5. Hive create table:
   
   add jars hdfs://HDFSCluster/hudi/test/hudi-hadoop-mr-bundle-0.6.1-SNAPSHOT.jar;
   
   CREATE EXTERNAL TABLE hudi_on_flink_local_9(
   `_hoodie_commit_time` string,
   `_hoodie_commit_seqno` string,
   `_hoodie_record_key` string,
   `_hoodie_partition_path` string,
   `_hoodie_file_name` string,
   `uuid` string,
   `ts` bigint,
   `symbol` string,
   `year` int,
   `month` int,
   `high` double,
   `low` double,
   `key` string,
   `close` double,
   `open` double,
   `day` string)
   ROW FORMAT SERDE
   'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'
   STORED AS INPUTFORMAT
   'org.apache.hudi.hadoop.HoodieParquetInputFormat'
   OUTPUTFORMAT
   'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat'
   LOCATION
   'hdfs://HDFSCluster/hudi/test/data/hudi_on_flink_local_9';
   
   When I query the Hive table, the entire upsert history appears (every version of the record), but it does not when I query with Spark.
   Please help, thanks very much!
   

