[GitHub] [hudi] taisenki opened a new issue #4477: [SUPPORT]using spark on TimestampBasedKeyGenerator has no result when query by partition column

GitBox Thu, 30 Dec 2021 00:26:28 -0800


taisenki opened a new issue #4477:
URL: https://github.com/apache/hudi/issues/4477



   hudi table options:
   
   hoodie.datasource.write.table.type=MERGE_ON_READ
   hoodie.datasource.write.recordkey.field=id
   hoodie.datasource.write.partitionpath.field=data_date
   hoodie.datasource.write.precombine.field=ts
   
   hoodie.datasource.write.keygenerator.class=org.apache.hudi.keygen.TimestampBasedKeyGenerator
   hoodie.deltastreamer.keygen.timebased.timestamp.type=DATE_STRING
   hoodie.deltastreamer.keygen.timebased.output.dateformat=yyyy/MM/dd
   hoodie.deltastreamer.keygen.timebased.timezone="GMT+8:00"
   hoodie.deltastreamer.keygen.timebased.input.dateformat=yyyy-MM-dd
   hoodie.datasource.hive_sync.database=hudi
   hoodie.datasource.hive_sync.enable=true
   
   hoodie.datasource.hive_sync.partition_extractor_class=org.apache.hudi.hive.SlashEncodedDayPartitionValueExtractor
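   
   For anyone trying to reproduce this from Spark, the options above could be collected into a Scala Map roughly as follows. This is only a sketch: the object name `HudiOptionsSketch` is made up for illustration, the timezone is written without the surrounding quotes, and a real write would likely need further settings (such as a table name) that are not listed in this issue.
   
   ```scala
   // Sketch only: the table options from this issue gathered into one Map.
   // `HudiOptionsSketch` is a hypothetical name, not part of Hudi.
   object HudiOptionsSketch {
     val options: Map[String, String] = Map(
       "hoodie.datasource.write.table.type" -> "MERGE_ON_READ",
       "hoodie.datasource.write.recordkey.field" -> "id",
       "hoodie.datasource.write.partitionpath.field" -> "data_date",
       "hoodie.datasource.write.precombine.field" -> "ts",
       "hoodie.datasource.write.keygenerator.class" ->
         "org.apache.hudi.keygen.TimestampBasedKeyGenerator",
       "hoodie.deltastreamer.keygen.timebased.timestamp.type" -> "DATE_STRING",
       // Input format of the data_date column values.
       "hoodie.deltastreamer.keygen.timebased.input.dateformat" -> "yyyy-MM-dd",
       // Output format used for the partition path.
       "hoodie.deltastreamer.keygen.timebased.output.dateformat" -> "yyyy/MM/dd",
       // Assumption: quotes from the original option value are dropped here.
       "hoodie.deltastreamer.keygen.timebased.timezone" -> "GMT+8:00",
       "hoodie.datasource.hive_sync.database" -> "hudi",
       "hoodie.datasource.hive_sync.enable" -> "true",
       "hoodie.datasource.hive_sync.partition_extractor_class" ->
         "org.apache.hudi.hive.SlashEncodedDayPartitionValueExtractor"
     )
   }
   ```
   
   Assuming a DataFrame `df` and the same `tablePath` as in the queries below, these could be passed with `df.write.format("hudi").options(HudiOptionsSketch.options).mode("append").save(tablePath)`.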
   
   When I query the table with a filter on data_date, no results are returned.
   For example:
   
   ```shell
   scala> spark.read.format("hudi").load(tablePath).where("id = '225709893'").select("data_date").where("data_date = '2018-09-23'").show
   +---------+
   |data_date|
   +---------+
   +---------+
   
   
   scala> spark.read.format("hudi").load(tablePath).where("id = '225709893'").select("data_date").show
   +----------+
   | data_date|
   +----------+
   |2018-09-23|
   |2018-09-22|
   |2018-09-19|
   |2018-09-21|
   |2018-09-25|
   |2018-09-20|
   |2018-09-24|
   +----------+
   ```
   
   I traced the code and found that, in HoodieFileIndex#prunePartition, the 
partition value is the string '2018/09/23' while the query pattern is 
'2018-09-23', so the filter matches no partition.
   
   Is there any way to apply the 
hoodie.deltastreamer.keygen.timebased.output.dateformat configuration when 
converting the query value for partition pruning?
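   
   To illustrate the kind of conversion being asked about, here is a minimal sketch using only java.time. It is not Hudi code: `PartitionValueSketch` and `toPartitionPath` are hypothetical names, and the two patterns are simply copied from the input.dateformat and output.dateformat options above.
   
   ```scala
   import java.time.LocalDate
   import java.time.format.DateTimeFormatter
   
   // Hypothetical helper, not part of Hudi: rewrites a query literal from the
   // key generator's input date format into its output (partition path) format.
   object PartitionValueSketch {
     // hoodie.deltastreamer.keygen.timebased.input.dateformat
     val inputFormat: DateTimeFormatter = DateTimeFormatter.ofPattern("yyyy-MM-dd")
     // hoodie.deltastreamer.keygen.timebased.output.dateformat
     val outputFormat: DateTimeFormatter = DateTimeFormatter.ofPattern("yyyy/MM/dd")
   
     // Parse the value as written in the query, then format it the way the
     // partition path was written.
     def toPartitionPath(queryValue: String): String =
       LocalDate.parse(queryValue, inputFormat).format(outputFormat)
   }
   ```
   
   With this, `PartitionValueSketch.toPartitionPath("2018-09-23")` yields the string form that the partition path uses, which is the value the pruning step would need to compare against.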


