taisenki opened a new issue #4477:
URL: https://github.com/apache/hudi/issues/4477
Hudi table options:
hoodie.datasource.write.table.type=MERGE_ON_READ
hoodie.datasource.write.recordkey.field=id
hoodie.datasource.write.partitionpath.field=data_date
hoodie.datasource.write.precombine.field=ts
hoodie.datasource.write.keygenerator.class=org.apache.hudi.keygen.TimestampBasedKeyGenerator
hoodie.deltastreamer.keygen.timebased.timestamp.type=DATE_STRING
hoodie.deltastreamer.keygen.timebased.output.dateformat=yyyy/MM/dd
hoodie.deltastreamer.keygen.timebased.timezone="GMT+8:00"
hoodie.deltastreamer.keygen.timebased.input.dateformat=yyyy-MM-dd
hoodie.datasource.hive_sync.database=hudi
hoodie.datasource.hive_sync.enable=true
hoodie.datasource.hive_sync.partition_extractor_class=org.apache.hudi.hive.SlashEncodedDayPartitionValueExtractor
When I query the table with a filter on data_date, no rows are returned.
For example:
```shell
scala> spark.read.format("hudi").load(tablePath).where("id = '225709893'").select("data_date").where("data_date = '2018-09-23'").show
+---------+
|data_date|
+---------+
+---------+
scala> spark.read.format("hudi").load(tablePath).where("id = '225709893'").select("data_date").show
+----------+
| data_date|
+----------+
|2018-09-23|
|2018-09-22|
|2018-09-19|
|2018-09-21|
|2018-09-25|
|2018-09-20|
|2018-09-24|
+----------+
```
I traced through the code and found that, in HoodieFileIndex#prunePartition,
the partition value is the string '2018/09/23' while the query pattern is
'2018-09-23', so the two never match.
Is there any way to apply the
hoodie.deltastreamer.keygen.timebased.output.dateformat configuration when
converting the query filter?
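
To make the mismatch concrete, here is a minimal sketch of the conversion I
have in mind. It is not Hudi code: PartitionPathSketch and toPartitionPath are
hypothetical names, the two patterns are copied from the input.dateformat and
output.dateformat options above, and the timezone option is ignored on the
assumption that it does not change a date-only value.

```scala
import java.time.LocalDate
import java.time.format.DateTimeFormatter

// Hypothetical helper, not part of Hudi: rewrites a filter literal from the
// key generator's input format into its output (partition path) format.
object PartitionPathSketch {
  // Patterns taken from the table options in this issue.
  private val inputFormat = DateTimeFormatter.ofPattern("yyyy-MM-dd")
  private val outputFormat = DateTimeFormatter.ofPattern("yyyy/MM/dd")

  // Parse the literal as a plain date and re-render it the way the
  // partition path is written on storage.
  def toPartitionPath(dataDate: String): String =
    LocalDate.parse(dataDate, inputFormat).format(outputFormat)
}
```

With something like this applied to the filter literal before partition
pruning, '2018-09-23' would be compared as '2018/09/23', which is the form the
partition path uses.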