[GitHub] [hudi] yihua commented on issue #5485: [SUPPORT] Hudi Delta Streamer doesn't recognize hive style date partition on S3

GitBox Tue, 03 May 2022 12:27:35 -0700


yihua commented on issue #5485:
URL: https://github.com/apache/hudi/issues/5485#issuecomment-1116479729


   > What you are saying is that, independent of the datatype / style of the partitions in the source dataset, they won't be considered as fields, since Hudi Delta Streamer just lists all the parquet files from the base path and reads them directly. Is that right to assume?
   
   That's correct. If you want a Spark-like read that includes the partition field from the partition path, you may consider `SqlSource` or the SQL transformer.
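
For context, Spark's file-source partition discovery turns each `key=value` directory segment of a file path into a column, which is the behavior a plain Parquet listing does not give you. The Python sketch below only illustrates that idea under simple assumptions. `hive_partition_values` is a hypothetical helper, not a Hudi or Spark API, and unlike Spark it neither unescapes values nor infers column types, so `dt` stays a string.

```python
from typing import Dict


def hive_partition_values(file_path: str) -> Dict[str, str]:
    """Return the key=value pairs encoded in the directory segments of file_path.

    Hypothetical illustration of Hive-style partition discovery; it does not
    unescape values or infer types the way Spark does.
    """
    values: Dict[str, str] = {}
    # Drop the file name: only directory segments can carry partition values.
    for segment in file_path.split("/")[:-1]:
        key, sep, value = segment.partition("=")
        if sep and key:
            values[key] = value
    return values


print(hive_partition_values("s3://bucket/table/dt=2022-05-03/part-0000.parquet"))
# {'dt': '2022-05-03'}
```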
   
   > By this you mean that hudi 0.9.0-amzn-1 doesn't support a date-typed partition field as the partition on the target, right? But the funny thing is: if I create the same sample data without partitioning, I can write the data as a hudi table without doing the conversion from date to string, like the following code snippet
   
   In the snippet you provided, `ComplexKeyGenerator` is used, which goes through a different code path. `CustomKeyGenerator` leverages `TimestampBasedAvroKeyGenerator` for the timestamp field, and that generator does not support a date-typed partition field in 0.9.0. That is what I meant; sorry if my previous statement was not clear.
   
   > Is there a reason why hudi delta streamer chooses not to behave like spark (considering the partitions as part of the schema)?
   
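   `ParquetDFSSource` treats the source as plain parquet files, so it reads only the columns stored in the files and does not derive fields from the partition path. As you said, for your use case, you can use the SQL transformer to wire in the Spark-like logic for the partition field with the Deltastreamer.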
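
One possible shape for that SQL transformer logic, assuming Spark's `input_file_name()` is still populated for the rows the source hands to the transformer (not verified here), is a query along the lines of `SELECT *, regexp_extract(input_file_name(), '/dt=([^/]+)/', 1) AS dt FROM <SRC>`, supplied through `hoodie.deltastreamer.transformer.sql` with `SqlQueryBasedTransformer`. The Python sketch below only mimics that row-wise enrichment outside Spark; `add_partition_field` and the `(path, row)` input shape are hypothetical, and it returns `None` rather than an empty string when a path has no matching segment.

```python
import re
from typing import Any, Dict, Iterable, List, Optional, Tuple


def add_partition_field(
    rows_with_paths: Iterable[Tuple[str, Dict[str, Any]]],
    field: str = "dt",
) -> List[Dict[str, Any]]:
    """Copy each row and add `field`, parsed from its file's directory path.

    Hypothetical stand-in for a transformer step; input rows are not mutated.
    """
    # Match only a full directory segment such as ".../dt=2022-05-03/...".
    pattern = re.compile(r"(?:^|/)" + re.escape(field) + r"=([^/]+)/")
    enriched: List[Dict[str, Any]] = []
    for path, row in rows_with_paths:
        match = pattern.search(path)
        value: Optional[str] = match.group(1) if match else None
        enriched.append({**row, field: value})
    return enriched


rows = [
    ("s3://bucket/table/dt=2022-05-03/part-0000.parquet", {"id": 1}),
    ("s3://bucket/table/dt=2022-05-04/part-0001.parquet", {"id": 2}),
]
print(add_partition_field(rows))
# [{'id': 1, 'dt': '2022-05-03'}, {'id': 2, 'dt': '2022-05-04'}]
```

Keeping the partition value as a string here also sidesteps the date-typed partition field limitation described above.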
   

