[GitHub] [hudi] leobiscassi commented on issue #5485: [SUPPORT] Hudi Delta Streamer doesn't recognize hive style date partition on S3

GitBox Tue, 03 May 2022 14:05:51 -0700


leobiscassi commented on issue #5485:
URL: https://github.com/apache/hudi/issues/5485#issuecomment-1116656782


   > That's correct. If you want Spark like read to include the partition field 
from the partition path, you may consider SqlSource or SQL transformer.
   
   When I use `ParquetDFSSource` with the SQL transformer, I have exactly 
the same issue with the partitions, so the only way I can extract the partition 
value is with `INPUT_FILE_NAME()` plus custom logic.
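
For illustration, here is a minimal sketch of the kind of custom logic I mean, written in plain Python so it can be run on its own. It only shows the parsing step: given a file path (such as the one `INPUT_FILE_NAME()` returns), pull out the hive-style `key=value` segments. The function name `parse_hive_partitions`, the bucket name, and the path layout are made up for this example and are not part of Hudi.

```python
import re
from typing import Dict

# Matches one hive-style path segment such as "date=2022-05-03".
_PARTITION_SEGMENT = re.compile(r"^([^=/]+)=([^/]*)$")


def parse_hive_partitions(file_path: str) -> Dict[str, str]:
    """Return the key=value partition segments found in a file path.

    Hypothetical helper for illustration only; not a Hudi API.
    """
    partitions: Dict[str, str] = {}
    # Drop the scheme ("s3://") so it is not treated as a path segment.
    path = file_path.split("://", 1)[-1]
    # Skip the last segment, which is the file name itself.
    for segment in path.split("/")[:-1]:
        match = _PARTITION_SEGMENT.match(segment)
        if match:
            partitions[match.group(1)] = match.group(2)
    return partitions


# Hypothetical path layout, assumed for this example.
example = "s3://my-bucket/events/date=2022-05-03/part-0000.parquet"
print(parse_hive_partitions(example))
```

In the SQL transformer itself, the same idea could probably be expressed by applying Spark SQL's `regexp_extract` to `input_file_name()`, but I have only sketched the parsing step here.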
   
   > In the snippet you provided, ComplexKeyGenerator is used which goes 
through a different code path. CustomKeyGenerator leverages 
TimestampBasedAvroKeyGenerator for timestamp field which does not support date 
typed partition field in 0.9.0. That is what I meant. Maybe I was not clear in 
my previous statement.
   
   Got it, thanks for the explanation.
   
   > ParquetDFSSource treats the source as plain parquet files. As you said, 
for your use case, you can use SQL transformer to wire in the Spark like logic 
for the partition field with the Deltastreamer.
   
   It would be great to add this information about the partition behavior to 
the docs. I spent a lot of time trying to see whether I was missing some 
configuration that leads to this behavior, and I didn't find anything there. Is 
there something like this in the docs? If there isn't, I could submit a PR adding 
this explanation with examples, if you think it makes sense.
   
   Thank you for the clarification @yihua!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

