leobiscassi commented on issue #5485: URL: https://github.com/apache/hudi/issues/5485#issuecomment-1116656782
> That's correct. If you want Spark like read to include the partition field from the partition path, you may consider SqlSource or SQL transformer.

When I use the `ParquetDFSSource` with the SQL transformer, I have exactly the same issue with the partitions, so the only way I have to extract the partition field is with `INPUT_FILE_NAME()` plus custom logic.

> In the snippet you provided, ComplexKeyGenerator is used which goes through a different code path. CustomKeyGenerator leverages TimestampBasedAvroKeyGenerator for timestamp field which does not support date typed partition field in 0.9.0. That is what I meant. Maybe I was not clear in my previous statement.

Got it, thanks for the explanation.

> ParquetDFSSource treats the source as plain parquet files. As you said, for your use case, you can use SQL transformer to wire in the Spark like logic for the partition field with the Deltastreamer.

It would be great to add this information about the behavior with partitions to the docs. I spent a lot of time trying to see if I was missing some configuration that leads to this behavior, and I didn't find anything there. Is there something like this in the docs? If there isn't, I could submit a PR adding this explanation with examples, if you think it makes sense.

Thank you for the clarification @yihua!
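As a rough illustration of the `INPUT_FILE_NAME()` plus custom logic workaround mentioned above, here is a minimal sketch in Python. It assumes Hive-style `key=value` partition directories. The function name `partition_values_from_path`, the example path, and the `year` column are all hypothetical and do not come from the issue. The SQL string at the end shows roughly how the same idea might look as a transformer query using Spark SQL's `input_file_name()` and `regexp_extract`. Treat the `<SRC>` table placeholder as an assumption to verify against the Hudi version in use.

```python
import re
from urllib.parse import unquote

# Matches one Hive-style path segment such as "year=2022" (assumed layout).
_PARTITION_SEGMENT = re.compile(r"^([^=/]+)=(.*)$")


def partition_values_from_path(file_path: str) -> dict:
    """Return {column: value} for every key=value directory in file_path."""
    values = {}
    # Skip the last segment: it is the file name, not a partition directory.
    for segment in file_path.split("/")[:-1]:
        match = _PARTITION_SEGMENT.match(segment)
        if match:
            # Partition values can be URL-encoded in the path, so decode them.
            values[match.group(1)] = unquote(match.group(2))
    return values


# Hypothetical transformer query applying the same idea in Spark SQL.
# The "<SRC>" placeholder and the "year" column are assumptions.
TRANSFORMER_SQL = (
    "SELECT *, "
    "regexp_extract(input_file_name(), 'year=([^/]+)', 1) AS year "
    "FROM <SRC>"
)
```

With a path like `s3://bucket/table/year=2022/month=05/part-0000.parquet`, the helper returns both partition columns as strings, so any cast to a date or integer type still has to be done separately.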
