Jonathan Vexler created HUDI-8031:
-------------------------------------
Summary: Allow reading the partition path field directly from the file in the new
filegroup reader
Key: HUDI-8031
URL: https://issues.apache.org/jira/browse/HUDI-8031
Project: Apache Hudi
Issue Type: Improvement
Components: reader-core, spark
Reporter: Jonathan Vexler
Assignee: Jonathan Vexler
Currently, for Spark, we append the same partition path value to the end of
every record in a partition. If you use a timestamp-based key generator, for
example, the partition field can have a different value in every record, so
appending a single constant value is incorrect.
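To make the problem concrete, here is a small Python sketch. `timestamp_partition_path` is a hypothetical stand-in for a timestamp-based key generator, assumed to format epoch milliseconds as `yyyy/MM/dd` in UTC. It is not Hudi's key generator API. Two records with different `ts` values land in the same partition, so appending one constant per partition replaces their distinct `ts` values with the same string.

```python
from datetime import datetime, timezone


def timestamp_partition_path(ts_millis: int) -> str:
    """Hypothetical timestamp-based key generator: epoch millis -> 'yyyy/MM/dd' (UTC)."""
    dt = datetime.fromtimestamp(ts_millis / 1000, tz=timezone.utc)
    return dt.strftime("%Y/%m/%d")


def append_constant_partition_value(partition_path, rows, column):
    """Mimic appending one constant per partition: every row gets the same value."""
    return [{**row, column: partition_path} for row in rows]


records = [
    {"id": 1, "ts": 1722297600000},  # 2024-07-30T00:00:00Z
    {"id": 2, "ts": 1722340800000},  # 2024-07-30T12:00:00Z
]

# Group records by the partition path derived from their timestamp.
partitions = {}
for r in records:
    partitions.setdefault(timestamp_partition_path(r["ts"]), []).append(r)

rows = partitions["2024/07/30"]
print([r["ts"] for r in rows])  # [1722297600000, 1722340800000]
print([r["ts"] for r in append_constant_partition_value("2024/07/30", rows, "ts")])
# ['2024/07/30', '2024/07/30']
```

Both records share one partition path, but only reading `ts` from the data file preserves their actual values.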
Idea for how to implement: in the DefaultSource / HadoopFs factory, determine
whether each partition column will have the same value for every record in a
partition. If not, set the partition schema to empty so that the partition
fields are read from the data files instead. One thing to think about: in the
file index, we need to look through the data filters and move any filter on a
partition column into the partition filters.
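A minimal sketch of the two pieces of this idea, in plain Python rather than Spark/Hudi code. `Filter`, `choose_partition_schema`, and `split_filters` are hypothetical names, not Hudi or Spark APIs. It assumes the caller already knows whether partition values are constant per partition (e.g. false for a timestamp-based key generator), and that a filter counts as "on the partition column" only when every column it references is a partition column; a filter mixing partition and data columns stays a data filter.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Tuple


@dataclass(frozen=True)
class Filter:
    """Hypothetical stand-in for a filter expression: a label plus referenced columns."""
    expr: str
    references: FrozenSet[str]


def choose_partition_schema(partition_cols: List[str],
                            values_constant_per_partition: bool) -> List[str]:
    # Keep the partition schema only when every record in a partition shares
    # the same partition values; otherwise use an empty schema so the values
    # are read from the data files.
    return list(partition_cols) if values_constant_per_partition else []


def split_filters(data_filters: List[Filter],
                  partition_cols: List[str]) -> Tuple[List[Filter], List[Filter]]:
    """Move data filters that reference only partition columns into partition filters."""
    part_set = set(partition_cols)
    partition_filters: List[Filter] = []
    remaining: List[Filter] = []
    for f in data_filters:
        if f.references and f.references <= part_set:
            partition_filters.append(f)
        else:
            remaining.append(f)
    return partition_filters, remaining


filters = [
    Filter("ts > 100", frozenset({"ts"})),
    Filter("name = 'a'", frozenset({"name"})),
    Filter("ts > id", frozenset({"ts", "id"})),
]
print(choose_partition_schema(["ts"], values_constant_per_partition=False))  # []
p, d = split_filters(filters, ["ts"])
print([f.expr for f in p])  # ['ts > 100']
print([f.expr for f in d])  # ["name = 'a'", 'ts > id']
```

Whether a moved filter should also remain a data filter (for example, if pruning on the partition path is coarser than the field value) is left open here.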
--
This message was sent by Atlassian Jira
(v8.20.10#820010)