[
https://issues.apache.org/jira/browse/HUDI-5807?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17764420#comment-17764420
]
Lin Liu commented on HUDI-5807:
-------------------------------
So far I have found the cause of the bug:
# During the write, the partition_path field is not written to the base files or the log files.
# During the read, the base file reader can append the partition path to each
record, but the log file reader has no mechanism to add the partition_path
field to the payload.
# During merging, the record from the log file is emitted with its
partition_path field set to NULL.
I have looked into adding the partition path to the underlying payload
(the InternalRow), but I have not yet found an efficient way to do it. We should
discuss the possible solutions.
> HoodieSparkParquetReader is not appending partition-path values
> ---------------------------------------------------------------
>
> Key: HUDI-5807
> URL: https://issues.apache.org/jira/browse/HUDI-5807
> Project: Apache Hudi
> Issue Type: Bug
> Components: spark
> Affects Versions: 0.13.0
> Reporter: Alexey Kudinkin
> Assignee: Lin Liu
> Priority: Blocker
> Fix For: 1.0.0
>
>
> The current implementation of HoodieSparkParquetReader does not support the
> case when "hoodie.datasource.write.drop.partition.columns" is set to true.
> In that case, partition-path values are expected to be parsed from the
> partition path and injected within the File Reader (this is the behavior of
> Spark's own readers).
--
This message was sent by Atlassian Jira
(v8.20.10#820010)