Alexey Kudinkin created HUDI-5321:
-------------------------------------

             Summary: Fix Bulk Insert ColumnSortPartitioners
                 Key: HUDI-5321
                 URL: https://issues.apache.org/jira/browse/HUDI-5321
             Project: Apache Hudi
          Issue Type: Bug
    Affects Versions: 0.12.1
            Reporter: Alexey Kudinkin
            Assignee: sivabalan narayanan
             Fix For: 0.12.2


Currently, all of the custom Bulk Insert ColumnSortPartitioner implementations
incorrectly return "true" from the "arePartitionRecordsSorted" method, even
though the records are not necessarily sorted by the partition-path columns, as
this method requires.

When such a Partitioner is used and the data is NOT sorted by a list of
columns that starts with the partition columns, Parquet writers can be
closed prematurely while writing files, creating a LOT of small files.
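
To illustrate the effect, here is a minimal simulation in Python. It is not
Hudi code: the function name, the sample partition paths, and the
handle-closing behaviour are assumptions made for the sketch. It assumes that
a write task which is told its records are sorted keeps a single open file
and closes it each time the partition path of the incoming record changes,
whereas a task told otherwise keeps one open file per partition path.

```python
# Hypothetical simulation of one bulk-insert write task; not Hudi's API.
def count_files_written(partition_paths, records_sorted_flag):
    """Return how many files the task produces for the given record order.

    partition_paths: the partition path of each incoming record, in order.
    records_sorted_flag: the value the partitioner reports from its
        "are records sorted" method (assumed semantics, see lead-in).
    """
    if not records_sorted_flag:
        # One open writer per partition path, so one file per distinct path.
        return len(set(partition_paths))

    # Single open writer: every change of partition path closes the
    # current file and opens a new one.
    files = 0
    current = None
    for path in partition_paths:
        if path != current:
            files += 1
            current = path
    return files


# Records really sorted by partition path: the flag is truthful.
sorted_by_partition = ["2022/12/01"] * 3 + ["2022/12/02"] * 3
# Records sorted by some other column: partition paths interleave.
sorted_by_other_column = ["2022/12/01", "2022/12/02"] * 3

print(count_files_written(sorted_by_partition, True))      # 2
print(count_files_written(sorted_by_other_column, True))   # 6
print(count_files_written(sorted_by_other_column, False))  # 2
```

With the flag wrongly set to "true", the interleaved input yields one file
per run of equal partition paths (6 here) instead of one per partition (2).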



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
