Alexey Kudinkin created HUDI-5321:
-------------------------------------
Summary: Fix Bulk Insert ColumnSortPartitioners
Key: HUDI-5321
URL: https://issues.apache.org/jira/browse/HUDI-5321
Project: Apache Hudi
Issue Type: Bug
Affects Versions: 0.12.1
Reporter: Alexey Kudinkin
Assignee: sivabalan narayanan
Fix For: 0.12.2
Currently, all of the custom Bulk Insert ColumnSortPartitioner implementations
incorrectly return "true" from the "arePartitionRecordsSorted" method, even
though the records are not necessarily sorted by the partition-path columns,
as this method requires.
When such a Partitioner is used and the data is NOT sorted by a list of
columns that starts with the partition columns, Parquet writers can be closed
prematurely while writing files, creating a LOT of small files.
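
To illustrate the effect, here is a minimal sketch in Python. It is not Hudi code.
It assumes a simplified writer that trusts the "sorted" flag, keeps a single
open file, and rolls over to a new file whenever the partition path of the
incoming record changes. The function name count_files_written and the sample
partition paths are hypothetical.

```python
from typing import Iterable, Optional


def count_files_written(partition_paths: Iterable[str]) -> int:
    """Count files created by a simplified (hypothetical) writer.

    The writer assumes its input is sorted by partition path, so it keeps
    only one file open and closes it as soon as the partition path changes.
    """
    files = 0
    current: Optional[str] = None
    for path in partition_paths:
        if path != current:
            # Partition path changed: the open file is closed, a new one starts.
            files += 1
            current = path
    return files


# Twelve records spread over three partitions, interleaved (not sorted).
records = ["2022-01-01", "2022-01-02", "2022-01-03"] * 4

files_when_sorted = count_files_written(sorted(records))
files_when_unsorted = count_files_written(records)
```

With the input sorted by partition path, this model writes one file per
partition. With the interleaved input, every record triggers a rollover, so
the file count grows with the number of records rather than the number of
partitions.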