Vinish Reddy created HUDI-7617:
----------------------------------

             Summary: Fix issues for bulk insert user defined partitioner in StreamSync
                 Key: HUDI-7617
                 URL: https://issues.apache.org/jira/browse/HUDI-7617
             Project: Apache Hudi
          Issue Type: Bug
          Components: deltastreamer
            Reporter: Vinish Reddy
            Assignee: Vinish Reddy


There are two problems with BULK_INSERT and partitioners.
 # A user-defined partitioner passed via
{{hoodie.bulkinsert.user.defined.partitioner.class}} is not honoured in the
StreamSync code path. The data is therefore written in non-sort mode, which can
lead to OOM errors because too many write handles are open at the same time.
 # {{RDDCustomColumnsSortPartitioner}} and {{RowCustomColumnsSortPartitioner}}
sort the data globally, but they do not prepend the partition keys to the sort
columns. Records of the same partition are therefore not contiguous, and far
too many files are written. The unit test fails with this error on the existing
code.
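The file-count effect described above can be illustrated with a small simulation. This is a minimal sketch under a simplifying assumption, not Hudi's actual writer code: a task that treats its input as sorted closes its current file and opens a new one whenever the partition path changes, while a task in non-sort mode keeps one handle open per partition path it has seen. The class, record, and method names ({{SortKeyFileCountDemo}}, {{Rec}}, {{filesWrittenSortedMode}}, {{openHandlesUnsortedMode}}) are hypothetical. Requires Java 16+ for records.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/**
 * Simplified model (hypothetical, not Hudi code) of a bulk-insert task.
 * Sorted mode: roll over to a new file whenever the partition path changes.
 * Non-sort mode: keep one open handle per distinct partition path.
 */
public class SortKeyFileCountDemo {

    // Hypothetical record: a partition path plus one user-chosen sort column.
    record Rec(String partitionPath, String sortCol) {}

    /** Files written when a new file is opened on every partition-path change. */
    static int filesWrittenSortedMode(List<Rec> records) {
        int files = 0;
        String current = null;
        for (Rec r : records) {
            if (!r.partitionPath().equals(current)) {
                files++; // close the previous handle and open a new file
                current = r.partitionPath();
            }
        }
        return files;
    }

    /** Handles held open at once when handles are kept per partition path until the task ends. */
    static int openHandlesUnsortedMode(List<Rec> records) {
        Set<String> open = new HashSet<>();
        for (Rec r : records) {
            open.add(r.partitionPath()); // never closed before the task finishes
        }
        return open.size();
    }

    /** 12 records spread over 3 partitions, with partitions interleaved. */
    static List<Rec> sample() {
        List<Rec> recs = new ArrayList<>();
        String[] partitions = {"2024/01/01", "2024/01/02", "2024/01/03"};
        for (int i = 0; i < 4; i++) {
            for (String p : partitions) {
                recs.add(new Rec(p, "key" + i));
            }
        }
        return recs;
    }

    public static void main(String[] args) {
        // Sort by the user column only: partitions stay interleaved.
        List<Rec> byColumn = sample();
        byColumn.sort(Comparator.comparing(Rec::sortCol));

        // Prepend the partition path to the sort columns.
        List<Rec> byPartitionThenColumn = sample();
        byPartitionThenColumn.sort(
            Comparator.comparing(Rec::partitionPath).thenComparing(Rec::sortCol));

        System.out.println(filesWrittenSortedMode(byColumn));              // 12
        System.out.println(filesWrittenSortedMode(byPartitionThenColumn)); // 3
        System.out.println(openHandlesUnsortedMode(sample()));             // 3
    }
}
```

In this model, sorting by the user column alone forces a new file for every one of the 12 records, while prepending the partition path yields one file per partition. In non-sort mode the number of simultaneously open handles equals the number of distinct partitions a task sees, which is why a task touching many partitions is prone to OOM.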



--
This message was sent by Atlassian Jira
(v8.20.10#820010)