[ 
https://issues.apache.org/jira/browse/HUDI-7617?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sagar Sumit closed HUDI-7617.
-----------------------------
    Resolution: Fixed

> Fix issues for bulk insert user defined partitioner in StreamSync
> -----------------------------------------------------------------
>
>                 Key: HUDI-7617
>                 URL: https://issues.apache.org/jira/browse/HUDI-7617
>             Project: Apache Hudi
>          Issue Type: Bug
>          Components: deltastreamer
>            Reporter: Vinish Reddy
>            Assignee: Vinish Reddy
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 0.15.0, 1.0.0
>
>
> There are two problems with BULK_INSERT and user-defined partitioners.
>  # A partitioner passed via 
> {{hoodie.bulkinsert.user.defined.partitioner.class}} is not honoured in the 
> StreamSync code path. The data is written in non-sort mode, which can lead 
> to OOM errors because too many writeHandles are open at once.
>  # {{RDDCustomColumnsSortPartitioner}} and 
> {{RowCustomColumnsSortPartitioner}} sort the data globally but still write 
> too many files, because the partition keys are not prepended to the sort 
> columns. The unit test fails with this error against the existing 
> code.
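
The second problem can be illustrated with a small standalone sketch. It does
not use any Hudi classes; the class name SortColumnsSketch, the Row type, and
the "one file per distinct partition path per chunk" model are assumptions made
purely for illustration. Rows are sorted globally, split into contiguous
chunks (standing in for Spark partitions), and each chunk is assumed to open
one file for every table partition path it contains.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class SortColumnsSketch {
    // Hypothetical row: a table partition path plus one user-chosen sort column.
    static final class Row {
        final String partitionPath;
        final int sortKey;

        Row(String partitionPath, int sortKey) {
            this.partitionPath = partitionPath;
            this.sortKey = sortKey;
        }
    }

    // Builds rows for partition paths p0..p(n-1), each with sort keys 0..(m-1).
    static List<Row> sampleRows(int partitions, int rowsPerPartition) {
        List<Row> rows = new ArrayList<>();
        for (int p = 0; p < partitions; p++) {
            for (int k = 0; k < rowsPerPartition; k++) {
                rows.add(new Row("p" + p, k));
            }
        }
        return rows;
    }

    // Sorts globally, splits into contiguous chunks, and counts one file per
    // distinct partition path seen in each chunk (an assumed, simplified model).
    static int countFiles(List<Row> rows, Comparator<Row> order, int numChunks) {
        List<Row> sorted = new ArrayList<>(rows);
        sorted.sort(order);
        int chunkSize = (sorted.size() + numChunks - 1) / numChunks;
        int files = 0;
        for (int start = 0; start < sorted.size(); start += chunkSize) {
            int end = Math.min(start + chunkSize, sorted.size());
            Set<String> paths = new HashSet<>();
            for (Row row : sorted.subList(start, end)) {
                paths.add(row.partitionPath);
            }
            files += paths.size();
        }
        return files;
    }

    // Sort on the custom column only: partition paths interleave in every chunk.
    static int filesSortedByCustomColumns(int partitions, int rowsPerPartition, int chunks) {
        Comparator<Row> customOnly = Comparator.comparingInt((Row r) -> r.sortKey);
        return countFiles(sampleRows(partitions, rowsPerPartition), customOnly, chunks);
    }

    // Sort with the partition path prepended to the custom column.
    static int filesSortedByPartitionThenCustomColumns(int partitions, int rowsPerPartition, int chunks) {
        Comparator<Row> partitionFirst =
            Comparator.comparing((Row r) -> r.partitionPath).thenComparingInt(r -> r.sortKey);
        return countFiles(sampleRows(partitions, rowsPerPartition), partitionFirst, chunks);
    }

    public static void main(String[] args) {
        // 4 partition paths, 8 rows each, 4 chunks.
        System.out.println(filesSortedByCustomColumns(4, 8, 4));              // prints 16
        System.out.println(filesSortedByPartitionThenCustomColumns(4, 8, 4)); // prints 4
    }
}
```

In this toy model, sorting on the custom column alone spreads every partition
path across every chunk, while prepending the partition path keeps each chunk
within a single partition path. The actual file counts in Hudi depend on the
write configuration and data distribution.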



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
