[
https://issues.apache.org/jira/browse/HUDI-7617?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Sagar Sumit closed HUDI-7617.
-----------------------------
Resolution: Fixed
> Fix issues for bulk insert user defined partitioner in StreamSync
> -----------------------------------------------------------------
>
> Key: HUDI-7617
> URL: https://issues.apache.org/jira/browse/HUDI-7617
> Project: Apache Hudi
> Issue Type: Bug
> Components: deltastreamer
> Reporter: Vinish Reddy
> Assignee: Vinish Reddy
> Priority: Major
> Labels: pull-request-available
> Fix For: 0.15.0, 1.0.0
>
>
> There are two problems with BULK_INSERT and partitioners.
> # A user-defined partitioner passed via
> {{hoodie.bulkinsert.user.defined.partitioner.class}} is not honoured in the
> StreamSync code path. The data is written in non-sort mode, which can lead
> to OOM errors because too many write handles are open at once.
> # With {{RDDCustomColumnsSortPartitioner}} and
> {{RowCustomColumnsSortPartitioner}}, the data is globally sorted but too
> many files are written, because the partition keys are not prepended to
> the sort columns. The unit test fails with this error against the existing
> code.
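
The second problem can be illustrated with a small, self-contained simulation. This is only a sketch and does not use any Hudi API: the function names, the column names (partition_path, user_id), and the chunking model (contiguous slices of the sorted data standing in for Spark partitions, with one file per chunk and table partition pair) are all assumptions made for illustration.

```python
def prepend_partition_column(sort_columns, partition_column):
    """Return sort columns with the partition column first (hypothetical helper)."""
    if sort_columns and sort_columns[0] == partition_column:
        return list(sort_columns)
    return [partition_column] + [c for c in sort_columns if c != partition_column]


def count_files(records, sort_columns, num_chunks, partition_column="partition_path"):
    """Globally sort records, cut them into contiguous chunks, and count the
    distinct (chunk, table partition) pairs. In this toy model each pair
    corresponds to one file (and one open write handle)."""
    ordered = sorted(records, key=lambda r: tuple(r[c] for c in sort_columns))
    chunk_size = -(-len(ordered) // num_chunks)  # ceiling division
    files = {(i // chunk_size, rec[partition_column]) for i, rec in enumerate(ordered)}
    return len(files)


# 4 table partitions with 8 records each (made-up data).
records = [
    {"partition_path": f"2024/01/0{p}", "user_id": u}
    for p in range(1, 5)
    for u in range(8)
]

user_columns = ["user_id"]
fixed_columns = prepend_partition_column(user_columns, "partition_path")

# Sorting on the user columns alone spreads every table partition across
# every chunk; prepending the partition column keeps each one contiguous.
print(count_files(records, user_columns, num_chunks=4))   # 16
print(count_files(records, fixed_columns, num_chunks=4))  # 4
```

In this model, sorting on user_id alone makes every chunk touch all four table partitions, so 16 files are written. With partition_path prepended, each chunk maps to one table partition and only 4 files are written.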