Vinish Reddy created HUDI-7617:
----------------------------------
Summary: Fix issues for bulk insert user defined partitioner in
StreamSync
Key: HUDI-7617
URL: https://issues.apache.org/jira/browse/HUDI-7617
Project: Apache Hudi
Issue Type: Bug
Components: deltastreamer
Reporter: Vinish Reddy
Assignee: Vinish Reddy
There are two problems with BULK_INSERT and partitioners.
# A user-defined partitioner passed via
{{hoodie.bulkinsert.user.defined.partitioner.class}} is not honoured in the
StreamSync code path. The data is instead written in non-sort mode, which can
lead to OOM errors because too many write handles are open at once.
# {{RDDCustomColumnsSortPartitioner}} and {{RowCustomColumnsSortPartitioner}}
sort the data globally but still write too many files, because the partition
keys are not actually prepended to the sort columns. The unit test fails with
this error for the existing code.
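
To illustrate the second problem, here is a minimal sketch of what prepending the partition keys to the sort columns could look like. The class {{SortColumns}} and the method {{prependPartitionColumns}} are hypothetical names used only for illustration and are not part of the Hudi API; the actual fix may be structured differently. The idea is that sorting by the partition fields first keeps records of the same partition adjacent, so each partition needs fewer files.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class SortColumns {

    // Hypothetical helper (not a Hudi API): returns the effective sort order with
    // the partition fields first, followed by the user-supplied sort columns.
    // A LinkedHashSet keeps insertion order and drops duplicates, so a column
    // that is both a partition field and a sort column appears only once.
    static List<String> prependPartitionColumns(List<String> partitionFields,
                                                List<String> sortColumns) {
        Set<String> ordered = new LinkedHashSet<>(partitionFields);
        ordered.addAll(sortColumns);
        return new ArrayList<>(ordered);
    }

    public static void main(String[] args) {
        // Assumed example values: table partitioned by region/date, user sorts by ts and region.
        List<String> effective = prependPartitionColumns(
                List.of("region", "date"), List.of("ts", "region"));
        System.out.println(effective); // prints [region, date, ts]
    }
}
```

With this ordering, a global sort groups all rows of one partition together before the user-chosen columns are applied within the partition.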
--
This message was sent by Atlassian Jira
(v8.20.10#820010)