[
https://issues.apache.org/jira/browse/HUDI-3463?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Raymond Xu updated HUDI-3463:
-----------------------------
Component/s: writer-core
> Make user-defined BulkInsertPartitioner fit write path API
> ----------------------------------------------------------
>
> Key: HUDI-3463
> URL: https://issues.apache.org/jira/browse/HUDI-3463
> Project: Apache Hudi
> Issue Type: Improvement
> Components: writer-core
> Reporter: Raymond Xu
> Priority: Critical
>
> The existing logic is problematic because we cannot enforce that a user-defined
> partitioner returns a JavaRDD, so the cast below can fail at runtime.
> {code:java}
> BulkInsertPartitioner partitioner = userDefinedBulkInsertPartitioner.isPresent()
>     ? userDefinedBulkInsertPartitioner.get()
>     : BulkInsertInternalPartitionerFactory.get(config.getBulkInsertSortMode());
> repartitionedRecords = (JavaRDD<HoodieRecord<T>>)
>     partitioner.repartitionRecords(dedupedRecords, parallelism);
> {code}
> The factory is currently used only in Spark, so we expect a JavaRDD or
> HoodieData. The API can be made explicit about this constraint.
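
One possible way to make the constraint explicit is to carry the collection type as a type parameter, so that the user-defined partitioner and the factory default are both checked by the compiler and the cast disappears. The following is only a minimal sketch under stated assumptions, not the Hudi implementation: RecordPartitioner, SortingPartitioner and repartition are hypothetical names, and List<String> stands in for JavaRDD<HoodieRecord<T>> so the example runs without Spark or Hudi on the classpath.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Optional;

public class TypedPartitionerSketch {

    // Hypothetical stand-in for BulkInsertPartitioner: the type parameter I
    // fixes both the input and the output collection type.
    public interface RecordPartitioner<I> {
        I repartitionRecords(I records, int parallelism);
    }

    // Hypothetical stand-in for the default partitioner the factory would return.
    public static class SortingPartitioner implements RecordPartitioner<List<String>> {
        @Override
        public List<String> repartitionRecords(List<String> records, int parallelism) {
            List<String> sorted = new ArrayList<>(records);
            Collections.sort(sorted);
            return sorted;
        }
    }

    // The Optional is typed on the same collection type as the default, so a
    // user partitioner that returns anything else is rejected at compile time.
    public static List<String> repartition(
            Optional<RecordPartitioner<List<String>>> userDefined,
            List<String> records,
            int parallelism) {
        RecordPartitioner<List<String>> partitioner =
                userDefined.orElseGet(SortingPartitioner::new);
        // No cast is needed: the return type is List<String> by construction.
        return partitioner.repartitionRecords(records, parallelism);
    }

    public static void main(String[] args) {
        List<String> records = Arrays.asList("c", "a", "b");

        // No user-defined partitioner: falls back to the sorting default.
        System.out.println(repartition(Optional.empty(), records, 2)); // prints [a, b, c]

        // A user-defined partitioner that reverses the input order.
        RecordPartitioner<List<String>> reversing = (recs, p) -> {
            List<String> out = new ArrayList<>(recs);
            Collections.reverse(out);
            return out;
        };
        System.out.println(repartition(Optional.of(reversing), records, 2)); // prints [b, a, c]
    }
}
```

With this shape, a partitioner declared against a different collection type cannot be passed to repartition at all, which moves the failure from a runtime cast to a compile error.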
--
This message was sent by Atlassian Jira
(v8.20.1#820001)