Prashant Wason created HUDI-4094:
------------------------------------
Summary: Allow bulk insert partitioner to specify the fileID prefixes to use
Key: HUDI-4094
URL: https://issues.apache.org/jira/browse/HUDI-4094
Project: Apache Hudi
Issue Type: New Feature
Reporter: Prashant Wason
Assignee: Prashant Wason
This makes it possible to use bulk insert when bootstrapping metadata table indexes.
Currently we use upsertPrepped to write to the metadata table. The upsert code path
is not optimized for very large writes (1 billion+ records) because of the workload
profiling and upsert partitioning overheads.
Bulk insert for the metadata table requires the partitions to be written to files
that have special names, so random fileIDs (the current implementation) cannot be
used.
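A minimal sketch of the idea, not Hudi's actual API: the names FileIdPrefixProvider, RandomFileIdPrefixProvider and FixedFileIdPrefixProvider, and the "example-index-NNNN" prefixes, are hypothetical and chosen only for illustration. It contrasts a random per-partition fileID prefix with a hook that lets the partitioner return a fixed, caller-chosen prefix for each output partition.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.UUID;

// Hypothetical hook (assumption, not a real Hudi interface): maps an output
// partition number to the fileID prefix the writer should use for it.
interface FileIdPrefixProvider {
    String fileIdPrefix(int partitionId);
}

// Models the random behaviour described above: every call yields a fresh
// random prefix, so file names cannot be predicted in advance.
final class RandomFileIdPrefixProvider implements FileIdPrefixProvider {
    @Override
    public String fileIdPrefix(int partitionId) {
        return UUID.randomUUID().toString();
    }
}

// Models the requested behaviour: the partitioner supplies one fixed prefix
// per output partition, so files can carry the required special names.
final class FixedFileIdPrefixProvider implements FileIdPrefixProvider {
    private final List<String> prefixes;

    FixedFileIdPrefixProvider(List<String> prefixes) {
        // Defensive copy so later changes to the caller's list have no effect.
        this.prefixes = Collections.unmodifiableList(new ArrayList<>(prefixes));
    }

    @Override
    public String fileIdPrefix(int partitionId) {
        if (partitionId < 0 || partitionId >= prefixes.size()) {
            throw new IllegalArgumentException(
                "No fileID prefix configured for partition " + partitionId);
        }
        return prefixes.get(partitionId);
    }
}
```

With two prefixes configured, partition 0 maps to the first and partition 1 to the second; asking for any other partition fails fast instead of silently falling back to a random name.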