Danny Chen created HUDI-1557:
--------------------------------
Summary: Make Flink write pipeline write task scalable
Key: HUDI-1557
URL: https://issues.apache.org/jira/browse/HUDI-1557
Project: Apache Hudi
Issue Type: Sub-task
Components: DeltaStreamer
Reporter: Danny Chen
This issue introduces a BucketAssigner that assigns a bucket ID (partition path &
fileID) to each stream record.
Downstream operators in the pipeline then no longer need to look up the index or
partition these records: the write target location is decided before the write,
because each record's location is computed as soon as the BucketAssigner receives
it. In other words, indexing is done in a streaming fashion.
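A minimal sketch of streaming-style (per-record) bucket assignment, for illustration only. It is not Hudi's actual BucketAssigner API. The names `BucketId` and `SimpleBucketAssigner`, the in-memory `HashMap` index, and the fixed per-file capacity `maxRecordsPerFile` are all hypothetical assumptions. An update is routed to the bucket that already holds its key. An insert goes to the partition's current open bucket, or to a new fileID once that bucket is full.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;

/** Hypothetical bucket ID: the (partition path, fileID) pair a record is written to. */
record BucketId(String partitionPath, String fileId) {}

/**
 * Hypothetical per-record assigner, not Hudi's actual BucketAssigner API.
 * Assumes an in-memory index (record key -> bucket) and a fixed record capacity per file.
 */
class SimpleBucketAssigner {
  private final int maxRecordsPerFile;
  // Index state kept across records: record key -> bucket it was first assigned to.
  private final Map<String, BucketId> index = new HashMap<>();
  // Current insert bucket per partition path.
  private final Map<String, BucketId> openBuckets = new HashMap<>();
  // Number of records assigned to each bucket so far.
  private final Map<BucketId, Integer> bucketSizes = new HashMap<>();

  SimpleBucketAssigner(int maxRecordsPerFile) {
    this.maxRecordsPerFile = maxRecordsPerFile;
  }

  /** Computes the location of one record at the moment it arrives. */
  BucketId assign(String recordKey, String partitionPath) {
    BucketId existing = index.get(recordKey);
    if (existing != null) {
      return existing; // update: route to the file that already holds this key
    }
    BucketId bucket = openBuckets.get(partitionPath);
    if (bucket == null || bucketSizes.get(bucket) >= maxRecordsPerFile) {
      // No open bucket yet, or it is full: start a new file group in this partition.
      bucket = new BucketId(partitionPath, UUID.randomUUID().toString());
      openBuckets.put(partitionPath, bucket);
      bucketSizes.put(bucket, 0);
    }
    bucketSizes.merge(bucket, 1, Integer::sum);
    index.put(recordKey, bucket);
    return bucket;
  }
}
```

Each call does a small, constant amount of map work. The indexing cost is therefore spread over the stream rather than paid in one burst per batch. In a real job, the index and bucket bookkeeping would presumably have to live in fault-tolerant (for example Flink keyed) state instead of plain `HashMap`s.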
Computing locations for a whole batch of records at once is resource-intensive and
puts pressure on the engine, so it should be avoided in a streaming system.
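Once every record carries its bucket ID, it can be handed to a write task using the fileID alone, with no buffered batch and no index lookup. A hedged sketch of one plausible routing rule follows. The `WriteTaskRouter` helper is hypothetical. In a Flink job, the same effect would typically come from keying the stream on the fileID (for example with `keyBy`).

```java
/** Hypothetical routing helper: picks a write task using only the assigned fileID. */
final class WriteTaskRouter {
  private WriteTaskRouter() {}

  /** Returns the index, in [0, parallelism), of the write task that owns this fileID. */
  static int writeTaskFor(String fileId, int parallelism) {
    if (parallelism <= 0) {
      throw new IllegalArgumentException("parallelism must be positive");
    }
    // floorMod keeps the result non-negative even when hashCode() is negative.
    return Math.floorMod(fileId.hashCode(), parallelism);
  }
}
```

In this sketch, all records for one file group land on the same task. Each write task therefore owns a disjoint set of fileIDs and can write without coordinating with the others, which lets write parallelism scale independently of the assignment step.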
--
This message was sent by Atlassian Jira
(v8.3.4#803005)