Danny Chen created HUDI-1557:
--------------------------------

             Summary: Make Flink write pipeline write task scalable
                 Key: HUDI-1557
                 URL: https://issues.apache.org/jira/browse/HUDI-1557
             Project: Apache Hudi
          Issue Type: Sub-task
          Components: DeltaStreamer
            Reporter: Danny Chen


This issue introduces a BucketAssigner that assigns a bucket ID (partition path
and fileID) to each stream record.

With this change, the downstream pipeline no longer needs to look up the index
or partition these records. The write target location is decided before the
write, because each record's location is computed as soon as the BucketAssigner
receives it. In other words, indexing is done in a streaming style.
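To illustrate the idea of per-record, streaming-style bucket assignment, here is
a minimal plain-Java sketch. It is not Hudi's BucketAssigner and does not use
Flink APIs. The names BucketId, SimpleBucketAssigner, assign and
maxRecordsPerFile, and the "file-N" file ID format, are hypothetical. The
in-memory map stands in for the index, and rolling over to a new file after a
fixed record count is an assumed sizing policy.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

/** Hypothetical bucket ID: a record's write target (partition path + file ID). */
final class BucketId {
    final String partitionPath;
    final String fileId;

    BucketId(String partitionPath, String fileId) {
        this.partitionPath = partitionPath;
        this.fileId = fileId;
    }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof BucketId)) return false;
        BucketId other = (BucketId) o;
        return partitionPath.equals(other.partitionPath) && fileId.equals(other.fileId);
    }

    @Override
    public int hashCode() {
        return Objects.hash(partitionPath, fileId);
    }

    @Override
    public String toString() {
        return partitionPath + "/" + fileId;
    }
}

/**
 * Hypothetical streaming bucket assigner: each record's location is computed
 * the moment the record arrives, so no batch-wide index lookup is needed later.
 */
final class SimpleBucketAssigner {
    private final long maxRecordsPerFile;
    // Per partition path: the file ID currently being filled and its record count.
    private final Map<String, String> openFile = new HashMap<>();
    private final Map<String, Long> openFileCount = new HashMap<>();
    // In-memory stand-in for the index: (partition path, record key) -> bucket.
    private final Map<String, BucketId> index = new HashMap<>();
    private int nextFileNumber = 0;

    SimpleBucketAssigner(long maxRecordsPerFile) {
        if (maxRecordsPerFile <= 0) {
            throw new IllegalArgumentException("maxRecordsPerFile must be > 0");
        }
        this.maxRecordsPerFile = maxRecordsPerFile;
    }

    /** Assigns a bucket to one incoming record; called once per record. */
    BucketId assign(String recordKey, String partitionPath) {
        String indexKey = partitionPath + "|" + recordKey;
        // Update: a key seen before goes back to the file that already holds it.
        BucketId existing = index.get(indexKey);
        if (existing != null) {
            return existing;
        }
        // Insert: fill the partition's open file, rolling to a new one when full.
        String fileId = openFile.get(partitionPath);
        long count = openFileCount.getOrDefault(partitionPath, 0L);
        if (fileId == null || count >= maxRecordsPerFile) {
            fileId = "file-" + nextFileNumber++; // hypothetical naming, not Hudi's format
            count = 0;
        }
        openFile.put(partitionPath, fileId);
        openFileCount.put(partitionPath, count + 1);
        BucketId bucket = new BucketId(partitionPath, fileId);
        index.put(indexKey, bucket);
        return bucket;
    }
}
```

For example, with maxRecordsPerFile = 2, assigning keys k1, k2 and k3 to
partition 2021/01/20 puts k1 and k2 in file-0 and k3 in file-1, and assigning
k1 again returns its original bucket. Each call touches only one record's
entries, so the work is spread evenly across the stream instead of being
concentrated in a batch-wide lookup.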

Computing locations for a whole batch of records at once is resource-intensive
and puts pressure on the engine, so a streaming system should avoid it.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
