Alexander Trushev created HUDI-5516:
---------------------------------------

             Summary: Reduce memory footprint on workloads with thousands of 
active partitions
                 Key: HUDI-5516
                 URL: https://issues.apache.org/jira/browse/HUDI-5516
             Project: Apache Hudi
          Issue Type: Improvement
          Components: flink
            Reporter: Alexander Trushev
            Assignee: Alexander Trushev


We can reduce the memory footprint of workloads with thousands of active 
partitions between checkpoints. Such workloads arise when the checkpoint 
interval is wide. More specifically, an active partition here is a special 
case of an active fileId. The write client holds a map of write handles in 
order to create a ReplaceHandle between checkpoints. Because each write handle 
is a huge object, this leads to OutOfMemoryError on such workloads.

{code:sql}

create table source (
    `id` int,
    `data` string
) with (
    'connector' = 'datagen',
    'rows-per-second' = '100',
    'fields.id.kind' = 'sequence',
    'fields.id.start' = '0',
    'fields.id.end' = '3000'
);
create table sink (
    `id` int primary key,
    `data` string,
    `part` string
) partitioned by (`part`) with (
    'connector' = 'hudi',
    'path' = '/tmp/sink',
    'write.batch.size' = '0.001',  -- 1024 bytes
    'write.task.max.size' = '101.001',  -- 101.001MB
    'write.merge.max_memory' = '1'  -- 1024 bytes
);

insert into sink select `id`, `data`, concat('part', cast(`id` as string)) as 
`part` from source;

{code} 
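The memory growth can be sketched as follows. This is an illustrative Java snippet, not Hudi's actual code: the class and field names (`WriteHandle`, `handlesByFileId`) are hypothetical, and the per-handle buffer is scaled down to 1 KB, whereas real write handles carry much larger state (buffers, file writers, metadata). The point is only that the cache retains one handle per fileId until the next checkpoint, so retained memory grows linearly with the number of active partitions.

{code:java}
import java.util.HashMap;
import java.util.Map;

public class HandleCacheSketch {

    // Stand-in for a write handle; real handles are far larger than 1 KB.
    static class WriteHandle {
        final byte[] state = new byte[1024];
    }

    // Cache one handle per fileId, as the write client does between
    // checkpoints, and return how many handles are retained.
    static int cacheHandles(int activeFileIds) {
        Map<String, WriteHandle> handlesByFileId = new HashMap<>();
        for (int id = 0; id < activeFileIds; id++) {
            // With one partition per record, as in the datagen example
            // above, every record produces a new fileId and a new handle.
            String fileId = "part" + id;
            handlesByFileId.computeIfAbsent(fileId, k -> new WriteHandle());
        }
        return handlesByFileId.size();
    }

    public static void main(String[] args) {
        // 3001 fileIds -> 3001 retained handles until the next checkpoint.
        System.out.println(cacheHandles(3001));
    }
}
{code}

With 3001 single-record partitions, as generated by the source table above, the map retains 3001 handles until the checkpoint completes, which is what exhausts the heap when each handle is large.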




--
This message was sent by Atlassian Jira
(v8.20.10#820010)
