Alexander Trushev created HUDI-5516:
---------------------------------------
Summary: Reduce memory footprint on workloads with thousands of active
partitions
Key: HUDI-5516
URL: https://issues.apache.org/jira/browse/HUDI-5516
Project: Apache Hudi
Issue Type: Improvement
Components: flink
Reporter: Alexander Trushev
Assignee: Alexander Trushev
We can reduce the memory footprint of workloads with thousands of active
partitions between checkpoints. Such workloads typically arise with a wide
checkpoint interval. More specifically, an active partition here is a special
case of an active fileId. The write client holds a map of write handles in
order to create a ReplaceHandle between checkpoints. Because each write handle
is a large object, this leads to OutOfMemoryError on such workloads. The
following job reproduces the problem:
{code:sql}
create table source (
`id` int,
`data` string
) with (
'connector' = 'datagen',
'rows-per-second' = '100',
'fields.id.kind' = 'sequence',
'fields.id.start' = '0',
'fields.id.end' = '3000'
);
create table sink (
`id` int primary key not enforced,
`data` string,
`part` string
) partitioned by (`part`) with (
'connector' = 'hudi',
'path' = '/tmp/sink',
'write.batch.size' = '0.001', -- 1024 bytes
'write.task.max.size' = '101.001', -- 101.001MB
'write.merge.max_memory' = '1' -- 1024 bytes
);
insert into sink select `id`, `data`, concat('part', cast(`id` as string)) as
`part` from source;
{code}
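The retention pattern described above can be sketched as follows. This is a minimal illustration, not actual Hudi code: the class and method names are hypothetical, and the handle's buffer is scaled down to 1 KB so the sketch runs quickly; in the real workload each retained handle is far larger, so thousands of active partitions exhaust the heap.

{code:java}
import java.util.HashMap;
import java.util.Map;

public class WriteHandleFootprint {

    // Hypothetical stand-in for a write handle: each one pins a buffer
    // in memory for as long as it stays in the map.
    static class FakeWriteHandle {
        final byte[] buffer = new byte[1024]; // scaled down for the sketch
    }

    // One handle is retained per active fileId (here, per partition)
    // until the next checkpoint releases them.
    static Map<String, FakeWriteHandle> buildHandles(int partitions) {
        Map<String, FakeWriteHandle> handles = new HashMap<>();
        for (int id = 0; id < partitions; id++) {
            handles.put("part" + id, new FakeWriteHandle());
        }
        return handles;
    }

    public static void main(String[] args) {
        // Matches the datagen repro: 3000 distinct partitions, so
        // 3000 handles accumulate between checkpoints.
        Map<String, FakeWriteHandle> handles = buildHandles(3000);
        long retainedBytes = handles.size() * 1024L;
        System.out.println(handles.size() + " handles, ~"
                + (retainedBytes >> 10) + " KB retained");
    }
}
{code}

With realistically sized handles the retained set grows linearly with the number of active partitions, which is why memory pressure appears only on wide checkpoint intervals: nothing is released until the checkpoint completes.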
--
This message was sent by Atlassian Jira
(v8.20.10#820010)