[ https://issues.apache.org/jira/browse/IMPALA-13656?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17913418#comment-17913418 ]
ASF subversion and git services commented on IMPALA-13656:
----------------------------------------------------------
Commit 55d7498b2478f5988d53c2ec0bd1b282a8298fe1 in impala's branch
refs/heads/master from Noemi Pap-Takacs
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=55d7498b2 ]
IMPALA-13656: MERGE redundantly accumulates memory in HDFS WRITER
When IcebergMergeImpl created the table sink, it did not set
'inputIsClustered' to true. HdfsTableSink therefore expected
random input and kept an output writer open for every partition,
which resulted in high memory consumption and potentially a
Memory Limit Exceeded error when the number of partitions was high.
Since the rows are actually sorted before the sink, we can set
'inputIsClustered' to true. HdfsTableSink can then write files
one by one: whenever it gets a row that belongs to a new partition,
it knows that it can close the current output writer and open a
new one.
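
The effect of the flag can be illustrated with a toy model. This is a
minimal sketch, not Impala's code: the function name peak_open_writers,
the (partition, payload) row shape, and the use of a set to stand in for
open output writers are all assumptions made for illustration. It only
counts how many writers would be open at the same time under each
assumption about the input order.

```python
from typing import Iterable, Optional, Set, Tuple

# Hypothetical row shape for illustration: (partition key, payload).
Row = Tuple[str, int]


def peak_open_writers(rows: Iterable[Row], input_is_clustered: bool) -> int:
    """Return the peak number of simultaneously open partition writers.

    Toy model only: a "writer" is just a partition key held in a set.
    """
    open_writers: Set[str] = set()
    current: Optional[str] = None
    peak = 0
    for partition, _payload in rows:
        if input_is_clustered and partition != current:
            # Clustered input: a new partition key means the previous
            # partition is complete, so its writer can be closed now.
            if current is not None:
                open_writers.discard(current)
            current = partition
        # Open a writer for this partition if one is not open yet.
        open_writers.add(partition)
        peak = max(peak, len(open_writers))
    return peak


# 1000 rows spread round-robin over 100 partitions.
rows = [(f"p{i % 100}", i) for i in range(1000)]
# Sorting by partition key models the sort that precedes the sink.
sorted_rows = sorted(rows, key=lambda r: r[0])

unclustered_peak = peak_open_writers(rows, input_is_clustered=False)
clustered_peak = peak_open_writers(sorted_rows, input_is_clustered=True)
print(unclustered_peak, clustered_peak)
```

In this model the unclustered run holds one writer per partition until
the end, while the clustered run never holds more than one, which is
the memory difference the commit describes.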
Testing:
- e2e regression test
Change-Id: I7bad0310e96eb482af9d09ba0d41e44c07bf8e4d
Reviewed-on: http://gerrit.cloudera.org:8080/22332
Reviewed-by: Peter Rozsa
Tested-by: Impala Public Jenkins
> MERGE redundantly accumulates memory in HDFS WRITER
> ---------------------------------------------------
>
> Key: IMPALA-13656
> URL: https://issues.apache.org/jira/browse/IMPALA-13656
> Project: IMPALA
> Issue Type: Bug
> Components: Frontend
> Reporter: Noemi Pap-Takacs
> Assignee: Noemi Pap-Takacs
> Priority: Major
> Fix For: Impala 4.5.0
>
>
> When we want to merge Iceberg tables that have lots of partitions, the
> execution will use much more memory than needed, possibly resulting in a
> Memory Limit Exceeded error.
> It happens because when IcebergMergeImpl creates the insert table sink it
> doesn't set 'inputIsClustered' to true. Therefore HdfsTableSink expects
> random input and keeps the output writers open for every partition, which
> results in high memory consumption.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)