Zoltán Borók-Nagy created IMPALA-13598:
------------------------------------------
Summary: OPTIMIZE redundantly accumulates memory in HDFS WRITER
Key: IMPALA-13598
URL: https://issues.apache.org/jira/browse/IMPALA-13598
Project: IMPALA
Issue Type: Bug
Reporter: Zoltán Borók-Nagy
When we have an Iceberg table that has lots of partitions and we want to
compact it via OPTIMIZE, the operation uses much more memory than needed.
Repro steps:
{noformat}
create table tmp_ice_tpch
partitioned by spec(truncate(500, l_orderkey))
stored by iceberg as
select * from tpch.lineitem;
OPTIMIZE TABLE tmp_ice_tpch;
# We likely get a Memory Limit Exceeded error here{noformat}
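The partition spec above uses Iceberg's truncate(W, col) transform. Per the Iceberg table spec, for integer columns it maps each value v to v - (((v % W) + W) % W), which is the lower bound of its width-W bucket. The following minimal Python sketch of that transform counts the distinct partitions for a hypothetical contiguous range of order keys. The key range is an assumption for illustration only and is not the actual contents of tpch.lineitem.

```python
def truncate_int(width: int, value: int) -> int:
    """Iceberg truncate transform for int/long values (per the Iceberg table spec)."""
    return value - (((value % width) + width) % width)


def count_partitions(width: int, keys) -> int:
    """Number of distinct truncate(width, key) partition values."""
    return len({truncate_int(width, k) for k in keys})


# Hypothetical key range; real l_orderkey values depend on the TPC-H scale factor.
keys = range(0, 1_000_000)
print(count_partitions(500, keys))  # 2000
```

Every 500 consecutive keys form a new partition, so the partition count grows linearly with the key range.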
Currently OPTIMIZE uses INSERT OVERWRITE under the hood:
{noformat}
INSERT OVERWRITE tmp_ice_tpch SELECT * FROM tmp_ice_tpch;{noformat}
However, running this INSERT OVERWRITE statement directly does not accumulate memory in this way.
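The report does not say where exactly OPTIMIZE accumulates memory. As a general illustration only, here is a hypothetical Python model. It is not Impala's HDFS writer code, and the function and parameter names are invented for this sketch. It shows why the number of simultaneously open partition writers matters. A writer that keeps one buffer per partition it has seen grows with the number of partitions. A writer that receives input clustered by partition key, and closes each partition's writer when the key changes, only ever holds one open.

```python
from typing import Dict, Iterable, List, Optional, Tuple


def peak_open_writers(rows: Iterable[Tuple[int, str]], close_on_key_change: bool) -> int:
    """Model a partitioned writer; return the peak number of open partition writers.

    rows: (partition_key, payload) pairs in arrival order.
    close_on_key_change: assume input is clustered by partition key and close
    (flush) the current partition's writer whenever the key changes.
    """
    open_writers: Dict[int, List[str]] = {}  # partition key -> buffered rows
    current: Optional[int] = None
    peak = 0
    for key, payload in rows:
        if close_on_key_change and current is not None and key != current:
            # Clustered input: the previous partition will not appear again.
            open_writers.pop(current, None)
        open_writers.setdefault(key, []).append(payload)
        current = key
        peak = max(peak, len(open_writers))
    return peak


rows = [(k % 100, f"row{k}") for k in range(10_000)]  # keys interleaved
print(peak_open_writers(rows, close_on_key_change=False))         # 100
print(peak_open_writers(sorted(rows), close_on_key_change=True))  # 1
```

Closing writers on a key change is only safe when the input really is clustered. On interleaved input, the same partition would be reopened repeatedly and produce many small files.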
--
This message was sent by Atlassian Jira
(v8.20.10#820010)