Zoltán Borók-Nagy created IMPALA-13598:
------------------------------------------

             Summary: OPTIMIZE redundantly accumulates memory in HDFS WRITER
                 Key: IMPALA-13598
                 URL: https://issues.apache.org/jira/browse/IMPALA-13598
             Project: IMPALA
          Issue Type: Bug
            Reporter: Zoltán Borók-Nagy


When we have an Iceberg table that has lots of partitions, and we want to
compact the table via OPTIMIZE, it will use much more memory than needed.

Repro steps:
{noformat}
create table tmp_ice_tpch
partitioned by spec(truncate(500, l_orderkey))
stored by iceberg as
select * from tpch.lineitem;

OPTIMIZE TABLE tmp_ice_tpch;

# We likely get a Memory Limit Exceeded error here
{noformat}
Currently OPTIMIZE uses INSERT OVERWRITE under the hood:
{noformat}
INSERT OVERWRITE tmp_ice_tpch SELECT * FROM tmp_ice_tpch;
{noformat}
But running this INSERT OVERWRITE statement directly doesn't accumulate memory in the same way.
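
One possible way to compare the two statements side by side is sketched below. It assumes an impala-shell session and uses the shell's SUMMARY command to print the per-operator execution summary (including peak memory) of the most recent query. Whether the table sink appears as its own row in that summary, and whether SUMMARY is still available after a query fails with Memory Limit Exceeded, may depend on the Impala version; in that case the full PROFILE output may be the better source.
{noformat}
-- Sketch only: run in impala-shell against the table from the repro steps.

-- 1) Compact via OPTIMIZE, then inspect per-operator peak memory.
OPTIMIZE TABLE tmp_ice_tpch;
SUMMARY;

-- 2) Run the equivalent rewrite directly, then inspect it the same way.
INSERT OVERWRITE tmp_ice_tpch SELECT * FROM tmp_ice_tpch;
SUMMARY;

-- If SUMMARY does not show enough detail, PROFILE prints the full
-- runtime profile of the last query instead.
{noformat}
Comparing the peak memory reported for the writer in the two runs should show whether the extra memory is specific to the OPTIMIZE path.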



--
This message was sent by Atlassian Jira
(v8.20.10#820010)