[
https://issues.apache.org/jira/browse/IMPALA-13075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17846017#comment-17846017
]
Riza Suminto commented on IMPALA-13075:
---------------------------------------
Yes, BATCH_SIZE is a basic unit in how Impala estimates and allocates memory.
[https://cwiki.apache.org/confluence/display/IMPALA/Impala+Row+Batches]
Both the Frontend Planner and the Backend Executor respect BATCH_SIZE. If
MEM_LIMIT is still above the minimum memory resource requirement, I would
expect the query to still be admitted and run, even if it is not performant
(i.e., it needs to spill rows to disk). Each fragment claims its minimum
memory requirement right after it is instantiated.
Please attach the full query profiles of both the good and the bad run so we
can analyze this further.
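
To make the scaling concrete, here is a rough sketch of how the memory held
by a single row batch grows with BATCH_SIZE. The function name and the
256-byte row width are assumptions for illustration only, not values taken
from this query, and the sketch does not model Impala's actual allocator. It
also assumes that BATCH_SIZE=0 resolves to the default of 1024 rows.

```python
DEFAULT_BATCH_SIZE = 1024   # assumed: what BATCH_SIZE=0 resolves to
MAX_BATCH_SIZE = 65536      # upper limit quoted in the issue description


def batch_bytes(batch_size: int, row_width_bytes: int) -> int:
    """Rough size of one row batch: rows per batch times bytes per row."""
    rows = DEFAULT_BATCH_SIZE if batch_size == 0 else batch_size
    return rows * row_width_bytes


if __name__ == "__main__":
    row_width = 256  # hypothetical row width in bytes
    for setting in (0, MAX_BATCH_SIZE):
        print(f"BATCH_SIZE={setting}: {batch_bytes(setting, row_width)} bytes per batch")
    # Growth factor between the default and the maximum setting.
    print(batch_bytes(MAX_BATCH_SIZE, row_width) // batch_bytes(0, row_width))
```

Under these assumptions, every buffer sized in row batches becomes 64 times
larger when BATCH_SIZE goes from the default to 65536.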
> Setting very high BATCH_SIZE can blow up memory usage of fragments
> ------------------------------------------------------------------
>
> Key: IMPALA-13075
> URL: https://issues.apache.org/jira/browse/IMPALA-13075
> Project: IMPALA
> Issue Type: Improvement
> Components: Backend
> Affects Versions: Impala 4.0.0
> Reporter: Ezra Zerihun
> Priority: Major
>
> In Impala 4.0, setting BATCH_SIZE very high, at or near the maximum of 65536,
> can cause a fragment's memory usage to spike far past the query's MEM_LIMIT
> or the pool's Maximum Query Memory Limit with Clamp on. So even though
> MEM_LIMIT is set to a reasonable value, the query can still fail with an
> out-of-memory error and a huge amount of memory used by the fragment.
> Reducing BATCH_SIZE to a reasonable value, or back to the default, allows the
> query to run without issue and stay within the query's MEM_LIMIT or the
> pool's Maximum Query Memory Limit.
>
> 1) set BATCH_SIZE=65536; set MEM_LIMIT=1g;
>
> {code:java}
> Query State: EXCEPTION
> Impala Query State: ERROR
> Query Status: Memory limit exceeded: Error occurred on backend ...:27000 by fragment ...
> Memory left in process limit: 145.53 GB
> Memory left in query limit: -6.80 GB
> Query(...): memory limit exceeded. Limit=1.00 GB Reservation=86.44 MB ReservationLimit=819.20 MB OtherMemory=7.71 GB Total=7.80 GB Peak=7.84 GB
> Unclaimed reservations: Reservation=8.50 MB OtherMemory=0 Total=8.50 MB Peak=56.44 MB
> Runtime Filter Bank: Reservation=4.00 MB ReservationLimit=4.00 MB OtherMemory=0 Total=4.00 MB Peak=4.00 MB
> Fragment ...: Reservation=1.94 MB OtherMemory=7.59 GB Total=7.59 GB Peak=7.63 GB
> HASH_JOIN_NODE (id=8): Reservation=1.94 MB OtherMemory=7.57 GB Total=7.57 GB Peak=7.57 GB
> Exprs: Total=7.57 GB Peak=7.57 GB
> Hash Join Builder (join_node_id=8): Total=0 Peak=1.95 MB
> ...
> Query Options (set by configuration):
> BATCH_SIZE=65536,MEM_LIMIT=1073741824,CLIENT_IDENTIFIER=Impala Shell
> v4.0.0.7.2.16.0-287 (5ae3917) built on Mon Jan 9 21:23:59 UTC
> 2023,DEFAULT_FILE_FORMAT=PARQUET,...
> ...
> ExecSummary:
> ...
> 09:AGGREGATE     32  32  0.000ns   0.000ns  0       4.83M    36.31 MB  212.78 MB  STREAMING
> 08:HASH JOIN     32  32  5s149ms   2m44s    0       194.95M  7.57 GB   1.94 MB    RIGHT OUTER JOIN, PARTITIONED
> |--18:EXCHANGE   32  32  93.750us  1.000ms  10.46K  1.55K    1.65 MB   2.56 MB    HASH(...
> {code}
>
>
> 2) set BATCH_SIZE=0; set MEM_LIMIT=1g;
>
> {code:java}
> Query State: FINISHED
> Impala Query State: FINISHED
> ...
> Query Options (set by configuration and planner):
> MEM_LIMIT=1073741824,CLIENT_IDENTIFIER=Impala Shell v4.0.0.7.2.16.0-287
> (5ae3917) built on Mon Jan 9 21:23:59 UTC
> 2023,DEFAULT_FILE_FORMAT=PARQUET,...
> ...
> ExecSummary:
> ...
> 09:AGGREGATE     32  32  593.748us  18.999ms  45      4.83M    34.06 MB   212.78 MB  STREAMING
> 08:HASH JOIN     32  32  10s873ms   5m47s     10.47K  194.95M  123.48 MB  1.94 MB    RIGHT OUTER JOIN, PARTITIONED
> |--18:EXCHANGE   32  32  0.000ns    0.000ns   10.46K  1.55K    344.00 KB  1.69 MB    HASH(...
> {code}
>
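
As a quick sanity check on the two runs quoted above, the figures can be
compared with a little arithmetic. This is only a sketch: the helper
`to_bytes` is hypothetical, the sizes are copied from the error message and
the ExecSummary output in the report, the 7.57 GB and 123.48 MB values are
read as the peak memory of the 08:HASH JOIN node, and BATCH_SIZE=0 is assumed
to mean the default of 1024 rows.

```python
UNITS = {"B": 1, "KB": 1024, "MB": 1024**2, "GB": 1024**3}


def to_bytes(text: str) -> float:
    """Convert a profile-style size such as '7.57 GB' to bytes."""
    value, unit = text.split()
    return float(value) * UNITS[unit]


# Query-level tracker from run 1: Limit=1.00 GB, Total=7.80 GB.
limit = to_bytes("1.00 GB")
total = to_bytes("7.80 GB")
headroom_gb = (limit - total) / UNITS["GB"]

# Peak memory of 08:HASH JOIN in run 1 (BATCH_SIZE=65536) and run 2 (BATCH_SIZE=0).
peak_large = to_bytes("7.57 GB")
peak_default = to_bytes("123.48 MB")
peak_ratio = peak_large / peak_default

# Ratio of the two batch sizes, assuming the default is 1024 rows.
batch_ratio = 65536 / 1024

print(f"memory left in query limit: {headroom_gb:.2f} GB")
print(f"hash join peak memory ratio: {peak_ratio:.1f}x")
print(f"batch size ratio: {batch_ratio:.0f}x")
```

The computed headroom matches the "Memory left in query limit: -6.80 GB"
reported in the error, and the hash join's peak memory grows by a factor
close to the 64x increase in batch size. That is consistent with memory
scaling with BATCH_SIZE, although it does not by itself identify which
allocation is responsible.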