[ 
https://issues.apache.org/jira/browse/IMPALA-13075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17846017#comment-17846017
 ] 

Riza Suminto commented on IMPALA-13075:
---------------------------------------

Yes, BATCH_SIZE is the basic unit Impala uses to estimate and allocate memory.
[https://cwiki.apache.org/confluence/display/IMPALA/Impala+Row+Batches] 

Both the Frontend Planner and the Backend Executor respect BATCH_SIZE. If 
MEM_LIMIT is still above the minimum memory resource requirement, I would 
expect the query to still be admitted and run, even if it is not performant 
(i.e., it needs to spill rows to disk). Each fragment claims its minimum 
memory requirement right after it is instantiated.

Please attach the full query profiles of both the good and the bad run so we 
can analyze this further.
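
Until the full profiles are attached, the query-level tracker line in the 
quoted error already gives a quick first check. The following is a minimal 
Python sketch, assuming the "Name=<number> <unit>" layout seen in the error 
quoted below. parse_size and tracker_fields are hypothetical helper names 
written for this illustration, not part of any Impala tooling.

```python
import re

# Multipliers for the binary units that appear in the memory-tracker dump.
UNITS = {"B": 1, "KB": 1024, "MB": 1024**2, "GB": 1024**3, "TB": 1024**4}


def parse_size(text):
    """Convert a string such as '7.57 GB' into a number of bytes."""
    m = re.fullmatch(r"\s*(-?[\d.]+)\s*([KMGT]?B)\s*", text)
    if m is None:
        raise ValueError(f"unrecognized size: {text!r}")
    return float(m.group(1)) * UNITS[m.group(2)]


def tracker_fields(line):
    """Extract 'Name=<size>' pairs from one memory-tracker line.

    Fields without a unit (for example 'OtherMemory=0') are skipped.
    """
    pairs = re.findall(r"(\w+)=(-?[\d.]+ [KMGT]?B)", line)
    return {name: parse_size(size) for name, size in pairs}


# Query-level tracker line copied from the failed run in the issue description.
query_line = ("Limit=1.00 GB Reservation=86.44 MB ReservationLimit=819.20 MB "
              "OtherMemory=7.71 GB Total=7.80 GB Peak=7.84 GB")

fields = tracker_fields(query_line)
over_gb = (fields["Total"] - fields["Limit"]) / UNITS["GB"]
print(f"over limit by {over_gb:.2f} GB")
```

The result agrees with the "Memory left in query limit: -6.80 GB" figure in 
the error, and almost all of the total sits in OtherMemory rather than in 
Reservation.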

> Setting very high BATCH_SIZE can blow up memory usage of fragments
> ------------------------------------------------------------------
>
>                 Key: IMPALA-13075
>                 URL: https://issues.apache.org/jira/browse/IMPALA-13075
>             Project: IMPALA
>          Issue Type: Improvement
>          Components: Backend
>    Affects Versions: Impala 4.0.0
>            Reporter: Ezra Zerihun
>            Priority: Major
>
> In Impala 4.0, setting BATCH_SIZE very high, at or near the maximum of 
> 65536, can cause a fragment's memory usage to spike far past the query's 
> MEM_LIMIT or the pool's Maximum Query Memory Limit with Clamp on. So even 
> though MEM_LIMIT is set to a reasonable value, the query can still fail with 
> an out-of-memory error and a huge amount of memory used by a fragment. 
> Reducing BATCH_SIZE to a reasonable value, or back to the default, lets the 
> query run without issue and keeps memory usage within the query's MEM_LIMIT 
> or the pool's Maximum Query Memory Limit.
>  
> 1) set BATCH_SIZE=65536; set MEM_LIMIT=1g;
>  
> {code:java}
>     Query State: EXCEPTION
>     Impala Query State: ERROR
>     Query Status: Memory limit exceeded: Error occurred on backend ...:27000 by fragment ...
>     Memory left in process limit: 145.53 GB
>     Memory left in query limit: -6.80 GB
>     Query(...): memory limit exceeded. Limit=1.00 GB Reservation=86.44 MB ReservationLimit=819.20 MB OtherMemory=7.71 GB Total=7.80 GB Peak=7.84 GB
>       Unclaimed reservations: Reservation=8.50 MB OtherMemory=0 Total=8.50 MB Peak=56.44 MB
>       Runtime Filter Bank: Reservation=4.00 MB ReservationLimit=4.00 MB OtherMemory=0 Total=4.00 MB Peak=4.00 MB
>       Fragment ...: Reservation=1.94 MB OtherMemory=7.59 GB Total=7.59 GB Peak=7.63 GB
>         HASH_JOIN_NODE (id=8): Reservation=1.94 MB OtherMemory=7.57 GB Total=7.57 GB Peak=7.57 GB
>           Exprs: Total=7.57 GB Peak=7.57 GB
>           Hash Join Builder (join_node_id=8): Total=0 Peak=1.95 MB
> ...
>     Query Options (set by configuration): 
> BATCH_SIZE=65536,MEM_LIMIT=1073741824,CLIENT_IDENTIFIER=Impala Shell 
> v4.0.0.7.2.16.0-287 (5ae3917) built on Mon Jan  9 21:23:59 UTC 
> 2023,DEFAULT_FILE_FORMAT=PARQUET,...
> ...
>    ExecSummary:
> ...
> 09:AGGREGATE        32  32    0.000ns    0.000ns       0    4.83M   36.31 MB  212.78 MB  STREAMING
> 08:HASH JOIN        32  32    5s149ms      2m44s       0  194.95M    7.57 GB    1.94 MB  RIGHT OUTER JOIN, PARTITIONED
> |--18:EXCHANGE      32  32   93.750us    1.000ms  10.46K    1.55K    1.65 MB    2.56 MB  HASH(...
> {code}
>  
>  
> 2) set BATCH_SIZE=0; set MEM_LIMIT=1g;
>  
> {code:java}
>     Query State: FINISHED
>     Impala Query State: FINISHED
> ...
>     Query Options (set by configuration and planner): 
> MEM_LIMIT=1073741824,CLIENT_IDENTIFIER=Impala Shell v4.0.0.7.2.16.0-287 
> (5ae3917) built on Mon Jan  9 21:23:59 UTC 
> 2023,DEFAULT_FILE_FORMAT=PARQUET,...
> ...
>     ExecSummary:
> ...
> 09:AGGREGATE        32  32  593.748us   18.999ms      45    4.83M   34.06 MB  212.78 MB  STREAMING
> 08:HASH JOIN        32  32   10s873ms      5m47s  10.47K  194.95M  123.48 MB    1.94 MB  RIGHT OUTER JOIN, PARTITIONED
> |--18:EXCHANGE      32  32    0.000ns    0.000ns  10.46K    1.55K  344.00 KB    1.69 MB  HASH(...
> {code}
>  
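
The two ExecSummary excerpts above can be compared directly. The sketch below 
is only back-of-the-envelope arithmetic on the rounded values shown in the 
excerpts. It assumes that BATCH_SIZE=0 selects the documented default of 1024 
rows, and it does not identify the cause of the extra memory.

```python
# Peak memory of 08:HASH JOIN, copied from the two ExecSummary excerpts.
peak_high_mb = 7.57 * 1024   # 7.57 GB with BATCH_SIZE=65536
peak_default_mb = 123.48     # 123.48 MB with BATCH_SIZE=0

# Assumption: BATCH_SIZE=0 means the default batch size of 1024 rows.
DEFAULT_BATCH_SIZE = 1024
batch_ratio = 65536 / DEFAULT_BATCH_SIZE
peak_ratio = peak_high_mb / peak_default_mb

print(f"batch size ratio:  {batch_ratio:.0f}x")
print(f"peak memory ratio: {peak_ratio:.1f}x")
```

Under that assumption the batch size grows 64x while the join's peak memory 
grows by roughly 63x, which suggests the join's non-reserved memory (reported 
under Exprs in the error) scales about linearly with BATCH_SIZE in this query. 
The full profiles would be needed to confirm it.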



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
