[
https://issues.apache.org/jira/browse/IMPALA-11068?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17772625#comment-17772625
]
Riza Suminto commented on IMPALA-11068:
---------------------------------------
IIRC, once a scanner thread is launched, it is committed to reading that scan
range. There is no way to retry or put the failed scan range back into the queue.
> Query hit OOM under high decompression activity
> -----------------------------------------------
>
> Key: IMPALA-11068
> URL: https://issues.apache.org/jira/browse/IMPALA-11068
> Project: IMPALA
> Issue Type: Bug
> Components: Backend
> Reporter: Riza Suminto
> Assignee: Riza Suminto
> Priority: Major
>
> A customer reported a query hitting OOM over a wide table with heavy
> decompression activity. The Impala cluster was running with scanner thread
> parallelism (MT_DOP=0).
> The following is the error message shown:
> {code:java}
> Errors: Memory limit exceeded: ParquetColumnChunkReader::InitDictionary()
> failed to allocate 969825 bytes for dictionary.
> HDFS_SCAN_NODE (id=0) could not allocate 947.09 KB without exceeding limit.
> Error occurred on backend [redacted]:22000 by fragment
> d346730dc3a3771e:c24e3ccf00000008
> Memory left in process limit: 233.77 GB
> Memory left in query limit: 503.51 KB
> Query(d346730dc3a3771e:c24e3ccf00000000): Limit=4.13 GB Reservation=3.30 GB
> ReservationLimit=3.30 GB OtherMemory=849.17 MB Total=4.13 GB Peak=4.13 GB
> Fragment d346730dc3a3771e:c24e3ccf00000008: Reservation=3.30 GB
> OtherMemory=849.59 MB Total=4.13 GB Peak=4.13 GB{code}
>
> I looked at the corresponding profile of the fragment and noticed some key
> counters, as follows:
> {code:java}
> Instance d346730dc3a3771e:c24e3ccf00000008 (host=[redacted]:22000)
> ...
> HDFS_SCAN_NODE (id=0)
> ...
> - AverageHdfsReadThreadConcurrency: 8.00 (8.0)
> - AverageScannerThreadConcurrency: 23.00 (23.0)
> - BytesRead: 2.4 GiB (2619685502)
> ...
> - NumScannerThreadMemUnavailable: 1 (1)
> - NumScannerThreadReservationsDenied: 0 (0)
> - NumScannerThreadsStarted: 23 (23)
> - NumScannersWithNoReads: 12 (12)
> - NumStatsFilteredPages: 4,032 (4032)
> - NumStatsFilteredRowGroups: 1 (1)
> - PeakMemoryUsage: 4.1 GiB (4431745197)
> - PeakScannerThreadConcurrency: 23 (23)
> - PerReadThreadRawHdfsThroughput: 842.1 MiB/s (882954163)
> - RemoteScanRanges: 11 (11)
> - RowBatchBytesEnqueued: 1.1 GiB (1221333486)
> - RowBatchQueueGetWaitTime: 1.83s (1833499080)
> - RowBatchQueuePeakMemoryUsage: 599.3 MiB (628430704)
> - RowBatchQueuePutWaitTime: 1ms (1579356)
> - RowBatchesEnqueued: 124 (124)
> - RowsRead: 2,725,888 (2725888)
> - RowsReturned: 0 (0){code}
>
> Based on these counters, I assume the following scenario happened:
> # The concurrent scanner thread count peaked at 23 (NumScannerThreadsStarted,
> PeakScannerThreadConcurrency).
> # The scanner node seems to have tried to schedule a 24th thread, but the
> backend denied it, as indicated by NumScannerThreadMemUnavailable=1.
> # The running threads have been producing output row batches
> (RowBatchesEnqueued=124), but the exec node above has not fetched any
> yet (RowsReturned=0). So the active scanner threads have been consuming their
> memory reservation, including for the decompression activity that happens in
> [parquet-column-chunk-reader.cc|https://github.com/apache/impala/blob/df42225/be/src/exec/parquet/parquet-column-chunk-reader.cc#L155-L177].
> # Just before the scanner node failed, it had consumed Reservation=3.30 GB
> and OtherMemory=849.59 MB, which works out to roughly Reservation=146.92 MB
> and OtherMemory=36.94 MB per thread. This is close to, but slightly higher
> than, the planner's initial mem-reservation=128.00 MB for the scanner node
> plus the 32 MB of
> [hdfs_scanner_thread_max_estimated_bytes|https://github.com/apache/impala/blob/df42225/be/src/exec/hdfs-scan-node.cc#L57-L63]
> for decompression usage per thread.
> Note that the 32 MB of hdfs_scanner_thread_max_estimated_bytes is
> non-reserved memory: it is only allocated as needed during column chunk
> decompression, on the assumption that most cases won't require more than
> 32 MB.
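The per-thread arithmetic in step 4 can be sketched as a quick back-of-the-envelope check. The constants below come from the profile snippet above, not from Impala source; `PerThreadMb` is a hypothetical helper for illustration only.

```cpp
#include <cassert>
#include <cmath>

// Average per-thread memory usage, given a total in MB and a thread count.
inline double PerThreadMb(double total_mb, int num_threads) {
  return total_mb / num_threads;
}

// Reservation=3.30 GB across 23 threads -> ~146.92 MB per thread,
// versus the planner's 128 MB initial mem-reservation.
// OtherMemory=849.59 MB across 23 threads -> ~36.94 MB per thread,
// versus the 32 MB hdfs_scanner_thread_max_estimated_bytes.
inline bool UsageExceedsEstimate() {
  const double per_thread_reservation = PerThreadMb(3.30 * 1024.0, 23);
  const double per_thread_other = PerThreadMb(849.59, 23);
  return per_thread_reservation > 128.0 && per_thread_other > 32.0;
}
```

Both averages land slightly above the planner's per-thread estimates, which is consistent with one thread's overshoot exhausting the query limit.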
> From these insights, I suspect that when the scanner node scheduled the 23rd
> thread, the memory reservation left just barely fit the per-thread
> consumption estimate (128.00 MB + 32 MB), so the backend allowed it to start.
> As decompression progressed, one of the scanner threads tried to allocate
> more memory than was left in the reservation at
> ParquetColumnChunkReader::InitDictionary(). If the 23rd thread had not been
> launched, we might have had enough memory to serve the decompression
> requirement.
> One solution to avoid this OOM is to change the per-thread memory estimation
> in
> [scanner-mem-limiter.cc|https://github.com/apache/impala/blob/df42225/be/src/runtime/scanner-mem-limiter.cc#L59].
> Maybe we should deny the reservation once the spare memory capacity cannot
> fit two threads' allocations (i.e., always leave headroom of one thread's
> allocation).
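The proposed headroom rule could look something like the sketch below. This is NOT the actual ScannerMemLimiter code; the function name and parameters are hypothetical, and it only illustrates the "leave one thread's worth of headroom" idea.

```cpp
#include <cstdint>

// Hypothetical admission check for starting one more scanner thread.
// Instead of admitting a thread whenever spare capacity covers its own
// estimated consumption, require room for TWO threads' estimates, so that
// one thread's estimate is always left as headroom for decompression
// overshoot (e.g. a dictionary page larger than expected).
bool CanStartScannerThread(int64_t spare_capacity_bytes,
                           int64_t est_reservation_per_thread,
                           int64_t est_non_reserved_per_thread) {
  const int64_t per_thread =
      est_reservation_per_thread + est_non_reserved_per_thread;
  return spare_capacity_bytes >= 2 * per_thread;
}
```

With the estimates from this issue (128 MB + 32 MB = 160 MB per thread), a thread would only be admitted while at least 320 MB of spare capacity remains, which would have denied the marginal 23rd thread in the scenario above.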
--
This message was sent by Atlassian Jira
(v8.20.10#820010)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]