Quanlong Huang created IMPALA-12046:
---------------------------------------

             Summary: Add profile counter for scan range queueing time on disk 
queues
                 Key: IMPALA-12046
                 URL: https://issues.apache.org/jira/browse/IMPALA-12046
             Project: IMPALA
          Issue Type: New Feature
          Components: Backend
            Reporter: Quanlong Huang
            Assignee: Quanlong Huang


I saw a profile in which the total time of a ScanNode is dominated by 
{{ScannerIoWaitTime}}. However, TotalRawHdfsOpenFileTime and 
TotalRawHdfsReadTime are both small, and no other counter explains why 
{{ScannerIoWaitTime}} is so long.
{code:java}
- DecompressionTime: 964.648ms
- InactiveTotalTime: 0.000ns
- MaterializeTupleTime: 2s132ms
- ScannerIoWaitTime: 11s641ms          <-- Dominates the total time
- TotalRawHdfsOpenFileTime: 14.501ms
- TotalRawHdfsReadTime: 1s374ms
- TotalReadThroughput: 29.94 MB/sec
- TotalTime: 15s865ms{code}
After some debugging, I realized the time is spent waiting in the disk queues. 
If the scanner consumes data faster than the disk queue threads can read it, 
scan ranges pile up in the disk queues. This queueing time is not counted in 
either TotalRawHdfsOpenFileTime or TotalRawHdfsReadTime, but it is counted in 
ScannerIoWaitTime. We should add a profile counter for the queueing time on the 
disk queues to better explain ScannerIoWaitTime.
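
To illustrate the kind of measurement proposed here, below is a minimal sketch 
in Python. It is not Impala's implementation. The names TimedQueue, FakeClock 
and total_wait_ns are hypothetical and chosen only for this example. The idea 
is to stamp each item when it enters the queue and add the elapsed time to a 
counter when a reader thread picks it up.

```python
import collections


class FakeClock:
    """Hypothetical injectable clock so the example is deterministic."""

    def __init__(self):
        self.now_ns = 0

    def __call__(self):
        return self.now_ns

    def advance(self, ns):
        self.now_ns += ns


class TimedQueue:
    """FIFO queue that accumulates how long items waited before dequeue.

    Hypothetical stand-in for a disk queue; total_wait_ns plays the role
    of the proposed queueing-time counter.
    """

    def __init__(self, clock):
        self._clock = clock
        self._items = collections.deque()
        self.total_wait_ns = 0
        self.dequeued = 0

    def enqueue(self, item):
        # Remember when the item entered the queue.
        self._items.append((item, self._clock()))

    def dequeue(self):
        item, enqueued_at_ns = self._items.popleft()
        # Time spent queued is counted separately from any read time.
        self.total_wait_ns += self._clock() - enqueued_at_ns
        self.dequeued += 1
        return item


clock = FakeClock()
queue = TimedQueue(clock)
queue.enqueue("range-a")   # enqueued at t = 0
queue.enqueue("range-b")   # enqueued at t = 0
clock.advance(5_000_000)   # 5 ms pass before a reader is free
queue.dequeue()            # range-a waited 5 ms
clock.advance(3_000_000)   # reader is busy for another 3 ms
queue.dequeue()            # range-b waited 8 ms
print(queue.total_wait_ns / 1e6, "ms total queueing time")
```

In this sketch the second item waits longer than the first because it sits 
behind it in the queue. That waiting would appear in a counter like 
ScannerIoWaitTime but not in the open or read time counters.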



