[ 
https://issues.apache.org/jira/browse/IMPALA-9819?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong updated IMPALA-9819:
----------------------------------
    Labels: observability  (was: )

> Separate data cache and HDFS scan node runtime profile metrics
> --------------------------------------------------------------
>
>                 Key: IMPALA-9819
>                 URL: https://issues.apache.org/jira/browse/IMPALA-9819
>             Project: IMPALA
>          Issue Type: Improvement
>          Components: Backend
>            Reporter: Sahil Takiar
>            Assignee: Joe McDonnell
>            Priority: Major
>              Labels: observability
>
> When a query reads data from both a remote storage system (e.g. S3) and the 
> data cache, the HDFS_SCAN_NODE runtime profiles are hard to reason about.
> For example, consider the following runtime profile snippet:
> {code:java}
> HDFS_SCAN_NODE (id=0):(Total: 59s374ms, non-child: 0.000ns, % non-child: 0.00%)
>          - AverageHdfsReadThreadConcurrency: 0.62 
>          - AverageScannerThreadConcurrency: 0.91 
>          - BytesRead: 587.97 MB (616533483)
>          - BytesReadDataNodeCache: 0
>          - BytesReadLocal: 0
>          - BytesReadRemoteUnexpected: 0
>          - BytesReadShortCircuit: 0
>          - CachedFileHandlesHitCount: 323 (323)
>          - CachedFileHandlesMissCount: 94 (94)
>          - CollectionItemsRead: 0 (0)
>          - DataCacheHitBytes: 212.00 MB (222294996)
>          - DataCacheHitCount: 107 (107)
>          - DataCacheMissBytes: 375.98 MB (394238486)
>          - DataCacheMissCount: 310 (310)
>          - DataCachePartialHitCount: 0 (0)
>          - DecompressionTime: 2s428ms
>          - MaterializeTupleTime: 19s444ms
>          - MaxCompressedTextFileLength: 0
>          - NumColumns: 3 (3)
>          - NumDictFilteredRowGroups: 0 (0)
>          - NumDisksAccessed: 1 (1)
>          - NumPages: 53.30K (53300)
>          - NumRowGroups: 83 (83)
>          - NumRowGroupsWithPageIndex: 83 (83)
>          - NumScannerThreadMemUnavailable: 0 (0)
>          - NumScannerThreadReservationsDenied: 0 (0)
>          - NumScannerThreadsStarted: 1 (1)
>          - NumScannersWithNoReads: 0 (0)
>          - NumStatsFilteredPages: 0 (0)
>          - NumStatsFilteredRowGroups: 0 (0)
>          - PeakMemoryUsage: 16.00 MB (16781312)
>          - PeakScannerThreadConcurrency: 1 (1)
>          - PerReadThreadRawHdfsThroughput: 15.11 MB/sec
>          - RemoteScanRanges: 0 (0)
>          - RowBatchBytesEnqueued: 670.68 MB (703260541)
>          - RowBatchQueueGetWaitTime: 59s368ms
>          - RowBatchQueuePeakMemoryUsage: 4.17 MB (4368285)
>          - RowBatchQueuePutWaitTime: 0.000ns
>          - RowBatchesEnqueued: 915 (915)
>          - RowsRead: 413.47M (413466507)
>          - RowsReturned: 722.27K (722275)
>          - RowsReturnedRate: 12.17 K/sec
>          - ScanRangesComplete: 83 (83)
>          - ScannerIoWaitTime: 33s454ms
>          - ScannerThreadWorklessLoops: 0 (0)
>          - ScannerThreadsInvoluntaryContextSwitches: 1.94K (1940)
>          - ScannerThreadsTotalWallClockTime: 1m
>            - ScannerThreadsSysTime: 1s181ms
>            - ScannerThreadsUserTime: 20s581ms
>          - ScannerThreadsVoluntaryContextSwitches: 770 (770)
>          - TotalRawHdfsOpenFileTime: 3s396ms
>          - TotalRawHdfsReadTime: 38s940ms
>          - TotalReadThroughput: 8.86 MB/sec {code}
> The query read part of its data from S3 and part from the data cache.
> Metrics such as PerReadThreadRawHdfsThroughput are measured across both S3 
> and data cache reads, so there is no straightforward way to determine the 
> throughput for *just* S3 reads. Users might want this value to determine 
> whether S3 was particularly slow for their query.
> It would be helpful if the scan node metrics distinguished more clearly 
> between reads from S3 and reads from the data cache. The aggregate metrics 
> (*Total* metrics) are still useful, but fine-grained metrics specific to a 
> single storage system (e.g. either the data cache or S3) are needed as well.
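
To illustrate the problem described above, here is a minimal Python sketch that recomputes the blended read throughput from four counters in the quoted profile and shows how little can be said about S3 alone. The helper names (parse_counters, exact_value, duration_seconds) are hypothetical and not part of Impala. The sketch assumes that TotalRawHdfsReadTime covers both data cache and S3 reads, as the description implies, and that every DataCacheMissBytes byte came from S3. Under those assumptions the counters give only a lower bound on S3 throughput, not the actual value.

```python
import re

# Four counters copied from the profile snippet quoted in the issue.
PROFILE = """\
- BytesRead: 587.97 MB (616533483)
- DataCacheHitBytes: 212.00 MB (222294996)
- DataCacheMissBytes: 375.98 MB (394238486)
- TotalRawHdfsReadTime: 38s940ms
"""

MIB = 1024 * 1024  # the profile's "MB" figures match 2**20 bytes

# Seconds per unit for the duration suffixes handled here.
UNIT_SECONDS = {"h": 3600.0, "m": 60.0, "s": 1.0,
                "ms": 1e-3, "us": 1e-6, "ns": 1e-9}


def parse_counters(text):
    """Map counter name -> raw value string for lines like '- Name: value'."""
    counters = {}
    for line in text.splitlines():
        match = re.match(r"\s*-\s*(\w+):\s*(.+?)\s*$", line)
        if match:
            counters[match.group(1)] = match.group(2)
    return counters


def exact_value(value):
    """Return the exact integer printed in parentheses after the pretty value."""
    return int(re.search(r"\((\d+)\)", value).group(1))


def duration_seconds(value):
    """Convert a duration such as '38s940ms' to seconds."""
    parts = re.findall(r"(\d+(?:\.\d+)?)(ns|us|ms|h|m|s)", value)
    return sum(float(number) * UNIT_SECONDS[unit] for number, unit in parts)


counters = parse_counters(PROFILE)
bytes_read = exact_value(counters["BytesRead"])
hit_bytes = exact_value(counters["DataCacheHitBytes"])
miss_bytes = exact_value(counters["DataCacheMissBytes"])
read_seconds = duration_seconds(counters["TotalRawHdfsReadTime"])

# Blended throughput over cache and S3 reads together; this is close to
# the PerReadThreadRawHdfsThroughput value shown in the profile.
blended_mib_per_sec = bytes_read / read_seconds / MIB

# Fraction of the bytes read that were served from the data cache.
cache_hit_fraction = hit_bytes / bytes_read

# Assumption: S3 reads took at most the whole read time, so this is only a
# lower bound on S3 throughput. The true value needs a per-source read time,
# which the profile does not report.
s3_lower_bound_mib_per_sec = miss_bytes / read_seconds / MIB

print(f"blended throughput : {blended_mib_per_sec:.2f} MB/sec")
print(f"cache hit fraction : {cache_hit_fraction:.1%}")
print(f"S3 throughput      : at least {s3_lower_bound_mib_per_sec:.2f} MB/sec")
```

The gap between the blended figure and the lower bound is exactly the uncertainty the issue is about: without a separate read time for each source, the counters cannot show whether S3 was slow or the cache was fast.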


