[
https://issues.apache.org/jira/browse/IMPALA-8544?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16845313#comment-16845313
]
Sahil Takiar commented on IMPALA-8544:
--------------------------------------
Thanks for the input, everyone.
It sounds like exposing these metrics on a per-query basis will take some
additional effort. As a first step, I think several S3A metrics would make
sense to expose at the impalad process level (rather than per query). My
guess is that metrics like "store_io_throttled" and
"s3guard_metadatastore_throttled" are the most relevant process-wide.
For example, given an impalad running five concurrent queries against the same
bucket, if one of the queries has a high value for "store_io_throttled", it's
likely that the rest of the queries do as well. I can't remember where I read
it, but I think S3 throttling is done on a per-bucket basis? If that's the case,
all queries against a bucket will likely be throttled at the same time, so
seeing the value of "store_io_throttled" at the impalad process level is
probably sufficient to notice this. I think the story for S3Guard is similar,
since you specify a maximum request rate per second for a single table.
Impala already has support for exposing process-wide metrics, so implementing
this using the exposed {{StorageStatistics}} should be straightforward. The
only issue I see is that {{StorageStatistics}} are aggregated globally, but I'm
curious if we have considered aggregating them at a per-bucket granularity
instead?
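A minimal sketch of what per-bucket (rather than global) aggregation could look like as a data structure. {{BucketThrottleCounters}} is hypothetical, not an Impala or Hadoop class, and where the increments would actually come from (S3A's retry handling, per-bucket {{FileSystem}} instances, etc.) is left open; the sketch only assumes that many scanner threads update the counters concurrently.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;

// Hypothetical sketch: process-wide counters keyed by bucket and statistic name,
// instead of a single global total per statistic.
final class BucketThrottleCounters {
  // bucket -> (statistic name -> counter); LongAdder tolerates many concurrent writers.
  private final Map<String, Map<String, LongAdder>> counters = new ConcurrentHashMap<>();

  // Record `delta` occurrences of `statistic` (e.g. "store_io_throttled") for `bucket`.
  void add(String bucket, String statistic, long delta) {
    counters
        .computeIfAbsent(bucket, b -> new ConcurrentHashMap<>())
        .computeIfAbsent(statistic, s -> new LongAdder())
        .add(delta);
  }

  // Current value for one bucket/statistic pair; 0 if never recorded.
  long get(String bucket, String statistic) {
    Map<String, LongAdder> perBucket = counters.get(bucket);
    if (perBucket == null) {
      return 0L;
    }
    LongAdder adder = perBucket.get(statistic);
    return adder == null ? 0L : adder.sum();
  }

  // Sum of a statistic across all buckets, i.e. what a global aggregate would report.
  long total(String statistic) {
    long sum = 0L;
    for (Map<String, LongAdder> perBucket : counters.values()) {
      LongAdder adder = perBucket.get(statistic);
      if (adder != null) {
        sum += adder.sum();
      }
    }
    return sum;
  }
}
```

Keeping both views makes the trade-off explicit: the per-bucket value shows which bucket is being throttled, while {{total}} still gives the process-wide number.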
> Expose additional S3A / S3Guard metrics
> ---------------------------------------
>
> Key: IMPALA-8544
> URL: https://issues.apache.org/jira/browse/IMPALA-8544
> Project: IMPALA
> Issue Type: Improvement
> Components: Backend
> Reporter: Sahil Takiar
> Assignee: Sahil Takiar
> Priority: Major
> Labels: s3
>
> S3A / S3Guard internally collects several useful metrics that we should
> consider exposing to Impala users. The full list of statistics can be found
> in {{o.a.h.fs.s3a.Statistic}}. The stats include the number of S3 operations
> performed (put, get, etc.), invocation counts for various {{FileSystem}}
> methods, and stream statistics (bytes read, bytes written, etc.).
> Some interesting stats that stand out:
> * "stream_aborted": "Count of times the TCP stream was aborted" - the number
> of TCP connection aborts; a high value would indicate performance issues
> * "stream_read_exceptions" : "Number of exceptions invoked on input streams"
> - incremented whenever an {{IOException}} is caught while reading (these
> exceptions don't always get propagated to Impala because they trigger a retry)
> * "store_io_throttled": "Requests throttled and retried" - looks like it
> tracks the number of times the fs retries an operation because the original
> request hit a throttling exception
> * "s3guard_metadatastore_retry": "S3Guard metadata store retry events" -
> looks like it tracks the number of times the fs retries S3Guard operations
> * "s3guard_metadatastore_throttled" : "S3Guard metadata store throttled
> events" - similar to "store_io_throttled" but looks like it is specific to
> S3Guard
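The statistics listed above could be picked out of a name-to-value snapshot and rendered for a log line or runtime profile. In real code such a snapshot could presumably be built by iterating {{StorageStatistics#getLongStatistics()}}, whose {{LongStatistic}} entries expose {{getName()}} and {{getValue()}}; this sketch takes a plain {{Map}} instead so it runs without Hadoop on the classpath. The class name and the "name=value" output format are assumptions, not an existing Impala API.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical helper: selects the S3A statistics called out in this issue from a
// snapshot and renders them as "name=value" pairs in a fixed order.
final class InterestingS3AStats {
  // Statistic names as quoted in the issue description.
  static final List<String> NAMES = Collections.unmodifiableList(Arrays.asList(
      "stream_aborted",
      "stream_read_exceptions",
      "store_io_throttled",
      "s3guard_metadatastore_retry",
      "s3guard_metadatastore_throttled"));

  // Names missing from the snapshot (e.g. not tracked by a given Hadoop version) are skipped;
  // statistics not in NAMES are ignored.
  static String format(Map<String, Long> snapshot) {
    StringJoiner out = new StringJoiner(", ");
    for (String name : NAMES) {
      Long value = snapshot.get(name);
      if (value != null) {
        out.add(name + "=" + value);
      }
    }
    return out.toString();
  }
}
```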
> We should consider how to expose these metrics via Impala logs / runtime
> profiles.
> There are a few options:
> * {{S3AFileSystem}} exposes {{StorageStatistics}} specific to S3A / S3Guard
> via the {{FileSystem#getStorageStatistics}} method; the
> {{S3AStorageStatistics}} seems to include all the S3A / S3Guard metrics;
> however, I think the stats might be aggregated globally, which would make it
> hard to create per-query metrics
> * {{S3AInstrumentation}} exposes all the metrics as well, and looks like it
> is per-fs instance, so it is not aggregated globally; {{S3AInstrumentation}}
> extends {{o.a.h.metrics2.MetricsSource}} so perhaps it is exposed via some
> API (haven't looked into this yet)
> * {{S3AInputStream#toString}} dumps the statistics from
> {{o.a.h.fs.s3a.S3AInstrumentation.InputStreamStatistics}} and
> {{S3AFileSystem#toString}} dumps them all as well
> * {{S3AFileSystem}} also updates the stats in
> {{o.a.h.fs.FileSystem.Statistics.StatisticsData}} (e.g. bytesRead,
> bytesWritten, etc.)
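If the {{StorageStatistics}} are aggregated globally, one rough way to get per-query numbers is to snapshot the counters when a query starts and again when it finishes, and report the difference. This is only a sketch with a clear limitation: with concurrent queries the delta also includes other queries' activity, so (assuming the counters only increase and are never reset in between) it is an upper bound on the query's own contribution, not an exact value. {{StatsDelta}} is a hypothetical name.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: approximate per-query S3A statistics as the difference between
// two snapshots of globally aggregated, monotonically increasing counters.
final class StatsDelta {
  // Returns after - before for every statistic whose value increased.
  // A statistic absent from `before` is treated as starting at 0.
  // Assumes the counters are not reset between the two snapshots.
  static Map<String, Long> between(Map<String, Long> before, Map<String, Long> after) {
    Map<String, Long> delta = new HashMap<>();
    for (Map.Entry<String, Long> e : after.entrySet()) {
      long start = before.getOrDefault(e.getKey(), 0L);
      long diff = e.getValue() - start;
      if (diff > 0) {
        delta.put(e.getKey(), diff);
      }
    }
    return delta;
  }
}
```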
> Impala has a {{hdfs-fs-cache}} as well, so {{hdfsFs}} objects get shared
> across threads.
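Because cached {{hdfsFs}} handles are shared across threads, any per-instance statistics (such as those in {{S3AInstrumentation}}) would also mix the activity of every query using the same handle. The toy model below illustrates that point only; it is not Impala's {{hdfs-fs-cache}} implementation, and keying the cache by scheme plus authority (bucket) is an assumption made for the sketch.

```java
import java.net.URI;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;

// Toy model (hypothetical): a filesystem handle carrying its own per-instance counter.
final class CachedFs {
  final LongAdder bytesRead = new LongAdder();
}

// Toy model of a filesystem cache: one shared handle per scheme + authority, so every
// thread (and therefore every query) reading the same bucket gets the same instance.
final class FsCacheModel {
  private final ConcurrentHashMap<String, CachedFs> cache = new ConcurrentHashMap<>();

  CachedFs get(URI path) {
    String key = path.getScheme() + "://" + path.getAuthority();
    return cache.computeIfAbsent(key, k -> new CachedFs());
  }
}
```

Two queries scanning different tables in the same bucket end up incrementing the same counter, so the per-instance value is per-bucket, not per-query.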
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)