[ 
https://issues.apache.org/jira/browse/FLINK-30450?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17649357#comment-17649357
 ] 

Anton Ippolitov commented on FLINK-30450:
-----------------------------------------

Thank you for creating this ticket! To expand on what I mentioned in the 
original [email 
thread|https://lists.apache.org/thread/dpgh6sh0r21sgohjxxbqtm2mrmjdolgr], we'd 
like to be able to see the following metrics:
 * Request rate (if possible tagged by HTTP method)
 * Request latency
 * Upload / download byte rates
 * Error rate (if possible tagged by error type), which would be useful for 
tracking throttling errors from S3, for example
 * Retry count
 * Number of active connections
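
To make the list above more concrete, here is a minimal sketch of a holder for these metrics. It is an illustration only: the class {{FsClientMetrics}}, its method names, and the "Throttled" error tag are hypothetical and are not part of Flink, Presto, or Hadoop. It keeps monotonic counters and leaves it to a reporter to derive rates from them.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.LongAdder;

// Hypothetical, illustration-only class; not an existing Flink, Presto, or Hadoop API.
final class FsClientMetrics {
    // Request and error counts, tagged by HTTP method and by error type.
    private final Map<String, LongAdder> requestsByMethod = new ConcurrentHashMap<>();
    private final Map<String, LongAdder> errorsByType = new ConcurrentHashMap<>();
    private final LongAdder requestCount = new LongAdder();
    private final LongAdder totalLatencyMillis = new LongAdder();
    private final LongAdder bytesUploaded = new LongAdder();
    private final LongAdder bytesDownloaded = new LongAdder();
    private final LongAdder retryCount = new LongAdder();
    private final AtomicInteger activeConnections = new AtomicInteger();

    // Record one completed request with its latency and transferred bytes.
    void recordRequest(String httpMethod, long latencyMillis, long uploaded, long downloaded) {
        requestsByMethod.computeIfAbsent(httpMethod, k -> new LongAdder()).increment();
        requestCount.increment();
        totalLatencyMillis.add(latencyMillis);
        bytesUploaded.add(uploaded);
        bytesDownloaded.add(downloaded);
    }

    // Record one failed request, tagged by an error type such as "Throttled".
    void recordError(String errorType) {
        errorsByType.computeIfAbsent(errorType, k -> new LongAdder()).increment();
    }

    void recordRetry() { retryCount.increment(); }
    void connectionOpened() { activeConnections.incrementAndGet(); }
    void connectionClosed() { activeConnections.decrementAndGet(); }

    // Read-side accessors; a reporter could sample these to compute rates.
    long requests(String httpMethod) { return sum(requestsByMethod.get(httpMethod)); }
    long errors(String errorType) { return sum(errorsByType.get(errorType)); }
    long bytesUploaded() { return bytesUploaded.sum(); }
    long bytesDownloaded() { return bytesDownloaded.sum(); }
    long retries() { return retryCount.sum(); }
    int activeConnections() { return activeConnections.get(); }

    // Mean latency over all recorded requests; 0.0 if nothing was recorded yet.
    double meanLatencyMillis() {
        long n = requestCount.sum();
        return n == 0 ? 0.0 : (double) totalLatencyMillis.sum() / n;
    }

    private static long sum(LongAdder adder) { return adder == null ? 0L : adder.sum(); }
}
```

A filesystem client wrapper would call {{recordRequest}}, {{recordError}} and {{recordRetry}} around each call to the underlying store. How these values would then be registered with Flink's metric system is exactly the open question of this ticket, so the sketch deliberately stops short of that.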

As mentioned in the thread, the S3 Presto client already gathers these metrics 
[here|https://github.com/prestodb/presto/blob/0.272/presto-hive/src/main/java/com/facebook/presto/hive/s3/PrestoS3FileSystemStats.java],
 but they are not exposed anywhere in Flink. The S3A client also has built-in 
[metrics|https://hadoop.apache.org/docs/stable/hadoop-aws/tools/hadoop-aws/index.html#Metrics]
 that are already exposed via JMX when the client is used in Flink, but it 
would be great to standardize the way we expose FS metrics on the Flink side.

I haven't looked into GCS or Azure Storage yet, but we are definitely 
interested in metrics from these clients too.

> FileSystem supports exporting client-side metrics
> -------------------------------------------------
>
>                 Key: FLINK-30450
>                 URL: https://issues.apache.org/jira/browse/FLINK-30450
>             Project: Flink
>          Issue Type: New Feature
>          Components: FileSystems
>            Reporter: Hangxiang Yu
>            Priority: Major
>
> Client-side metrics, or job-level metrics, for filesystems could help us 
> monitor filesystems more precisely.
> Some metrics (like request rate, throughput, latency, retry count, etc.) are 
> useful for monitoring network or client problems during checkpointing or 
> other access cases for a job.
> Some filesystems like s3, s3-presto, and gs already support enabling some 
> metrics; these could be exported by the filesystem.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
