[ 
https://issues.apache.org/jira/browse/HADOOP-18190?focusedWorklogId=792796&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-792796
 ]

ASF GitHub Bot logged work on HADOOP-18190:
-------------------------------------------

                Author: ASF GitHub Bot
            Created on: 19/Jul/22 16:24
            Start Date: 19/Jul/22 16:24
    Worklog Time Spent: 10m 
      Work Description: ahmarsuhail commented on PR #4458:
URL: https://github.com/apache/hadoop/pull/4458#issuecomment-1189301358

   @steveloughran by 
   
   > can you split success/failure logging of the invocation and duration of calls
   
   do you mean to add stats for the number of failed prefetch ops and the duration of those failures? For the duration, I couldn't figure out how to measure a failure. For example, the duration of reading from S3 is measured [here](https://github.com/apache/hadoop/pull/4458/files#diff-79d7c6565dcf3633d045b1222349326646bfa722d8441ca1e9939b72df38161cR109); if the operation fails, the duration tracker will call `tracker.failed();`.
   
   1) What does `tracker.failed()` do?
   2) How should this be changed to measure the duration of a failure?
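   For reference, a minimal sketch of the tracker pattern in play here. The `DurationTracker` below is a hypothetical stand-in, not Hadoop's real `org.apache.hadoop.fs.statistics.DurationTracker`; the assumption (worth confirming against the Hadoop source) is that calling `failed()` before `close()` makes the elapsed time be recorded under a `.failures` variant of the statistic rather than the success one, which is how success and failure durations get split:

```java
// Sketch of the duration-tracker pattern, with a stand-in tracker.
// Assumption: failed() flips a flag so that close() records the elapsed
// time under "<name>.failures" instead of "<name>".
import java.util.HashMap;
import java.util.Map;

public class DurationTrackerSketch {

    /** Minimal stand-in tracker: records elapsed time under one of two keys. */
    static final class DurationTracker implements AutoCloseable {
        private final Map<String, Long> stats;
        private final String name;
        private final long start = System.nanoTime();
        private boolean failed;

        DurationTracker(Map<String, Long> stats, String name) {
            this.stats = stats;
            this.name = name;
        }

        /** Mark the operation as failed; close() then records under ".failures". */
        void failed() {
            failed = true;
        }

        @Override
        public void close() {
            long elapsed = System.nanoTime() - start;
            String key = failed ? name + ".failures" : name;
            stats.merge(key, elapsed, Long::sum);
        }
    }

    public static void main(String[] args) {
        Map<String, Long> stats = new HashMap<>();

        // Success path: duration is recorded under "prefetch.read".
        DurationTracker t1 = new DurationTracker(stats, "prefetch.read");
        try {
            // simulated successful read from the store
        } finally {
            t1.close();
        }

        // Failure path: failed() before close() redirects the duration
        // to "prefetch.read.failures", so failure durations are tracked too.
        DurationTracker t2 = new DurationTracker(stats, "prefetch.read");
        try {
            throw new RuntimeException("simulated S3 read failure");
        } catch (RuntimeException e) {
            t2.failed();
        } finally {
            t2.close();
        }

        System.out.println(stats.containsKey("prefetch.read"));
        System.out.println(stats.containsKey("prefetch.read.failures"));
    }
}
```

   If that assumption holds, measuring failure duration falls out of the existing pattern: call `failed()` in the catch block and let `close()` in the finally block record the elapsed time against the failure statistic.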
   




Issue Time Tracking
-------------------

    Worklog Id:     (was: 792796)
    Time Spent: 1h 50m  (was: 1h 40m)

> s3a prefetching streams to collect iostats on prefetching operations
> --------------------------------------------------------------------
>
>                 Key: HADOOP-18190
>                 URL: https://issues.apache.org/jira/browse/HADOOP-18190
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: fs/s3
>    Affects Versions: 3.4.0
>            Reporter: Steve Loughran
>            Assignee: Ahmar Suhail
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> There is a lot more happening in reads, so there's a lot more data to collect 
> and publish in IO stats for us to view in a summary at the end of processes 
> as well as get from the stream while it is active.
> Some useful ones would seem to be:
> counters
>  * is in memory: using 0 or 1 here lets aggregation reports count the total
> # of memory-cached files.
>  * prefetching operations executed
>  * errors during prefetching
> gauges
>  * number of blocks in cache
>  * total size of blocks
>  * active prefetches
> + active memory used
> duration tracking count/min/max/ave
>  * time to fetch a block
>  * time queued before the actual fetch begins
>  * time a reader is blocked waiting for a block fetch to complete
> and some info on cache use itself
>  * number of blocks discarded unread
>  * number of prefetched blocks later used
>  * number of backward seeks to a prefetched block
>  * number of forward seeks to a prefetched block
> the key ones I care about are
>  # memory consumption
>  # can we determine if cache is working (reads with cache hit) and when it is 
> not (misses, wasted prefetches)
>  # time blocked on executors
> The stats need to be accessible on a stream even when closed, and aggregated 
> into the FS. Once we get per-thread stats contexts, we can publish there too 
> and collect in worker threads for reporting in task commits.
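As a rough illustration of the counter/gauge split the list above asks for, here is a map-backed sketch (the statistic names are made up for the example, not confirmed Hadoop constants): counters only ever increase, while gauges move up and down as prefetches start and finish:

```java
// Hypothetical sketch of prefetch statistics: a plain map-backed store,
// not Hadoop's real IOStatisticsStore. Counters are monotonic; gauges
// track current state (active prefetches, memory in use).
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

public class PrefetchStatsSketch {
    private final Map<String, AtomicLong> counters = new ConcurrentHashMap<>();
    private final Map<String, AtomicLong> gauges = new ConcurrentHashMap<>();

    private static AtomicLong slot(Map<String, AtomicLong> m, String key) {
        return m.computeIfAbsent(key, k -> new AtomicLong());
    }

    /** Counters: monotonically increasing (ops executed, errors seen). */
    void incCounter(String key) { slot(counters, key).incrementAndGet(); }

    /** Gauges: can go up and down (active prefetches, memory used). */
    void adjustGauge(String key, long delta) { slot(gauges, key).addAndGet(delta); }

    static long get(Map<String, AtomicLong> m, String key) {
        AtomicLong v = m.get(key);
        return v == null ? 0 : v.get();
    }

    public static void main(String[] args) {
        PrefetchStatsSketch stats = new PrefetchStatsSketch();

        // a prefetch starts: bump the op counter, raise the active gauges
        stats.incCounter("prefetch.operations");
        stats.adjustGauge("prefetch.active", 1);
        stats.adjustGauge("prefetch.memory.bytes", 8L * 1024 * 1024);

        // it fails: count the error, drop the gauges back down
        stats.incCounter("prefetch.errors");
        stats.adjustGauge("prefetch.active", -1);
        stats.adjustGauge("prefetch.memory.bytes", -(8L * 1024 * 1024));

        System.out.println(get(stats.counters, "prefetch.operations"));
        System.out.println(get(stats.counters, "prefetch.errors"));
        System.out.println(get(stats.gauges, "prefetch.active"));
    }
}
```

With this split, an aggregation report can sum counters across streams while gauges show the instantaneous state (here both gauges return to zero once the failed prefetch is unwound).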



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
