[jira] [Work logged] (HADOOP-18190) s3a prefetching streams to collect iostats on prefetching operations

ASF GitHub Bot (Jira) Mon, 18 Jul 2022 11:12:10 -0700


     [ https://issues.apache.org/jira/browse/HADOOP-18190?focusedWorklogId=792296&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-792296 ]


ASF GitHub Bot logged work on HADOOP-18190:
-------------------------------------------

                Author: ASF GitHub Bot
            Created on: 18/Jul/22 18:11
            Start Date: 18/Jul/22 18:11
    Worklog Time Spent: 10m 
      Work Description: steveloughran commented on code in PR #4458:
URL: https://github.com/apache/hadoop/pull/4458#discussion_r923663742


##########
hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/statistics/StreamStatisticNames.java:
##########
@@ -387,6 +387,46 @@ public final class StreamStatisticNames {
   public static final String BLOCKS_RELEASED
       = "blocks_released";
 
+  /**
+   * Total number of prefetching operations executed.
+   */
+  public static final String STREAM_READ_PREFETCH_OPERATIONS
+      = "stream_read_prefetch_operations";
+
+  /**
+   * Total number of blocks in disk cache.
+   */
+  public static final String STREAM_READ_BLOCKS_IN_FILE_CACHE
+      = "stream_read_blocks_in_cache";
+
+  /**
+   * Total number of active prefetch operations.
+   */
+  public static final String STREAM_READ_ACTIVE_PREFETCH_OPERATIONS
+      = "stream_read_active_prefetch_operations";
+
+  /**
+   * Total bytes of memory in use by this input stream.
+   */
+  public static final String STREAM_READ_ACTIVE_MEMORY_IN_USE
+      = "stream_read_active_memory_in_use";
+
+  /**
+   * count/duration of reading a remote block.
+   * IO.

Review Comment:
   Why the "IO." here and below?
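To make the counter/gauge distinction in these names concrete, here is a minimal sketch of how a stream might keep them, assuming a simple in-memory map per stream. `SimpleStreamStats` and all of its methods are hypothetical and are not Hadoop's `IOStatisticsStore` API; only the statistic name strings come from the diff. Counters such as `stream_read_prefetch_operations` only ever go up, while gauges such as `stream_read_active_prefetch_operations` and `stream_read_active_memory_in_use` rise and fall as prefetches start, finish and release their buffers.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

/**
 * Hypothetical, minimal per-stream statistics holder (not Hadoop's
 * IOStatisticsStore): counters only increase, gauges move both ways.
 */
final class SimpleStreamStats {
  // Statistic names copied from the constants added in the diff.
  static final String PREFETCH_OPERATIONS = "stream_read_prefetch_operations";
  static final String ACTIVE_PREFETCH_OPERATIONS = "stream_read_active_prefetch_operations";
  static final String ACTIVE_MEMORY_IN_USE = "stream_read_active_memory_in_use";

  private final Map<String, AtomicLong> counters = new ConcurrentHashMap<>();
  private final Map<String, AtomicLong> gauges = new ConcurrentHashMap<>();

  void incrementCounter(String name, long delta) {
    if (delta < 0) {
      throw new IllegalArgumentException("counter " + name + " cannot decrease");
    }
    counters.computeIfAbsent(name, k -> new AtomicLong()).addAndGet(delta);
  }

  void incrementGauge(String name, long delta) {
    gauges.computeIfAbsent(name, k -> new AtomicLong()).addAndGet(delta);
  }

  long counter(String name) {
    AtomicLong value = counters.get(name);
    return value == null ? 0 : value.get();
  }

  long gauge(String name) {
    AtomicLong value = gauges.get(name);
    return value == null ? 0 : value.get();
  }

  /** A prefetch was submitted: count it and raise the "active" gauges. */
  void prefetchStarted(long blockSize) {
    incrementCounter(PREFETCH_OPERATIONS, 1);
    incrementGauge(ACTIVE_PREFETCH_OPERATIONS, 1);
    incrementGauge(ACTIVE_MEMORY_IN_USE, blockSize);
  }

  /** The prefetch finished; its buffer stays in memory until released. */
  void prefetchCompleted() {
    incrementGauge(ACTIVE_PREFETCH_OPERATIONS, -1);
  }

  /** The block's buffer was handed back, so its memory is no longer in use. */
  void blockReleased(long blockSize) {
    incrementGauge(ACTIVE_MEMORY_IN_USE, -blockSize);
  }
}
```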



##########
hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/common/CachingBlockManager.java:
##########
@@ -330,6 +341,10 @@ private void readBlock(BufferData data, boolean isPrefetch, BufferData.State...
         this.read(buffer, offset, size);
         buffer.flip();
         data.setReady(expectedState);
+
+        if(isPrefetch) {

Review Comment:
   Nit: spacing.



##########
hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/common/BufferPool.java:
##########
@@ -236,11 +244,15 @@ public synchronized void close() {
       }
     }
 
+    int currentPoolSize = this.pool.numCreated();

Review Comment:
   How about you cut the `this.` here and from the existing calls in the same method? It's not our project's style.



##########
hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/common/CachingBlockManager.java:
##########
@@ -350,16 +365,18 @@ private void readBlock(BufferData data, boolean isPrefetch, BufferData.State...
   private static class PrefetchTask implements Supplier<Void> {
     private final BufferData data;
     private final CachingBlockManager blockManager;
+    private final Instant taskQueuedStartTime;
 
-    PrefetchTask(BufferData data, CachingBlockManager blockManager) {
+    PrefetchTask(BufferData data, CachingBlockManager blockManager, Instant taskQueuedStartTime) {
       this.data = data;
       this.blockManager = blockManager;
+      this.taskQueuedStartTime = taskQueuedStartTime;
     }
 
     @Override
     public Void get() {
       try {
-        this.blockManager.prefetch(data);
+        this.blockManager.prefetch(data, taskQueuedStartTime);
       } catch (Exception e) {
         LOG.error("error during prefetch", e);

Review Comment:
   I worry about this. I've seen problems with transient network failures and ABFS where the logs are flooded with many parallel stack traces before it recovers.
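One possible way to address the log-flooding concern, sketched under assumptions: all prefetch tasks of a stream share a reporter that counts every failure but logs a full stack trace only for the first one, then drops to a one-line message at a low level. The task also records how long it waited in the executor queue, using the `taskQueuedStartTime` passed in by the diff. `PrefetchAction`, `PrefetchErrorReporter` and `QueuedPrefetchTask` are hypothetical names, `PrefetchAction` stands in for `blockManager.prefetch(...)`, and `java.util.logging` stands in for the project's logger.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.Supplier;
import java.util.logging.Level;
import java.util.logging.Logger;

/** Hypothetical stand-in for blockManager.prefetch(data, taskQueuedStartTime). */
interface PrefetchAction {
  void run() throws Exception;
}

/** Shared by all prefetch tasks of one stream. */
final class PrefetchErrorReporter {
  private static final Logger LOG = Logger.getLogger(PrefetchErrorReporter.class.getName());
  private final AtomicBoolean traceLogged = new AtomicBoolean(false);
  private final AtomicLong failures = new AtomicLong();
  private final AtomicLong fullTraces = new AtomicLong();
  private final AtomicLong queuedNanos = new AtomicLong();

  void recordQueued(Duration queued) {
    queuedNanos.addAndGet(queued.toNanos());
  }

  void recordFailure(Exception e) {
    failures.incrementAndGet();
    if (traceLogged.compareAndSet(false, true)) {
      // First failure only: keep the full stack trace for diagnosis.
      fullTraces.incrementAndGet();
      LOG.log(Level.WARNING, "error during prefetch", e);
    } else {
      // Later failures: one short line at a low level, no stack trace.
      LOG.log(Level.FINE, "error during prefetch: {0}", e.toString());
    }
  }

  long failures() { return failures.get(); }
  long fullTraces() { return fullTraces.get(); }
  long queuedNanos() { return queuedNanos.get(); }
}

/** Hypothetical variant of PrefetchTask that times its queue wait. */
final class QueuedPrefetchTask implements Supplier<Void> {
  private final PrefetchAction action;
  private final Instant taskQueuedStartTime;
  private final PrefetchErrorReporter reporter;

  QueuedPrefetchTask(PrefetchAction action, Instant taskQueuedStartTime,
      PrefetchErrorReporter reporter) {
    this.action = action;
    this.taskQueuedStartTime = taskQueuedStartTime;
    this.reporter = reporter;
  }

  @Override
  public Void get() {
    // Time between submission to the executor and the task actually running.
    reporter.recordQueued(Duration.between(taskQueuedStartTime, Instant.now()));
    try {
      action.run();
    } catch (Exception e) {
      reporter.recordFailure(e);
    }
    return null;
  }
}
```

`Instant` matches the diff; `System.nanoTime()` would be immune to wall-clock adjustments if measured queue times ever look suspicious.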



##########
hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/common/CachingBlockManager.java:
##########
@@ -330,6 +341,10 @@ private void readBlock(BufferData data, boolean isPrefetch, BufferData.State...
         this.read(buffer, offset, size);
         buffer.flip();
         data.setReady(expectedState);
+
+        if(isPrefetch) {
+          this.prefetchingStatistics.prefetchOperationCompleted();

Review Comment:
   What happens if the read fails? Are the failure count and the duration before failure recorded separately from the successful calls?
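On the failure question, one approach is to time every remote read and, when it throws, file the elapsed time under a separate failure statistic before rethrowing, so slow failures don't skew the success timings. A minimal sketch with hypothetical `DurationStats`, `BlockRead` and `BlockReadTimer` classes; the clock is injected as a `LongSupplier` so it can be `System::nanoTime` in practice and a fake in tests. It also keeps the count/min/max/mean duration tracking asked for in the issue description.

```java
import java.io.IOException;
import java.util.function.LongSupplier;

/** count/min/max/mean over a set of durations, in clock units. */
final class DurationStats {
  private long count;
  private long total;
  private long min = Long.MAX_VALUE;
  private long max = Long.MIN_VALUE;

  synchronized void add(long duration) {
    count++;
    total += duration;
    min = Math.min(min, duration);
    max = Math.max(max, duration);
  }

  synchronized long count() { return count; }
  synchronized long min() { return count == 0 ? 0 : min; }
  synchronized long max() { return count == 0 ? 0 : max; }
  synchronized double mean() { return count == 0 ? 0.0 : (double) total / count; }
}

/** Hypothetical remote block read. */
interface BlockRead {
  void run() throws IOException;
}

/** Times block reads, keeping failures apart from successes. */
final class BlockReadTimer {
  final DurationStats successes = new DurationStats();
  final DurationStats failures = new DurationStats();
  private final LongSupplier clock;

  BlockReadTimer(LongSupplier clock) {
    this.clock = clock;
  }

  void timeRead(BlockRead read) throws IOException {
    long start = clock.getAsLong();
    try {
      read.run();
    } catch (IOException | RuntimeException e) {
      // Record how long the call ran before failing, then rethrow.
      failures.add(clock.getAsLong() - start);
      throw e;
    }
    successes.add(clock.getAsLong() - start);
  }
}
```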



##########
hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/common/BufferPool.java:
##########
@@ -56,26 +58,32 @@ public class BufferPool implements Closeable {
   // Allows associating metadata to each buffer in the pool.
   private Map<BufferData, ByteBuffer> allocated;
 
+  private PrefetchingStatistics prefetchingStatistics;
+
   /**
    * Initializes a new instance of the {@code BufferPool} class.
    *
    * @param size number of buffers in this pool.
    * @param bufferSize size in bytes of each buffer.
+   * @param prefetchingStatistics statistics for this stream.
    *
    * @throws IllegalArgumentException if size is zero or negative.
    * @throws IllegalArgumentException if bufferSize is zero or negative.
    */
-  public BufferPool(int size, int bufferSize) {
+  public BufferPool(int size, int bufferSize, PrefetchingStatistics prefetchingStatistics) {

Review Comment:
   We are going to have to rename this class; org.apache.hadoop.io.ByteBufferPool has long existed and the similar name will only cause confusion. Not in this PR, but soon.
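For the memory gauge, a sketch of pool-side accounting, assuming memory is charged when a buffer is first created and credited back in `close()`, which is roughly what looking up `numCreated()` in `close()` suggests. `TrackedBufferPool` is hypothetical, and the shared `AtomicLong` stands in for whatever backs `stream_read_active_memory_in_use`.

```java
import java.nio.ByteBuffer;
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical pool that keeps a memory-in-use gauge up to date. */
final class TrackedBufferPool implements AutoCloseable {
  private final int maxBuffers;
  private final int bufferSize;
  private final AtomicLong memoryInUse;
  private final Deque<ByteBuffer> free = new ArrayDeque<>();
  private int numCreated;

  TrackedBufferPool(int maxBuffers, int bufferSize, AtomicLong memoryInUse) {
    if (maxBuffers <= 0 || bufferSize <= 0) {
      throw new IllegalArgumentException("size and bufferSize must be positive");
    }
    this.maxBuffers = maxBuffers;
    this.bufferSize = bufferSize;
    this.memoryInUse = memoryInUse;
  }

  synchronized ByteBuffer acquire() {
    ByteBuffer buffer = free.poll();
    if (buffer == null) {
      if (numCreated >= maxBuffers) {
        throw new IllegalStateException("pool exhausted");
      }
      // Only a newly allocated buffer changes the memory gauge.
      buffer = ByteBuffer.allocate(bufferSize);
      numCreated++;
      memoryInUse.addAndGet(bufferSize);
    }
    return buffer;
  }

  synchronized void release(ByteBuffer buffer) {
    buffer.clear();
    free.push(buffer);
  }

  synchronized int numCreated() {
    return numCreated;
  }

  @Override
  public synchronized void close() {
    // Credit back everything this pool allocated; a second close() is a no-op.
    memoryInUse.addAndGet(-(long) numCreated * bufferSize);
    numCreated = 0;
    free.clear();
  }
}
```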





Issue Time Tracking
-------------------

    Worklog Id:     (was: 792296)
    Time Spent: 1.5h  (was: 1h 20m)

> s3a prefetching streams to collect iostats on prefetching operations
> --------------------------------------------------------------------
>
>                 Key: HADOOP-18190
>                 URL: https://issues.apache.org/jira/browse/HADOOP-18190
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: fs/s3
>    Affects Versions: 3.4.0
>            Reporter: Steve Loughran
>            Assignee: Ahmar Suhail
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> There is a lot more happening in reads, so there's a lot more data to collect 
> and publish in IO stats for us to view in a summary at the end of processes 
> as well as get from the stream while it is active.
> Some useful ones would seem to be:
> counters
>  * is in memory: using 0 or 1 here lets aggregation reports count the total number of memory-cached files.
>  * prefetching operations executed
>  * errors during prefetching
> gauges
>  * number of blocks in cache
>  * total size of blocks
>  * active prefetches
>  * active memory used
> duration tracking count/min/max/ave
>  * time to fetch a block
>  * time queued before the actual fetch begins
>  * time a reader is blocked waiting for a block fetch to complete
> and some info on cache use itself
>  * number of blocks discarded unread
>  * number of prefetched blocks later used
>  * number of backward seeks to a prefetched block
>  * number of forward seeks to a prefetched block
> The key ones I care about are:
>  # memory consumption
>  # can we determine if the cache is working (reads with cache hit) and when it is not (misses, wasted prefetches)?
>  # time blocked on executors
> The stats need to be accessible on a stream even after it is closed, and aggregated
> into the FS. Once we get per-thread stats contexts we can publish there too
> and collect in worker threads for reporting in task commits.
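Whether the cache is working reduces to a couple of ratios once hit, miss and discarded-unread counters exist. A minimal sketch, assuming those counters are collected per stream; `PrefetchCacheReport` and its field names are hypothetical, since the issue does not define them.

```java
/**
 * Hypothetical snapshot of per-stream cache counters, plus the derived
 * ratios that would show whether prefetching is paying off.
 */
final class PrefetchCacheReport {
  final long cacheHits;
  final long cacheMisses;
  final long blocksPrefetched;
  final long prefetchedBlocksDiscardedUnread;

  PrefetchCacheReport(long cacheHits, long cacheMisses,
      long blocksPrefetched, long prefetchedBlocksDiscardedUnread) {
    this.cacheHits = cacheHits;
    this.cacheMisses = cacheMisses;
    this.blocksPrefetched = blocksPrefetched;
    this.prefetchedBlocksDiscardedUnread = prefetchedBlocksDiscardedUnread;
  }

  /** Fraction of block reads served from the cache (0 if nothing was read). */
  double hitRatio() {
    long reads = cacheHits + cacheMisses;
    return reads == 0 ? 0.0 : (double) cacheHits / reads;
  }

  /** Fraction of prefetched blocks discarded without ever being read. */
  double wastedPrefetchRatio() {
    return blocksPrefetched == 0
        ? 0.0
        : (double) prefetchedBlocksDiscardedUnread / blocksPrefetched;
  }
}
```

A high hit ratio with a high wasted-prefetch ratio would suggest prefetching too far ahead; a low hit ratio suggests the access pattern isn't sequential enough for prefetching to help.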



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
