[GitHub] [hadoop] mehakmeet commented on a change in pull request #3927: HADOOP-15245. S3AInputStream.skip() to use lazy seek

GitBox Mon, 14 Mar 2022 23:58:13 -0700


mehakmeet commented on a change in pull request #3927:
URL: https://github.com/apache/hadoop/pull/3927#discussion_r826616149




##########
File path: 
hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/S3AInputStream.java
##########
@@ -781,6 +781,46 @@ public void readFully(long position, byte[] buffer, int 
offset, int length)
     }
   }
 
+  /**
+   * {@inheritDoc}
+   *
+   * This implements a more efficient method for skip. It calls lazy seek
+   * which will either make a new get request or do a default skip.
+   * If lazy seek fails, try doing a default skip.
+   *
+   * @param n Number of bytes to be skipped
+   * @return Number of bytes skipped
+   * @throws IOException on any problem
+   */
+  @Override
+  @Retries.OnceTranslated
+  public long skip(final long n) throws IOException {
+
+    if (n <= 0) {
+      return 0;
+    }
+
+    checkNotClosed();
+    streamStatistics.skipOperationStarted();
+
+    long targetPos = pos + n;

Review comment:
       use getPos() instead of pos.

##########
File path: 
hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/S3AInputStream.java
##########
@@ -781,6 +781,46 @@ public void readFully(long position, byte[] buffer, int 
offset, int length)
     }
   }
 
+  /**
+   * {@inheritDoc}
+   *
+   * This implements a more efficient method for skip. It calls lazy seek
+   * which will either make a new get request or do a default skip.
+   * If lazy seek fails, try doing a default skip.
+   *
+   * @param n Number of bytes to be skipped
+   * @return Number of bytes skipped
+   * @throws IOException on any problem
+   */
+  @Override
+  @Retries.OnceTranslated
+  public long skip(final long n) throws IOException {
+
+    if (n <= 0) {
+      return 0;
+    }
+
+    checkNotClosed();
+    streamStatistics.skipOperationStarted();
+
+    long targetPos = pos + n;
+    long skipped;
+
+    try {
+      lazySeek(targetPos, 1);
+      skipped = n;

Review comment:
       What would happen if we have already seeked or read some bits of the 
file, then we try to skip even further than the fileSize, we are returning the 
number of bytes we want to skip and not the number of bytes we actually skipped.

##########
File path: 
hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/s3a/scale/ITestS3AInputStreamPerformance.java
##########
@@ -467,6 +467,35 @@ public void testRandomIORandomPolicy() throws Throwable {
         0, streamStatistics.getAborted());
   }
 
+  @Test
+  public void testSkip() throws Throwable {

Review comment:
       Nice test to show the functionality. We could have a little more 
verification that we actually skipped bytes by asserting the content read after 
skipping "n" bytes. As well as adding seeks/reads in between 2 skips to know 
that we are skipping from the current position.

##########
File path: 
hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/S3AInputStream.java
##########
@@ -781,6 +781,46 @@ public void readFully(long position, byte[] buffer, int 
offset, int length)
     }
   }
 
+  /**
+   * {@inheritDoc}
+   *
+   * This implements a more efficient method for skip. It calls lazy seek
+   * which will either make a new get request or do a default skip.
+   * If lazy seek fails, try doing a default skip.
+   *
+   * @param n Number of bytes to be skipped
+   * @return Number of bytes skipped
+   * @throws IOException on any problem
+   */
+  @Override
+  @Retries.OnceTranslated
+  public long skip(final long n) throws IOException {
+
+    if (n <= 0) {
+      return 0;
+    }
+
+    checkNotClosed();
+    streamStatistics.skipOperationStarted();
+
+    long targetPos = pos + n;
+    long skipped;
+
+    try {
+      lazySeek(targetPos, 1);
+      skipped = n;
+    } catch (EOFException e) {

Review comment:
       Maybe we can also LOG the reason for the failure of LazySeek?

##########
File path: 
hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/S3AInputStream.java
##########
@@ -781,6 +781,46 @@ public void readFully(long position, byte[] buffer, int 
offset, int length)
     }
   }
 
+  /**
+   * {@inheritDoc}
+   *
+   * This implements a more efficient method for skip. It calls lazy seek
+   * which will either make a new get request or do a default skip.
+   * If lazy seek fails, try doing a default skip.
+   *
+   * @param n Number of bytes to be skipped
+   * @return Number of bytes skipped
+   * @throws IOException on any problem
+   */
+  @Override
+  @Retries.OnceTranslated
+  public long skip(final long n) throws IOException {
+
+    if (n <= 0) {
+      return 0;
+    }
+
+    checkNotClosed();
+    streamStatistics.skipOperationStarted();
+
+    long targetPos = pos + n;

Review comment:
       What if targetPos is larger than fileSize? We can either limit the 
targetPos to fileSize or return quickly.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]



---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

[GitHub] [hadoop] mehakmeet commented on a change in pull request #3927: HADOOP-15245. S3AInputStream.skip() to use lazy seek

Reply via email to