[GitHub] [ozone] sodonnel commented on a change in pull request #2723: HDDS-5552. EC: Implement seek on ECBlockInputStream

GitBox Wed, 13 Oct 2021 11:54:26 -0700


sodonnel commented on a change in pull request #2723:
URL: https://github.com/apache/ozone/pull/2723#discussion_r727328851




##########
File path: 
hadoop-ozone/client/src/main/java/org/apache/hadoop/ozone/client/io/ECBlockInputStream.java
##########
@@ -255,8 +257,23 @@ public BlockID getBlockID() {
   private int readFromStream(BlockExtendedInputStream stream,
       ByteReaderStrategy strategy)
       throws IOException {
-    // Number of bytes left to read from this streams EC cell.
-    long ecLimit = ecChunkSize - (position % ecChunkSize);
+    long partialPosition = position % ecChunkSize;
+    if (seeked) {
+      // Seek on the underlying streams is performed lazily, as there is a
+      // possibility a read after a seek may only read a small amount of data.
+      // Once this block stream has been seeked, we always check the position,
+      // but in the usual case, where there are no seeks at all, we don't need
+      // to do this extra work.
+      long basePosition = (position / stripeSize) * (long)ecChunkSize;
+      long streamPosition = basePosition + partialPosition;
+      if (streamPosition != stream.getPos()) {
+        // This ECBlockInputStream has been seeked, so the underlying
+        // block stream is no longer at the correct position. Therefore we need
+        // to seek it too.
+        stream.seek(streamPosition);
+      }

Review comment:
       No, because as it stands we don't track if we have seeked all streams or 
not. Tracking this would add some additional complexity so I figured it was 
easier to just do the check each time after a block has been seeked.
   
   The usual case would not have seeks, and if we have seeked the block at all 
then we can pay the small overhead of checking the position on each read.

##########
File path: 
hadoop-ozone/client/src/main/java/org/apache/hadoop/ozone/client/io/ECBlockInputStream.java
##########
@@ -255,8 +257,23 @@ public BlockID getBlockID() {
   private int readFromStream(BlockExtendedInputStream stream,
       ByteReaderStrategy strategy)
       throws IOException {
-    // Number of bytes left to read from this streams EC cell.
-    long ecLimit = ecChunkSize - (position % ecChunkSize);
+    long partialPosition = position % ecChunkSize;
+    if (seeked) {
+      // Seek on the underlying streams is performed lazily, as there is a
+      // possibility a read after a seek may only read a small amount of data.
+      // Once this block stream has been seeked, we always check the position,
+      // but in the usual case, where there are no seeks at all, we don't need
+      // to do this extra work.

Review comment:
       I'm not sure what the issue is with the below line?

##########
File path: 
hadoop-ozone/client/src/main/java/org/apache/hadoop/ozone/client/io/ECBlockInputStream.java
##########
@@ -309,8 +326,20 @@ public synchronized void unbuffer() {
   }
 
   @Override
-  public synchronized void seek(long l) throws IOException {
-    throw new NotImplementedException("Seek is not implements for EC");
+  public synchronized void seek(long pos) throws IOException {
+    checkOpen();
+    if (pos < 0 || pos >= getLength()) {
+      if (pos == 0) {
+        // It is possible for length and pos to be zero in which case
+        // seek should return instead of throwing exception
+        return;
+      }
+      throw new EOFException(
+          "EOF encountered at pos: " + pos + " for block: "
+              + blockInfo.getBlockID());
+    }
+    position = pos;

Review comment:
       Yea I guess we can add an optimization to skip seeking if we are seeking 
to the current position.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]



---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

[GitHub] [ozone] sodonnel commented on a change in pull request #2723: HDDS-5552. EC: Implement seek on ECBlockInputStream

Reply via email to