dongjoon-hyun commented on a change in pull request #695:
URL: https://github.com/apache/orc/pull/695#discussion_r627086536



##########
File path: c++/src/Compression.cc
##########
@@ -503,16 +532,47 @@ DIAGNOSTIC_PUSH
     return true;
   }
 
+  /** There are three possible scenarios when seeking a position:
+   * 1. The seeked position is already read and decompressed into
+   *    the output stream.
+   * 2. It is already read from the input stream, but has not been
+   *    decompressed yet, ie. it's not in the output stream.
+   * 3. It is not read yet from the inputstream.
+   */
   void DecompressionStream::seek(PositionProvider& position) {
-    // clear state to force seek to read from the right position
+    size_t seekedPosition = position.current();
+    // Case 3.: the seeked position is the one that is currently buffered and
+    // decompressed. Here we only need to set the output buffer's pointer to the
+    // seeked position. Note that after the headerPosition comes the 3 bytes of
+    // the header.
+    if (headerPosition == seekedPosition
+        && inputBufferStartPosition <= headerPosition + 3 && inputBufferStart) {
+      position.next(); // Skip the input level position.
+      size_t posInChunk = position.next(); // Chunk level position.
+      outputBufferLength = uncompressedBufferLength - posInChunk;
+      outputBuffer = outputBufferStart + posInChunk;
+      return;
+    }
+    // Clear state to prepare reading from a new chunk header.
     state = DECOMPRESS_HEADER;
     outputBuffer = nullptr;
     outputBufferLength = 0;
     remainingLength = 0;
-    inputBuffer = nullptr;
-    inputBufferEnd = nullptr;
-
-    input->seek(position);
+    if (seekedPosition < static_cast<uint64_t>(input->ByteCount()) &&
+        seekedPosition >= inputBufferStartPosition) {
+      // Case 2.: The input is buffered, but not yet decompressed. No need to
+      // force re-reading the inputBuffer, we just have to move it to the
+      // seeked position.
+      position.next(); // Skip the input level position.
+      inputBuffer
+        = inputBufferStart + (seekedPosition - inputBufferStartPosition);
+    } else {
+      // Case 1.: The seeked position is not in the input buffer, here we are

Review comment:
Ditto. This branch looks like scenario `3. It is not read yet from the inputstream.`
Switching from `Case 1.:` to `Case 3.:` would be better. WDYT?
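
To make the mapping concrete, here is a minimal sketch of how I read the three branches against the numbered scenarios in the doc comment. It is not the PR's code: `SeekCase`, `SeekState`, and `classifySeek` are hypothetical names, the fields only echo the variables in the diff, and the leading `assert` states an invariant I am assuming rather than one the diff checks.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical labels that follow the numbering in the doc comment.
enum class SeekCase {
  AlreadyDecompressed = 1,      // 1. already in the output stream
  BufferedNotDecompressed = 2,  // 2. read from the input, not yet decompressed
  NotYetRead = 3                // 3. not read yet from the input stream
};

// Hypothetical snapshot of the stream state; field names echo the diff.
struct SeekState {
  uint64_t headerPosition;            // offset of the current chunk header
  uint64_t inputBufferStartPosition;  // offset of the first buffered input byte
  uint64_t inputByteCount;            // stands in for input->ByteCount()
  bool hasInputBuffer;                // stands in for inputBufferStart != nullptr
};

SeekCase classifySeek(const SeekState& s, uint64_t seekedPosition) {
  // Assumed invariant: the buffer cannot start past what has been read.
  assert(s.inputBufferStartPosition <= s.inputByteCount);

  const uint64_t kHeaderSize = 3;  // the diff notes a 3-byte chunk header

  // First branch in the diff: the seeked chunk is the one currently
  // decompressed, so only the output pointer has to move.
  if (s.hasInputBuffer && s.headerPosition == seekedPosition &&
      s.inputBufferStartPosition <= s.headerPosition + kHeaderSize) {
    return SeekCase::AlreadyDecompressed;
  }

  // Second branch: the position lies inside the buffered input range.
  if (seekedPosition >= s.inputBufferStartPosition &&
      seekedPosition < s.inputByteCount) {
    return SeekCase::BufferedNotDecompressed;
  }

  // Else branch: the underlying input has to be repositioned.
  return SeekCase::NotYetRead;
}
```

With the conditions laid out this way, the `else` branch is the only one that falls through to scenario 3, which is why the `Case 3` label seems to fit there.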



