sodonnel commented on code in PR #10415:
URL: https://github.com/apache/ozone/pull/10415#discussion_r3382288985
##########
hadoop-hdds/client/src/main/java/org/apache/hadoop/hdds/scm/storage/StreamBlockInputStream.java:
##########
@@ -208,12 +214,40 @@ public synchronized void seek(long pos) throws
IOException {
if (pos == position) {
return;
}
- LOG.debug("{}: seek {} -> {}", this, position, pos);
- closeStream();
+ LOG.debug("{}: seek {} -> {}", getName(streamingReader), position, pos);
+ readBuffer = reuseReadBuffer(readBuffer, pos);
Review Comment:
I've attached a diff that has a unit test to validate that we don't re-read
data already on the queue when seeking forward. I tried to create a similar
test for a backwards seek but its more difficult as we want to test that the
queue is drained before requesting more data, which isn't really possible with
the current structure.
But I think we need to change the reading behavior so it reads through the
queue and only reads more data from the datanode if the queue has been drained.
In a dysfunctional case, we could have read 32MB onto the queue with
pre-read. Seek from 0.1MB to 1.1MB. This would trigger another 32MB read onto
the end of the queue, but the data we want is there are the head. Now there is
31 + 32MB on the queue. Seek now to 3.1MB, the same happens, as we have the
next chunk we need available.
Even with the backwards seek, we could have 32MB on the queue, and then read
32MB more before throwing away the first 32MB. We should throw away the first
32MB first.
[forward-seek-test.patch](https://github.com/user-attachments/files/28763628/forward-seek-test.patch)
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]