GitHub user ilsley added a comment to the discussion: Seek implementation has 
unexpected behaviour

Thank you for considering this.

I have a summary paragraph at the end, but here is some detail should you find 
it useful.

I am parsing selected subsets of large binary files using binrw. The binary 
file has parts of variable size, which refer to other parts of the file, so I 
need to parse incrementally: read in a chunk (with a minimum size), parse it 
(binrw needs Read and Seek), and see how much more is needed. If I have enough, 
I parse again; otherwise, I fetch more, add it to the existing buffer, and then 
parse. Throughout, I need to keep track of my position in the file since parts 
of the file refer to other parts by position.
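
A minimal sketch of the fetch-and-parse loop described above, using only std. 
The `fetch` helper, the toy record format (one length byte followed by that 
many payload bytes) and the function names are all assumptions made for 
illustration; the real code parses with binrw and fetches from remote storage.

```rust
use std::io::{Cursor, Read};

/// Hypothetical fetch: up to `len` bytes of `file` starting at `offset`.
fn fetch(file: &[u8], offset: usize, len: usize) -> &[u8] {
    let start = offset.min(file.len());
    let end = offset.saturating_add(len).min(file.len());
    &file[start..end]
}

/// Toy format (an assumption): one length byte `n`, then `n` payload bytes.
/// Returns Ok(None) when `buf` does not yet hold a whole record.
fn try_parse(buf: &[u8]) -> std::io::Result<Option<Vec<u8>>> {
    let mut cur = Cursor::new(buf);
    let mut len = [0u8; 1];
    if cur.read(&mut len)? == 0 {
        return Ok(None);
    }
    let mut payload = vec![0u8; len[0] as usize];
    match cur.read_exact(&mut payload) {
        Ok(()) => Ok(Some(payload)),
        Err(e) if e.kind() == std::io::ErrorKind::UnexpectedEof => Ok(None),
        Err(e) => Err(e),
    }
}

/// Fetch `chunk` bytes at a time, starting at absolute offset `start`,
/// until one whole record can be parsed or the file runs out.
fn read_record(file: &[u8], start: usize, chunk: usize) -> std::io::Result<Option<Vec<u8>>> {
    let mut buf = Vec::new();
    loop {
        let more = fetch(file, start + buf.len(), chunk);
        if more.is_empty() {
            return Ok(None); // nothing left to fetch: record is incomplete
        }
        buf.extend_from_slice(more);
        if let Some(record) = try_parse(&buf)? {
            return Ok(Some(record));
        }
    }
}
```

The sketch re-parses from the start of the buffer after every fetch, which is 
the simplest option; it does not try to resume a partially parsed record.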

More specifically:
- My previous implementation used a Cursor of Bytes. Before appending newly 
fetched Bytes, I would extract the position from the Cursor before creating a 
new Bytes and Cursor, and would update my offset in the file accordingly 
(because I discard the already parsed Bytes). I naively did the same when I 
tried Buffer, and it worked on most of the files I tested. However, it failed 
on a new one recently, and it took me a bit of time to find the bug, which was 
caused by needing to fetch more data plus my assumption that Seek would not 
change the length of the Buffer. 
- I can fix this error, but binrw does have directives that rely on Seek, so I 
would need to check what assumptions binrw makes regarding Seek if I go down 
this path (which does not seem necessary).
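
The offset bookkeeping in the first bullet can be sketched as follows. This is 
a simplified stand-in, not the original code: `Window`, `file_position`, 
`extend` and `demo` are hypothetical names, and `Vec<u8>` is used in place of 
Bytes so the example needs only std. It relies on the property assumed above: 
seeking within a Cursor moves the position but never changes the length of the 
underlying data.

```rust
use std::io::{Cursor, Read, Seek, SeekFrom};

/// Hypothetical helper: a window over part of a larger file.
struct Window {
    /// Absolute file offset of the first byte held in `cursor`.
    base: u64,
    cursor: Cursor<Vec<u8>>,
}

impl Window {
    fn new(base: u64, data: Vec<u8>) -> Self {
        Window { base, cursor: Cursor::new(data) }
    }

    /// Absolute position in the file of the next byte to be read.
    fn file_position(&self) -> u64 {
        self.base + self.cursor.position()
    }

    /// Discard the bytes already parsed, then append newly fetched bytes.
    fn extend(&mut self, fetched: &[u8]) {
        let pos = self.cursor.position();
        let old = std::mem::take(self.cursor.get_mut());
        // Clamp, because a Cursor may be positioned past the end of its data.
        let consumed = (pos as usize).min(old.len());
        let mut data = old[consumed..].to_vec();
        data.extend_from_slice(fetched);
        self.base += consumed as u64;
        self.cursor = Cursor::new(data);
    }
}

/// Usage: read two bytes, seek within the window, then append more data.
fn demo() -> std::io::Result<(u64, usize, u64)> {
    let mut w = Window::new(100, vec![1, 2, 3, 4]);
    let mut two = [0u8; 2];
    w.cursor.read_exact(&mut two)?;
    w.cursor.seek(SeekFrom::Start(3))?;
    let len_after_seek = w.cursor.get_ref().len(); // still 4
    let before = w.file_position();
    w.extend(&[5, 6]);
    Ok((before, len_after_seek, w.file_position()))
}
```

The file position is the same before and after `extend`, because the discarded 
prefix is added to `base`. A reader whose Seek also drops data would break this 
accounting unless the dropped length were tracked separately.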

To be clear, I like the flexibility of Buffer and its overall approach, but a) 
I wanted to point out this aspect, which was unexpected at least for me, and b) I wondered 
whether it might be useful to demarcate different traits e.g. using the 
approach of bytes Buf with the reader method (e.g. 
https://docs.rs/bytes/latest/bytes/buf/struct.Reader.html) that creates a 
Reader that can be returned to the underlying type (into_inner) when necessary. 
I can see the value of implementing BufRead on Buffer directly, so I mention 
this more as an example for future traits that you might provide e.g. a 
"non-consuming" Read & Seek - should that be more generally useful.

For my purposes, I think the consume method of BufRead is clearer than a 
forward-only Seek. 
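
To illustrate the distinction between a consuming and a "non-consuming" 
reader, here is a small sketch using std types only. It does not use Buffer: 
`&[u8]` stands in for a reader whose `consume` discards data, and Cursor 
stands in for one that only moves a position. The function name is 
hypothetical.

```rust
use std::io::{BufRead, Cursor, Seek, SeekFrom};

fn consuming_vs_non_consuming() -> std::io::Result<(usize, usize, u64)> {
    let data = [10u8, 20, 30, 40, 50];

    // Consuming: `&[u8]` implements BufRead, and `consume` shrinks the slice,
    // so the skipped bytes are no longer reachable through `slice`.
    let mut slice: &[u8] = &data;
    slice.consume(2);
    let remaining_in_slice = slice.len();

    // Non-consuming: Cursor keeps every byte and only moves its position,
    // so seeking backwards is still possible.
    let mut cursor = Cursor::new(&data[..]);
    cursor.seek(SeekFrom::Start(2))?;
    cursor.seek(SeekFrom::Start(0))?;
    let len_in_cursor = cursor.get_ref().len();

    Ok((remaining_in_slice, len_in_cursor, cursor.position()))
}
```

With `consume`, the name itself signals that data is dropped, which is the 
clarity referred to above.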

So tl;dr, I don't think the current implementation of Seek provides extra 
functionality for my use case over what's provided by BufRead. I am supposing I 
should create a contiguous Bytes and use a Cursor or similar, and it's likely 
that the performance of this will be good.

GitHub link: 
https://github.com/apache/opendal/discussions/7113#discussioncomment-15376632
