jorgecarleitao edited a comment on issue #1059:
URL: https://github.com/apache/arrow-rs/issues/1059#issuecomment-997172436
Sorry that I did not express myself very well. I meant something like this:
```rust
use std::io::{Read, Seek, SeekFrom};

struct ReaderA<R: Read> {
    pub reader: R,
    // `None` until the first call to `seek`
    position: Option<u64>,
    pub buffer: Vec<u8>,
}

impl<R: Read> Read for ReaderA<R> {
    #[inline]
    fn read(&mut self, buf: &mut [u8]) -> std::io::Result<usize> {
        let length = buf.len();
        if let Some(position) = &mut self.position {
            let start = *position as usize;
            // we have seeked to somewhere in our buffer. Read from buffer first
            if start + length <= self.buffer.len() {
                // our buffer fills `buf` => memcopy all
                buf.copy_from_slice(&self.buffer[start..start + length]);
                *position += length as u64;
                Ok(length)
            } else if start <= self.buffer.len() {
                // edge case where the read covers `self.buffer` and `self.reader`:
                // read from both accordingly
                let buffer_remaining = self.buffer.len() - start;
                buf[..buffer_remaining].copy_from_slice(&self.buffer[start..]);
                // read the remaining from the reader
                let read = self.reader.read(&mut buf[buffer_remaining..])?;
                *position += (buffer_remaining + read) as u64;
                if *position > self.buffer.len() as u64 {
                    // release memory
                    self.buffer = Vec::new();
                }
                Ok(buffer_remaining + read)
            } else {
                // we are past `self.buffer`: read directly from the reader
                let read = self.reader.read(buf)?;
                *position += read as u64;
                Ok(read)
            }
        } else {
            // no seek was done so far: read from the reader and keep a copy
            // of what was actually read in the buffer
            let read = self.reader.read(buf)?;
            self.buffer.extend_from_slice(&buf[..read]);
            Ok(read)
        }
    }
}

impl<R: Read> Seek for ReaderA<R> {
    #[inline]
    fn seek(&mut self, pos: SeekFrom) -> std::io::Result<u64> {
        let position = match pos {
            SeekFrom::Start(position) => position,
            SeekFrom::End(_) => panic!("This reader does not support seeking from end"),
            SeekFrom::Current(offset) => {
                // before the first seek, all bytes read so far are in the buffer
                let current = self.position.unwrap_or(self.buffer.len() as u64) as i64;
                let position = current + offset;
                if position < 0 {
                    return Err(std::io::Error::new(
                        std::io::ErrorKind::InvalidInput,
                        "cannot seek to a negative position",
                    ));
                }
                position as u64
            }
        };
        self.position = Some(position);
        Ok(position)
    }
}

fn main() {}
```
That is, when the source does not support `Seek` and we need to use data from
a `Read` more than once, we can store that data in a buffer and serve it from
there whenever it is requested again (by seeking back and then calling `read`).
This idiom uses only as much memory as a non-seekable environment requires,
namely the data that needs to be used twice (once for inference, once for
reading).
I hope this is a bit more understandable.
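
For illustration, here is a minimal, self-contained sketch of the two-pass usage (infer, seek back, read). It is a simplified variant of the idea, not code from arrow-rs: `Replay`, `demo` and the sample bytes are hypothetical names made up for this example, seeking is only supported to a position inside the buffer, and `Cursor` merely stands in for a non-seekable source.

```rust
use std::io::{Cursor, Error, ErrorKind, Read, Result, Seek, SeekFrom};

/// Hypothetical, simplified wrapper: records everything read before the
/// first seek, then replays it and continues from the underlying reader.
struct Replay<R: Read> {
    reader: R,
    buffer: Vec<u8>,
    position: Option<usize>, // `None` until the first seek
}

impl<R: Read> Read for Replay<R> {
    fn read(&mut self, buf: &mut [u8]) -> Result<usize> {
        match self.position {
            // Recording phase: forward the read and keep a copy.
            None => {
                let n = self.reader.read(buf)?;
                self.buffer.extend_from_slice(&buf[..n]);
                Ok(n)
            }
            // Replay phase: serve the buffered bytes first.
            Some(pos) if pos < self.buffer.len() => {
                let n = buf.len().min(self.buffer.len() - pos);
                buf[..n].copy_from_slice(&self.buffer[pos..pos + n]);
                self.position = Some(pos + n);
                Ok(n)
            }
            // Buffer exhausted: continue from the underlying reader.
            Some(pos) => {
                let n = self.reader.read(buf)?;
                self.position = Some(pos + n);
                Ok(n)
            }
        }
    }
}

impl<R: Read> Seek for Replay<R> {
    fn seek(&mut self, pos: SeekFrom) -> Result<u64> {
        match pos {
            // Assumption of this sketch: only seeking into the buffer works.
            SeekFrom::Start(p) if p as usize <= self.buffer.len() => {
                self.position = Some(p as usize);
                Ok(p)
            }
            _ => Err(Error::new(
                ErrorKind::InvalidInput,
                "only seeking back into the buffer is supported",
            )),
        }
    }
}

/// Reads a 4-byte "header" for inference, seeks back, then reads everything.
fn demo() -> Result<(Vec<u8>, Vec<u8>)> {
    let data = b"a,b\n1,2\n3,4\n".to_vec();
    let mut reader = Replay {
        reader: Cursor::new(data),
        buffer: Vec::new(),
        position: None,
    };

    // First pass ("inference"): look at the header only.
    let mut header = [0u8; 4];
    reader.read_exact(&mut header)?;

    // Second pass: seek back and read the whole stream, header included.
    reader.seek(SeekFrom::Start(0))?;
    let mut all = Vec::new();
    reader.read_to_end(&mut all)?;
    Ok((header.to_vec(), all))
}

fn main() {
    let (header, all) = demo().expect("reading from an in-memory cursor");
    println!("header: {} bytes, full stream: {} bytes", header.len(), all.len());
}
```

Only the bytes consumed during the first pass are ever held in memory; the rest is streamed straight from the underlying reader. Unlike `ReaderA` above, this sketch never releases the buffer, and it assumes the caller seeks back before reading past the buffered bytes.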