nevi-me commented on pull request #7309:
URL: https://github.com/apache/arrow/pull/7309#issuecomment-636864911


   > I agree that placing the burden on the user is a bad idea. However, there 
are situations where we just can't seek back to start (s3 is one example). 
Maybe a specific implementation for `Seek + Read`, that would do the seek back 
to start, and one for `Read` only, that would not. However... this would need 
the use of specialization, so more nightly dependencies.
   
   Okay, in that case I could support not seeking back to the start of the 
input. One downside though is that in the case where no schema is supplied, and 
the readers (csv and json) infer the schema, we do need to reset the input to 
its starting position. I've briefly looked at the csv code, so if it's doable, 
we could find a solution.
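
   To illustrate the reset that inference needs, here is a minimal sketch using only the standard library. `sample_then_rewind` is a hypothetical helper, not part of `arrow`; reading a few lines stands in for real schema inference, and the `Read + Seek` bound is the assumption that rules out non-seekable inputs such as S3 streams.

```rust
use std::io::{BufRead, BufReader, Read, Seek, SeekFrom};

/// Hypothetical helper (not an arrow API): read up to `max_lines` lines as a
/// stand-in for schema inference, then rewind so the caller can read the
/// input again from the beginning.
fn sample_then_rewind<R: Read + Seek>(
    reader: &mut BufReader<R>,
    max_lines: usize,
) -> std::io::Result<Vec<String>> {
    let mut sample = Vec::new();
    let mut line = String::new();
    while sample.len() < max_lines {
        line.clear();
        if reader.read_line(&mut line)? == 0 {
            break; // end of input
        }
        sample.push(line.trim_end().to_string());
    }
    // `BufReader<R>` implements `Seek` when `R: Seek`; an absolute seek
    // discards the internal buffer and repositions the underlying reader.
    reader.seek(SeekFrom::Start(0))?;
    Ok(sample)
}
```

   With a `Read`-only input the final `seek` is not available, which is why inference and "don't seek back" pull in opposite directions.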
   
   Regarding specialization, we are already dependent on it, with low 
likelihood of this changing soon; so I'd say it's an option.
   
   With regard to `dyn Read`, please have a look at the `arrow::csv::reader` 
code. We already support a `BufReader<R: Read>` there, so I think we can do 
the same here without boxing the reader the way you've done so far. The 
main reason we still use `File` in `arrow::json` is that nobody has needed 
anything else, or at least nobody has raised the issue.
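
   As a rough sketch of the generic approach (not the actual `arrow::csv::reader` code), a reader can be parameterised over `R: Read` and wrap it in a `BufReader<R>`, so no `Box<dyn Read>` is needed. `LineReader` and `next_line` are hypothetical names used only for illustration.

```rust
use std::io::{BufRead, BufReader, Read};

/// Hypothetical line-oriented reader, generic over any `R: Read`.
/// The concrete reader type is fixed at compile time, so nothing is boxed.
struct LineReader<R: Read> {
    reader: BufReader<R>,
}

impl<R: Read> LineReader<R> {
    fn new(inner: R) -> Self {
        Self {
            reader: BufReader::new(inner),
        }
    }

    /// Returns the next line without its trailing newline,
    /// or `None` once the input is exhausted.
    fn next_line(&mut self) -> std::io::Result<Option<String>> {
        let mut line = String::new();
        if self.reader.read_line(&mut line)? == 0 {
            return Ok(None);
        }
        Ok(Some(line.trim_end().to_string()))
    }
}
```

   The same type then works for a `File`, a byte slice, or a network stream, e.g. `LineReader::new(File::open(path)?)` or `LineReader::new(bytes.as_slice())`.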
   ___
   
   A more drastic alternative would be to always require a schema, and leave 
inference to the user. We could then take the buffered reader by value 
(`mut reader: BufReader<R>` instead of `reader: &mut BufReader<R>`), so that we 
don't leave the user with a file handle that's already partially or fully consumed.
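
   A small sketch of what taking the reader by value looks like, assuming nothing about the eventual `arrow` signature. `count_lines` is a hypothetical function; the point is only that the `BufReader` is moved in, so the caller cannot touch it afterwards.

```rust
use std::io::{BufRead, BufReader, Read};

/// Hypothetical function (not an arrow API): takes ownership of the reader,
/// so the caller cannot reuse a partially consumed handle afterwards.
fn count_lines<R: Read>(mut reader: BufReader<R>) -> std::io::Result<usize> {
    let mut count = 0;
    let mut line = String::new();
    loop {
        line.clear();
        if reader.read_line(&mut line)? == 0 {
            return Ok(count); // end of input; `reader` is dropped here
        }
        count += 1;
    }
}
```

   After `count_lines(reader)` the binding `reader` has been moved, so any later use of it is a compile error rather than a surprising read from the middle of the file.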



