alamb commented on a change in pull request #707:
URL: https://github.com/apache/arrow-rs/pull/707#discussion_r694768683



##########
File path: parquet/src/util/io.rs
##########
@@ -36,6 +36,9 @@ pub trait TryClone: Sized {
 pub trait ParquetReader: Read + Seek + Length + TryClone {}
 impl<T: Read + Seek + Length + TryClone> ParquetReader for T {}
 
+pub trait ThreadSafeParquetReader: ParquetReader + Send + Sync + 'static {}

Review comment:
       An alternative approach that we could use in DataFusion would be to 
implement something like `ThreadSafeFileSource`.
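
   To make the idea concrete, here is a minimal sketch of what such a wrapper might look like. The name `ThreadSafeFileSource` comes from the comment above, but the fields and methods below are assumptions for illustration only; this is not an existing type in the parquet crate or in DataFusion. It shares one reader behind an `Arc<Mutex<_>>` and lets each handle track its own position, so a handle is `Send + Sync` whenever the wrapped reader is `Send`.

```rust
use std::io::{self, Read, Seek, SeekFrom};
use std::sync::{Arc, Mutex};

/// Hypothetical wrapper (not part of the parquet crate): every handle shares
/// one underlying reader behind a mutex and keeps its own read position.
pub struct ThreadSafeFileSource<R> {
    inner: Arc<Mutex<R>>,
    pos: u64,
    len: u64,
}

impl<R: Read + Seek> ThreadSafeFileSource<R> {
    /// Wraps `reader`, measuring its total length once up front.
    pub fn new(mut reader: R) -> io::Result<Self> {
        let len = reader.seek(SeekFrom::End(0))?;
        Ok(Self { inner: Arc::new(Mutex::new(reader)), pos: 0, len })
    }

    /// Total length in bytes of the underlying source.
    pub fn len(&self) -> u64 {
        self.len
    }

    /// Cheap clone: a new handle positioned at 0 that shares the same reader.
    pub fn try_clone(&self) -> io::Result<Self> {
        Ok(Self { inner: Arc::clone(&self.inner), pos: 0, len: self.len })
    }
}

impl<R: Read + Seek> Read for ThreadSafeFileSource<R> {
    fn read(&mut self, buf: &mut [u8]) -> io::Result<usize> {
        let mut guard = self
            .inner
            .lock()
            .map_err(|_| io::Error::new(io::ErrorKind::Other, "reader mutex poisoned"))?;
        // Other handles may have moved the shared cursor, so reposition first.
        guard.seek(SeekFrom::Start(self.pos))?;
        let n = guard.read(buf)?;
        self.pos += n as u64;
        Ok(n)
    }
}

impl<R: Read + Seek> Seek for ThreadSafeFileSource<R> {
    fn seek(&mut self, from: SeekFrom) -> io::Result<u64> {
        // Only the handle-local position changes; the shared reader is
        // repositioned lazily on the next `read`.
        let new_pos = match from {
            SeekFrom::Start(p) => Some(p),
            SeekFrom::End(off) => self.len.checked_add_signed(off),
            SeekFrom::Current(off) => self.pos.checked_add_signed(off),
        };
        match new_pos {
            Some(p) => {
                self.pos = p;
                Ok(p)
            }
            None => Err(io::Error::new(
                io::ErrorKind::InvalidInput,
                "seek to a negative or overflowing position",
            )),
        }
    }
}
```

   The trade-off in this sketch is that all handles serialize on a single mutex and pay for a seek on every read, which is simple but would limit parallel throughput.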
   
   I think the core change needed to work with remote S3-like object 
stores is to allow for an `async` API in the parquet reader. 
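
   As a rough illustration of what an `async` reading API could look like, the sketch below defines a hypothetical `AsyncChunkReader` trait that fetches byte ranges, plus an in-memory stand-in for a remote object. None of these names exist in the parquet crate; they are assumptions for this example. The only format detail relied on is that a Parquet file ends with a 4-byte little-endian metadata length followed by the magic bytes `PAR1`. The tiny `block_on` helper only stands in for a real executor.

```rust
use std::future::Future;
use std::io;
use std::ops::Range;
use std::pin::Pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// Boxed future alias so the trait below stays object-safe.
pub type BoxFuture<'a, T> = Pin<Box<dyn Future<Output = T> + Send + 'a>>;

/// Hypothetical async source (not a parquet crate API): callers request
/// byte ranges instead of using blocking `Read + Seek`.
pub trait AsyncChunkReader: Send + Sync {
    fn len(&self) -> BoxFuture<'_, io::Result<u64>>;
    fn get_bytes(&self, range: Range<u64>) -> BoxFuture<'_, io::Result<Vec<u8>>>;
}

/// In-memory stand-in for an object in remote storage.
pub struct InMemoryObject(pub Vec<u8>);

impl AsyncChunkReader for InMemoryObject {
    fn len(&self) -> BoxFuture<'_, io::Result<u64>> {
        Box::pin(async move { Ok(self.0.len() as u64) })
    }

    fn get_bytes(&self, range: Range<u64>) -> BoxFuture<'_, io::Result<Vec<u8>>> {
        Box::pin(async move {
            let (start, end) = (range.start as usize, range.end as usize);
            self.0
                .get(start..end)
                .map(|bytes| bytes.to_vec())
                .ok_or_else(|| io::Error::new(io::ErrorKind::UnexpectedEof, "range out of bounds"))
        })
    }
}

/// Reads the metadata length from the 8-byte Parquet footer
/// (4-byte little-endian length followed by the magic `PAR1`).
pub async fn read_footer_len(src: &dyn AsyncChunkReader) -> io::Result<u32> {
    let len = src.len().await?;
    if len < 8 {
        return Err(io::Error::new(io::ErrorKind::InvalidData, "file too small"));
    }
    let tail = src.get_bytes(len - 8..len).await?;
    if tail.len() != 8 || &tail[4..] != b"PAR1" {
        return Err(io::Error::new(io::ErrorKind::InvalidData, "bad footer magic"));
    }
    Ok(u32::from_le_bytes([tail[0], tail[1], tail[2], tail[3]]))
}

/// Waker that does nothing; sufficient for the busy-polling loop below.
struct NoopWaker;

impl Wake for NoopWaker {
    fn wake(self: Arc<Self>) {}
}

/// Minimal busy-polling executor, used here only so the sketch runs
/// without an async runtime dependency.
pub fn block_on<F: Future>(fut: F) -> F::Output {
    let mut fut = Box::pin(fut);
    let waker = Waker::from(Arc::new(NoopWaker));
    let mut cx = Context::from_waker(&waker);
    loop {
        if let Poll::Ready(out) = fut.as_mut().poll(&mut cx) {
            return out;
        }
        std::thread::yield_now();
    }
}
```

   With a range-based interface like this, an S3-backed implementation could map each `get_bytes` call to a ranged GET request, while local files could keep using the existing blocking reader.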
   
   e.g. https://github.com/apache/arrow-rs/issues/111 
   
   Here is one potential way of doing it, from @jorgecarleitao's arrow2 crate: 
https://github.com/jorgecarleitao/arrow2/pull/260




