asfimport commented on issue #42257:
URL: https://github.com/apache/arrow/issues/42257#issuecomment-2184204291

   [Wes McKinney](https://issues.apache.org/jira/browse/PARQUET-474?#comment-15338946) / @wesm:
   Sorry, let me be a little more specific about the problems right now:
   
   - We have code that assumes a particular thread has exclusive access to an IO resource with internal state, e.g. the code snippet that uses `Seek`.
   - We are writing files in a way that assumes IO is synchronous, i.e. we do not continue serializing data while we wait for IO to complete.
   - The BufferedInputStream is synchronous. While we may not implement it in parquet-cpp, the design should probably allow for an input stream that buffers data in a background thread.
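
To make the first point concrete, here is a minimal sketch of a positional read that takes the offset as an argument, so callers do not share a cursor the way a `Seek` followed by a read does. The names `RandomAccessSource`, `InMemorySource`, and `ReadAt` are hypothetical and chosen for illustration; they are not the parquet-cpp interfaces discussed in this issue.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>
#include <utility>

// Hypothetical interface (not the parquet-cpp API). The offset is passed on
// every call, so a read never depends on a cursor left behind by another
// caller's Seek.
class RandomAccessSource {
 public:
  virtual ~RandomAccessSource() = default;
  virtual int64_t Size() const = 0;
  // Copies up to nbytes starting at offset into out; returns bytes copied.
  virtual int64_t ReadAt(int64_t offset, int64_t nbytes, uint8_t* out) const = 0;
};

// In-memory implementation standing in for a file. ReadAt is const and
// mutates nothing, so concurrent readers need no locking here.
class InMemorySource : public RandomAccessSource {
 public:
  explicit InMemorySource(std::string data) : data_(std::move(data)) {}

  int64_t Size() const override { return static_cast<int64_t>(data_.size()); }

  int64_t ReadAt(int64_t offset, int64_t nbytes, uint8_t* out) const override {
    if (offset < 0 || nbytes <= 0 || offset >= Size()) return 0;
    // Clamp the request to the bytes that remain after offset.
    const int64_t n = std::min(nbytes, Size() - offset);
    std::memcpy(out, data_.data() + offset, static_cast<size_t>(n));
    return n;
  }

 private:
  std::string data_;
};
```

With this shape, two threads can read different column chunks from the same source without coordinating, because neither call changes state the other relies on.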
   
   I do not think we should implement a multithreaded IO scheduler in parquet-cpp at all right now. However, we need to be writing code so that users may implement subclasses of the abstract IO interfaces which may deal in asynchronous IO and concurrency.
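
As a rough illustration of what such a user-provided subclass could look like, the sketch below wraps a synchronous stream and keeps one chunk in flight on a background thread using `std::async`. The `InputStream`, `StringStream`, and `PrefetchingStream` names and the `Read` signature are assumptions made for this example, not the actual parquet-cpp classes, and the sketch ignores error handling.

```cpp
#include <algorithm>
#include <cstddef>
#include <future>
#include <memory>
#include <string>
#include <utility>

// Hypothetical abstract interface; the names are illustrative only.
class InputStream {
 public:
  virtual ~InputStream() = default;
  // Returns up to nbytes; an empty result for nbytes > 0 means end of stream.
  virtual std::string Read(size_t nbytes) = 0;
};

// Synchronous in-memory stream standing in for a file or socket.
class StringStream : public InputStream {
 public:
  explicit StringStream(std::string data) : data_(std::move(data)) {}
  std::string Read(size_t nbytes) override {
    std::string out = data_.substr(pos_, nbytes);
    pos_ += out.size();
    return out;
  }

 private:
  std::string data_;
  size_t pos_ = 0;
};

// Wraps any InputStream and fetches the next chunk on a background thread
// while the caller consumes the current one.
class PrefetchingStream : public InputStream {
 public:
  PrefetchingStream(std::unique_ptr<InputStream> raw, size_t chunk_size)
      : raw_(std::move(raw)), chunk_size_(chunk_size) {
    StartFetch();
  }

  ~PrefetchingStream() override {
    // Make sure no background read is still using raw_ when it is destroyed.
    if (pending_.valid()) pending_.wait();
  }

  std::string Read(size_t nbytes) override {
    std::string out;
    while (out.size() < nbytes) {
      if (pos_ == buffer_.size()) {
        if (eof_) break;
        buffer_ = pending_.get();  // blocks only if the prefetch is unfinished
        pos_ = 0;
        if (buffer_.empty()) {
          eof_ = true;
          break;
        }
        StartFetch();  // overlap the next read with the caller's work
      }
      const size_t n = std::min(nbytes - out.size(), buffer_.size() - pos_);
      out.append(buffer_, pos_, n);
      pos_ += n;
    }
    return out;
  }

 private:
  void StartFetch() {
    // Only the background task touches raw_ while a fetch is pending.
    InputStream* raw = raw_.get();
    const size_t n = chunk_size_;
    pending_ = std::async(std::launch::async, [raw, n] { return raw->Read(n); });
  }

  std::unique_ptr<InputStream> raw_;
  size_t chunk_size_;
  std::string buffer_;
  size_t pos_ = 0;
  bool eof_ = false;
  std::future<std::string> pending_;
};
```

The library only sees the abstract `InputStream`, so whether a given stream is synchronous or prefetching is left entirely to the subclass. On some toolchains `std::async` requires linking with the platform thread library (for example `-pthread`).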
   
   The asynchronous IO thing is a little bit thorny and out of scope for this JIRA.
   
   Does that make sense? I haven't dug through the ORC library yet. Does it perform IO in an asynchronous or synchronous fashion?


