asfimport commented on issue #42257: URL: https://github.com/apache/arrow/issues/42257#issuecomment-2184204291
[Wes McKinney](https://issues.apache.org/jira/browse/PARQUET-474?#comment-15338946) / @wesm:

Sorry, let me be a little more specific about the problems right now:

- We have code that assumes that a particular thread has exclusive access to an IO resource that has internal state, e.g. the code snippet that uses `Seek`.
- We are writing files in a way that assumes that IO is synchronous, i.e. we are not continuing to serialize data while we are waiting for IO to complete.
- The BufferedInputStream is synchronous. While we may not implement it in parquet-cpp, the design should probably allow for an input stream which buffers data in a background thread.

I do not think we should implement a multithreaded IO scheduler in parquet-cpp at all right now. However, we need to be writing code so that users may implement subclasses of the abstract IO interfaces which may deal in asynchronous IO and concurrency.

The asynchronous IO thing is a little bit thorny and out of scope for this JIRA. Does that make sense? I haven't dug through the ORC library yet. Does it perform IO in an asynchronous or synchronous fashion?
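
To make the first point concrete, here is a minimal sketch of one way to keep cursor state out of a shared IO resource. It is an illustration only, not parquet-cpp's actual interface: `RandomAccessSource`, `InMemorySource`, and `RangeReader` are hypothetical names. The shared source exposes only a positional `ReadAt`, and each reader carries its own `Seek` position, so no reader depends on having exclusive access to the source's state.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <cstring>
#include <string>
#include <utility>

// Hypothetical abstract interface (an assumption, not the parquet-cpp API).
// Reads are positional, so the source itself holds no cursor.
class RandomAccessSource {
 public:
  virtual ~RandomAccessSource() = default;
  virtual int64_t Size() const = 0;
  // Copies up to nbytes starting at offset into out; returns bytes copied.
  virtual int64_t ReadAt(int64_t offset, int64_t nbytes, uint8_t* out) = 0;
};

// Simple in-memory implementation used only for illustration.
class InMemorySource : public RandomAccessSource {
 public:
  explicit InMemorySource(std::string data) : data_(std::move(data)) {}

  int64_t Size() const override { return static_cast<int64_t>(data_.size()); }

  int64_t ReadAt(int64_t offset, int64_t nbytes, uint8_t* out) override {
    assert(out != nullptr && offset >= 0 && nbytes >= 0);
    if (offset >= Size()) return 0;  // reading past the end yields nothing
    const int64_t n = std::min(nbytes, Size() - offset);
    std::memcpy(out, data_.data() + offset, static_cast<size_t>(n));
    return n;
  }

 private:
  const std::string data_;
};

// Each reader owns its cursor; the underlying source is shared.
class RangeReader {
 public:
  explicit RangeReader(RandomAccessSource* source) : source_(source) {}

  void Seek(int64_t pos) { pos_ = pos; }
  int64_t Tell() const { return pos_; }

  // Reads up to nbytes from the current position and advances the cursor.
  std::string Read(int64_t nbytes) {
    std::string buf(static_cast<size_t>(nbytes), '\0');
    const int64_t n =
        source_->ReadAt(pos_, nbytes, reinterpret_cast<uint8_t*>(&buf[0]));
    buf.resize(static_cast<size_t>(n));
    pos_ += n;
    return buf;
  }

 private:
  RandomAccessSource* source_;
  int64_t pos_ = 0;
};
```

With this shape, two `RangeReader` objects over the same source can interleave `Seek` and `Read` calls without disturbing each other. A subclass of the source could, as an assumption beyond this sketch, buffer or prefetch in a background thread without changing the reader-facing code.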
