[
https://issues.apache.org/jira/browse/PARQUET-474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15338946#comment-15338946
]
Wes McKinney commented on PARQUET-474:
--------------------------------------
Sorry, let me be a little more specific about the problems right now:
- We have code that assumes that a particular thread has exclusive access to an
IO resource with internal state, e.g. the code snippet that uses {{Seek}}.
- We are writing files in a way that assumes that IO is synchronous -- i.e. we
are not continuing to serialize data while we are waiting for IO to complete.
- The BufferedInputStream is synchronous -- while we may not implement it in
parquet-cpp, the design should probably allow for an input stream which buffers
data in a background thread
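To make the first point concrete, here is a minimal sketch of the hazard and one possible way around it. The names {{SeekableSource}}, {{LockedReader}} and {{ReadAt}} are hypothetical and used only for illustration; they are not the actual parquet-cpp classes. The idea is that a {{Seek}} followed by a {{Read}} depends on a shared cursor, so a positional read that holds a lock across both calls avoids one thread moving the cursor under another.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <mutex>
#include <string>
#include <utility>

// Hypothetical stateful source (illustration only). Seek followed by Read is
// correct only if no other thread touches the cursor between the two calls.
class SeekableSource {
 public:
  explicit SeekableSource(std::string data) : data_(std::move(data)) {}

  void Seek(size_t position) { pos_ = std::min(position, data_.size()); }

  // Reads up to nbytes from the current cursor and advances it.
  std::string Read(size_t nbytes) {
    std::string out = data_.substr(pos_, nbytes);
    pos_ += out.size();
    return out;
  }

 private:
  std::string data_;
  size_t pos_ = 0;
};

// Hypothetical adapter: exposes a positional read and holds a lock across
// Seek + Read, so concurrent callers cannot interleave the two steps.
class LockedReader {
 public:
  explicit LockedReader(SeekableSource* source) : source_(source) {
    assert(source != nullptr);
  }

  std::string ReadAt(size_t position, size_t nbytes) {
    std::lock_guard<std::mutex> guard(mutex_);
    source_->Seek(position);
    return source_->Read(nbytes);
  }

 private:
  SeekableSource* source_;
  std::mutex mutex_;
};
```

The lock only helps if every caller goes through the adapter; code that still calls {{Seek}} on the underlying source directly would reintroduce the race.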
I do not think we should implement a multithreaded IO scheduler in parquet-cpp
at all right now. However, we need to be writing code so that users may
implement subclasses of the abstract IO interfaces which may deal in
asynchronous IO and concurrency.
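As a rough sketch of what such a user-provided subclass might look like, assume a hypothetical abstract {{InputStream}} with a single {{Read}} method (this is not the real parquet-cpp interface). A subclass can then fetch the next chunk on a background thread with {{std::async}} while the caller is still consuming the previous one. {{StringStream}} and {{PrefetchingStream}} are invented names for this example.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <future>
#include <memory>
#include <string>
#include <utility>

// Hypothetical abstract interface (illustration only).
class InputStream {
 public:
  virtual ~InputStream() = default;
  // Returns up to nbytes; an empty string signals end of stream.
  virtual std::string Read(size_t nbytes) = 0;
};

// Simple synchronous, in-memory implementation.
class StringStream : public InputStream {
 public:
  explicit StringStream(std::string data) : data_(std::move(data)) {}
  std::string Read(size_t nbytes) override {
    std::string out = data_.substr(pos_, nbytes);
    pos_ += out.size();
    return out;
  }

 private:
  std::string data_;
  size_t pos_ = 0;
};

// User-defined subclass: the next chunk is read on a background thread.
// Only the background task touches the wrapped source, one task at a time.
class PrefetchingStream : public InputStream {
 public:
  PrefetchingStream(std::unique_ptr<InputStream> source, size_t chunk_size)
      : source_(std::move(source)), chunk_size_(chunk_size) {
    assert(source_ != nullptr && chunk_size_ > 0);
    StartFetch();
  }

  ~PrefetchingStream() override {
    if (pending_.valid()) pending_.wait();  // let the in-flight read finish
  }

  std::string Read(size_t nbytes) override {
    std::string out;
    while (out.size() < nbytes) {
      if (buffer_.empty()) {
        if (!pending_.valid()) break;    // end of stream already reached
        buffer_ = pending_.get();        // wait for the prefetched chunk
        if (buffer_.empty()) break;      // source is exhausted
        StartFetch();                    // overlap the next read with our work
      }
      size_t n = std::min(nbytes - out.size(), buffer_.size());
      out.append(buffer_, 0, n);
      buffer_.erase(0, n);
    }
    return out;
  }

 private:
  void StartFetch() {
    InputStream* src = source_.get();
    size_t n = chunk_size_;
    pending_ = std::async(std::launch::async, [src, n] { return src->Read(n); });
  }

  std::unique_ptr<InputStream> source_;
  size_t chunk_size_;
  std::string buffer_;
  std::future<std::string> pending_;
};
```

Code written against the abstract {{InputStream}} does not need to know whether the concrete stream is synchronous or prefetching, which is the property the interfaces should preserve.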
The asynchronous IO thing is a little bit thorny and out of scope for this
JIRA.
Does that make sense? I haven't dug through the ORC library yet -- does it
perform IO in an asynchronous or synchronous fashion?
> InputStream and RandomAccessSource classes are not threadsafe
> -------------------------------------------------------------
>
> Key: PARQUET-474
> URL: https://issues.apache.org/jira/browse/PARQUET-474
> Project: Parquet
> Issue Type: Bug
> Components: parquet-cpp
> Reporter: Wes McKinney
> Assignee: Wes McKinney
>
> We need to ensure that files can be processed in multithreaded applications.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)