[ 
https://issues.apache.org/jira/browse/PARQUET-474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15338946#comment-15338946
 ] 

Wes McKinney commented on PARQUET-474:
--------------------------------------

Sorry, let me be a little more specific about the current problems:

- We have code that assumes that a particular thread has exclusive access to an 
IO resource with internal state, e.g. the code snippet that uses {{Seek}}.
- We are writing files in a way that assumes that IO is synchronous -- i.e. we 
are not continuing to serialize data while we are waiting for IO to complete.
- The BufferedInputStream is synchronous -- while we may not implement it in 
parquet-cpp, the design should probably allow for an input stream which buffers 
data in a background thread.
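
To illustrate the first point, here is a minimal sketch of why a Seek-then-Read pair needs exclusive access while a positional read does not. The class and method names (InMemorySource, ReadAt) are hypothetical and are not the parquet-cpp API; the in-memory buffer and the mutex stand in for a real file handle with internal state.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <mutex>
#include <string>
#include <utility>

// Hypothetical in-memory "file"; not a parquet-cpp class.
class InMemorySource {
 public:
  explicit InMemorySource(std::string data) : data_(std::move(data)) {}

  // Stateful pair: the caller must own the source exclusively between Seek
  // and Read, otherwise another thread can move the cursor in between.
  void Seek(int64_t pos) { pos_ = Clamp(pos); }

  int64_t Read(int64_t nbytes, char* out) {
    int64_t n = CopyFrom(pos_, nbytes, out);
    pos_ += n;
    return n;
  }

  // Positional read: locating and copying happen as one step under a lock and
  // the shared cursor is never touched, so concurrent callers do not interfere.
  int64_t ReadAt(int64_t pos, int64_t nbytes, char* out) const {
    std::lock_guard<std::mutex> lock(mutex_);
    return CopyFrom(Clamp(pos), nbytes, out);
  }

 private:
  int64_t Size() const { return static_cast<int64_t>(data_.size()); }

  int64_t Clamp(int64_t pos) const {
    return std::max<int64_t>(0, std::min<int64_t>(pos, Size()));
  }

  // Copies up to nbytes starting at pos; returns the number of bytes copied.
  int64_t CopyFrom(int64_t pos, int64_t nbytes, char* out) const {
    assert(out != nullptr);
    int64_t n = std::max<int64_t>(0, std::min<int64_t>(nbytes, Size() - pos));
    std::memcpy(out, data_.data() + pos, static_cast<std::size_t>(n));
    return n;
  }

  std::string data_;
  int64_t pos_ = 0;           // cursor used only by Seek/Read
  mutable std::mutex mutex_;  // stands in for guarding a real file handle
};
```

With something like ReadAt, each thread can read its own byte range from a shared source without coordinating with the others; the Seek/Read pair stays usable for single-threaded callers.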

I do not think we should implement a multithreaded IO scheduler in parquet-cpp 
at all right now. However, we need to be writing code so that users may 
implement subclasses of the abstract IO interfaces which may deal in 
asynchronous IO and concurrency. 
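
As a rough sketch of what "users may implement subclasses" could look like: the library ships only an abstract interface plus a synchronous implementation, and a user adds a subclass that reads one chunk ahead on a background thread. All names here (InputStream as written, VectorInputStream, PrefetchingInputStream, Next) are hypothetical and are not the parquet-cpp API; std::async is just one way to get a background read.

```cpp
#include <cassert>
#include <cstddef>
#include <future>
#include <string>
#include <utility>
#include <vector>

// Hypothetical abstract interface; not the parquet-cpp class.
class InputStream {
 public:
  virtual ~InputStream() = default;
  // Returns the next chunk, or an empty string at end of stream.
  virtual std::string Next() = 0;
};

// Plain synchronous implementation over pre-split chunks.
class VectorInputStream : public InputStream {
 public:
  explicit VectorInputStream(std::vector<std::string> chunks)
      : chunks_(std::move(chunks)) {}

  std::string Next() override {
    return index_ < chunks_.size() ? chunks_[index_++] : std::string();
  }

 private:
  std::vector<std::string> chunks_;
  std::size_t index_ = 0;
};

// User-side subclass: fetches one chunk ahead on a background thread while
// the caller is still processing the current one.
class PrefetchingInputStream : public InputStream {
 public:
  explicit PrefetchingInputStream(InputStream* source) : source_(source) {
    assert(source_ != nullptr);
    Prefetch();
  }

  std::string Next() override {
    if (!pending_.valid()) return std::string();  // already hit end of stream
    std::string chunk = pending_.get();  // blocks only if the read is unfinished
    if (!chunk.empty()) Prefetch();
    return chunk;
  }

 private:
  // Starts the next read; at most one background read is in flight at a time.
  void Prefetch() {
    pending_ = std::async(std::launch::async,
                          [this] { return source_->Next(); });
  }

  InputStream* source_;  // not owned; must outlive this object
  std::future<std::string> pending_;
};
```

The caller sees the same Next() interface either way, which is the point: code written against the abstract class does not need to know whether the IO behind it is synchronous.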

Asynchronous IO itself is a little bit thorny and out of scope for this 
JIRA. 

Does that make sense? I haven't dug through the ORC library yet -- does it 
perform IO in an asynchronous or synchronous fashion?

> InputStream and RandomAccessSource classes are not threadsafe
> -------------------------------------------------------------
>
>                 Key: PARQUET-474
>                 URL: https://issues.apache.org/jira/browse/PARQUET-474
>             Project: Parquet
>          Issue Type: Bug
>          Components: parquet-cpp
>            Reporter: Wes McKinney
>            Assignee: Wes McKinney
>
> We need to ensure that files can be processed in multithreaded applications.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
