[
https://issues.apache.org/jira/browse/PARQUET-474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15334441#comment-15334441
]
Wes McKinney commented on PARQUET-474:
--------------------------------------
Took a look at this. Reading a range of bytes from a local file must be made an
atomic operation so that we can lock the source while performing a seek then
read. Currently we have code like
{code}
source_->Seek(filesize - FOOTER_SIZE);
int64_t bytes_read = source_->Read(FOOTER_SIZE, footer_buffer);
{code}
This can be made thread-safe by having an API like
{code}
source_->ReadAt(filesize - FOOTER_SIZE, FOOTER_SIZE, footer_buffer);
{code}
We already have
{code}
std::shared_ptr<Buffer> ReadAt(int64_t pos, int64_t nbytes);
{code}
in which we can also block other threads (as needed) while performing the read.
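A minimal sketch of how the seek and the read could be combined into one
locked operation. This is not the parquet-cpp implementation: the class name
{{LockedFileSource}}, its {{is_open}} helper, and the use of C stdio with a
{{std::mutex}} are all assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <mutex>

// Hypothetical stand-in for a local-file source; not the parquet-cpp class.
class LockedFileSource {
 public:
  explicit LockedFileSource(const char* path)
      : file_(std::fopen(path, "rb")) {}
  ~LockedFileSource() {
    if (file_ != nullptr) std::fclose(file_);
  }
  LockedFileSource(const LockedFileSource&) = delete;
  LockedFileSource& operator=(const LockedFileSource&) = delete;

  bool is_open() const { return file_ != nullptr; }

  // Seek and read while holding one lock, so no other thread can move the
  // file position between the two calls. Returns the number of bytes read
  // (possibly fewer than nbytes at end of file), or -1 on error.
  int64_t ReadAt(int64_t pos, int64_t nbytes, uint8_t* out) {
    assert(out != nullptr);
    std::lock_guard<std::mutex> guard(lock_);
    if (file_ == nullptr || pos < 0 || nbytes < 0) return -1;
    if (std::fseek(file_, static_cast<long>(pos), SEEK_SET) != 0) return -1;
    return static_cast<int64_t>(
        std::fread(out, 1, static_cast<std::size_t>(nbytes), file_));
  }

 private:
  std::FILE* file_;
  std::mutex lock_;
};
```

With this shape, the footer read above becomes a single call,
{{source.ReadAt(filesize - FOOTER_SIZE, FOOTER_SIZE, footer_buffer)}}, and
concurrent callers are serialized on the lock instead of racing on the shared
file position.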
The stream classes are a slightly different rabbit hole: to get the best
performance we'd want a buffered stream that continues to buffer data from the
source in a background thread (presently buffering is synchronous / on-demand).
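One possible shape for background buffering, sketched under assumptions: the
{{ReadAheadStream}} class, its {{ReadAtFn}} callback type, and the choice of
{{std::async}} to fetch one chunk ahead are all hypothetical and not part of
parquet-cpp. The callback is assumed to be thread-safe and to return a short
read only at the end of the source.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <future>
#include <utility>
#include <vector>

// Hypothetical read-ahead stream; names are illustrative only.
class ReadAheadStream {
 public:
  // read_at(pos, nbytes, out) returns the number of bytes read.
  using ReadAtFn = std::function<int64_t(int64_t, int64_t, uint8_t*)>;

  ReadAheadStream(ReadAtFn read_at, int64_t chunk_size)
      : read_at_(std::move(read_at)), chunk_size_(chunk_size) {
    assert(chunk_size_ > 0);
    Prefetch();  // start buffering before the first request arrives
  }
  ReadAheadStream(const ReadAheadStream&) = delete;
  ReadAheadStream& operator=(const ReadAheadStream&) = delete;

  // Returns the next chunk (empty once the source is exhausted) and starts
  // fetching the following chunk in the background.
  std::vector<uint8_t> NextChunk() {
    if (!pending_.valid()) return {};
    std::vector<uint8_t> chunk = pending_.get();
    if (!chunk.empty()) Prefetch();
    return chunk;
  }

 private:
  void Prefetch() {
    const int64_t pos = next_pos_;
    next_pos_ += chunk_size_;
    pending_ = std::async(std::launch::async, [this, pos] {
      std::vector<uint8_t> buf(static_cast<std::size_t>(chunk_size_));
      const int64_t n = read_at_(pos, chunk_size_, buf.data());
      buf.resize(static_cast<std::size_t>(std::max<int64_t>(n, 0)));
      return buf;
    });
  }

  ReadAtFn read_at_;
  int64_t chunk_size_;
  int64_t next_pos_ = 0;
  // Declared last so it is destroyed first: the future returned by
  // std::async blocks in its destructor until the background read finishes.
  std::future<std::vector<uint8_t>> pending_;
};
```

While the caller processes one chunk, the next one is already being read, so
the source is kept busy instead of being read only on demand. A real
implementation would likely bound memory and surface read errors more
explicitly.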
> InputStream and RandomAccessSource classes are not threadsafe
> -------------------------------------------------------------
>
> Key: PARQUET-474
> URL: https://issues.apache.org/jira/browse/PARQUET-474
> Project: Parquet
> Issue Type: Bug
> Components: parquet-cpp
> Reporter: Wes McKinney
> Assignee: Wes McKinney
>
> We need to ensure that files can be processed in multithreaded applications.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)