[ 
https://issues.apache.org/jira/browse/PARQUET-2149?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17541725#comment-17541725
 ] 

ASF GitHub Bot commented on PARQUET-2149:
-----------------------------------------

steveloughran commented on PR #968:
URL: https://github.com/apache/parquet-mr/pull/968#issuecomment-1136465506

   > At this point the bottlenecks in parquet begin to move towards 
decompression and decoding but IO remains the slowest link in the chain.
   
   Latency is the killer; in an HTTP request you want read enough but not 
discard data or break an http connection if the client suddenly does a seek() 
or readFully() somewhere else. file listings, existence checks etc.
   
   > One thing we get with my PR is that the ParquetFileReader had assumptions 
built in that all data must be read before downstream can proceed. Some of my 
changes are related to removing these assumptions and ensuring that downstream 
processing does not block until an entire column is read so we get efficient 
pipelining.
   
   That'd be great. now, if you could also handle requesting different columns 
in parallel and processing them out of order.
   
   > What does the 128 MB block mean? Is this the amount prefetched for a 
stream? The read API does not block until the entire block is filled, I presume.
   
   this was the abfs client set to do four GET requests of 128MB each. this 
would be awful for columns stores where smaller ranges are often 
requested/processed before another seek is made, but quite often parquet does 
do more back to back reads than just one read/readFully request
   
   > With my PR, parquet IO is reading 8MB at a time (default) and downstream 
is processing 1MB at a time (default) and several such streams (one per column) 
are in progress at the same time. Hopefully, this read pattern would work with 
the prefetch.
   
   be good to think about vectored IO.
   
   and yes, updating parquet dependencies would be good, hadoop 3.3.0 should be 
the baseline.
   
   just sketched out my thoughts on this. I've played with some of this in my 
own branch. I think the next step would be for me to look at the benchmark code 
to make it targetable elsewhere.
   
   
   
https://docs.google.com/document/d/1y9oOSYbI6fFt547zcQJ0BD8VgvJWdyHBveaiCHzk79k/
   




> Implement async IO for Parquet file reader
> ------------------------------------------
>
>                 Key: PARQUET-2149
>                 URL: https://issues.apache.org/jira/browse/PARQUET-2149
>             Project: Parquet
>          Issue Type: Improvement
>          Components: parquet-mr
>            Reporter: Parth Chandra
>            Priority: Major
>
> ParquetFileReader's implementation has the following flow (simplified) - 
>       - For every column -> Read from storage in 8MB blocks -> Read all 
> uncompressed pages into output queue 
>       - From output queues -> (downstream ) decompression + decoding
> This flow is serialized, which means that downstream threads are blocked 
> until the data has been read. Because a large part of the time spent is 
> waiting for data from storage, threads are idle and CPU utilization is really 
> low.
> There is no reason why this cannot be made asynchronous _and_ parallel. So 
> For Column _i_ -> reading one chunk until end, from storage -> intermediate 
> output queue -> read one uncompressed page until end -> output queue -> 
> (downstream ) decompression + decoding
> Note that this can be made completely self contained in ParquetFileReader and 
> downstream implementations like Iceberg and Spark will automatically be able 
> to take advantage without code change as long as the ParquetFileReader apis 
> are not changed. 
> In past work with async io  [Drill - async page reader 
> |https://github.com/apache/drill/blob/master/exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/columnreaders/AsyncPageReader.java]
>  , I have seen 2x-3x improvement in reading speed for Parquet files.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

Reply via email to