[GitHub] [orc] pavibhai opened a new pull request #668: ORC-742: Added Lazy IO for non filter columns

GitBox Wed, 24 Mar 2021 10:32:41 -0700


pavibhai opened a new pull request #668:
URL: https://github.com/apache/orc/pull/668



   ### What changes were proposed in this pull request?
       * When a filter is present, classify columns as LEAD or FOLLOW
               * LEAD columns are read first
               * FOLLOW columns are read only if the filter selects at least one row
       * RecordReaderImpl.nextBatch keeps reading until a batch contains values or the file is exhausted, instead of returning empty batches as it did previously
       * IO of FOLLOW columns happens at the RowGroup level
       * In the presence of filters, batches respect row group boundaries
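
The read order described in the list above can be illustrated with a minimal, self-contained sketch. This is not the code from the pull request and does not use the ORC API: the class `LazyFollowSketch`, its methods `readLead`, `readFollow`, and `scan`, and the in-memory arrays standing in for row groups are all hypothetical, chosen only to show that FOLLOW IO is skipped for a row group when the filter selects nothing in it.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntPredicate;

// Hypothetical illustration only; none of these names come from ORC.
public class LazyFollowSketch {
    // Counters modelling how many row groups were read for each column kind.
    int leadReads = 0;
    int followReads = 0;

    // Stand-in for IO on the LEAD (filter) column of row group g.
    int[] readLead(int[][] leadGroups, int g) {
        leadReads++;
        return leadGroups[g];
    }

    // Stand-in for IO on the FOLLOW (non-filter) column of row group g.
    String[] readFollow(String[][] followGroups, int g) {
        followReads++;
        return followGroups[g];
    }

    // Scans row group by row group; FOLLOW data is read only when the
    // filter selected at least one row in the current row group.
    List<String> scan(int[][] lead, String[][] follow, IntPredicate filter) {
        List<String> out = new ArrayList<>();
        for (int g = 0; g < lead.length; g++) {
            int[] leadValues = readLead(lead, g);
            List<Integer> selected = new ArrayList<>();
            for (int r = 0; r < leadValues.length; r++) {
                if (filter.test(leadValues[r])) {
                    selected.add(r);
                }
            }
            if (selected.isEmpty()) {
                continue; // nothing selected: skip FOLLOW IO for this row group
            }
            String[] followValues = readFollow(follow, g);
            for (int r : selected) {
                out.add(leadValues[r] + ":" + followValues[r]);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        int[][] lead = {{1, 2}, {3, 4}, {5, 6}};
        String[][] follow = {{"a", "b"}, {"c", "d"}, {"e", "f"}};
        LazyFollowSketch sketch = new LazyFollowSketch();
        List<String> rows = sketch.scan(lead, follow, v -> v == 4);
        // Prints: [4:d] leadReads=3 followReads=1
        System.out.println(rows + " leadReads=" + sketch.leadReads
                + " followReads=" + sketch.followReads);
    }
}
```

In this toy run all three row groups of the LEAD column are read, but only one row group of the FOLLOW column is, which is the kind of saving the change targets when few rows match the filter.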
   
   ### Why are the changes needed?
   The code changes allow lazy evaluation of FOLLOW columns, which yields substantial savings in both IO and CPU for reads where few rows match the filter.
   
   ### How was this patch tested?
   This patch includes unit tests that verify the IO savings achieved by this change.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]
