mapleFU commented on issue #37559:
URL: https://github.com/apache/arrow/issues/37559#issuecomment-1711023671

   Yeah, I think this happens during the read. The original logic is:
   
   * for columns c1...cn, read all of them; this may involve decompression, decoding, memcpy, ... (a rough sketch follows this list)
   * build an arrow `RecordBatch` on top of the decoded columns
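   As a rough sketch of that eager order (the types and helpers below are hypothetical stand-ins, not the real parquet-cpp / Arrow classes), every projected column pays the full decode cost before any filter runs:

   ```cpp
   // Eager read order: decode everything, filter later.
   // All names here are illustrative stand-ins, not actual Arrow symbols.
   #include <memory>
   #include <vector>

   struct DecodedColumn {};                           // stand-in for a decoded Arrow array
   struct RecordBatch { std::vector<std::shared_ptr<DecodedColumn>> columns; };

   // Hypothetical per-column read: decompress pages, decode values, memcpy buffers.
   std::shared_ptr<DecodedColumn> ReadAndDecodeColumn(int /*column_index*/) {
     return std::make_shared<DecodedColumn>();
   }

   RecordBatch ReadRowGroupEagerly(const std::vector<int>& projected_columns) {
     RecordBatch batch;
     for (int c : projected_columns) {
       // Every projected column pays the full decompress/decode/memcpy cost,
       // no matter how selective a later filter turns out to be.
       batch.columns.push_back(ReadAndDecodeColumn(c));
     }
     return batch;  // any filter is applied on top of this fully-built batch
   }
   ```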
   
   When it comes to filter pushdown, we might need some late-materialization techniques. This might change the procedure to:
   
   * read column c1 and evaluate the filter on it
   * use the resulting selection to read the remaining columns
   * build an arrow `RecordBatch` on top of the decoded columns (see the sketch after this list)
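   A corresponding sketch of the late-materialized order, reusing the same hypothetical stand-ins; `ReadColumnWithSelection` is an assumption about what a selection-aware reader would need, not an existing parquet-cpp API:

   ```cpp
   // Late-materialized read order: decode the filter column first, derive a
   // selection, then decode the remaining columns only for the selected rows.
   // All names here are illustrative stand-ins, not actual Arrow symbols.
   #include <cstdint>
   #include <memory>
   #include <vector>

   struct DecodedColumn {};                        // stand-in for a decoded Arrow array
   struct RecordBatch { std::vector<std::shared_ptr<DecodedColumn>> columns; };
   using Selection = std::vector<uint32_t>;        // indices of rows that pass the filter

   std::shared_ptr<DecodedColumn> ReadAndDecodeColumn(int /*column_index*/) {
     return std::make_shared<DecodedColumn>();     // decompress + decode + memcpy
   }
   Selection EvaluateFilter(const DecodedColumn& /*filter_column*/) {
     return {};                                    // placeholder for the pushed-down predicate
   }
   std::shared_ptr<DecodedColumn> ReadColumnWithSelection(int /*column_index*/,
                                                          const Selection& /*rows*/) {
     return std::make_shared<DecodedColumn>();     // can skip pages whose rows are all filtered out
   }

   RecordBatch ReadRowGroupLate(int filter_column, const std::vector<int>& other_columns) {
     RecordBatch batch;
     // 1. Read and decode only the column the filter refers to.
     auto c1 = ReadAndDecodeColumn(filter_column);
     // 2. Evaluate the filter on c1 to get a row selection for this row group.
     Selection selected_rows = EvaluateFilter(*c1);
     batch.columns.push_back(c1);
     // 3. Late materialization: read the remaining columns only for the
     //    selected rows, so their decompression/decoding work can shrink.
     for (int c : other_columns) {
       batch.columns.push_back(ReadColumnWithSelection(c, selected_rows));
     }
     return batch;  // the RecordBatch is built on top of the decoded columns
   }
   ```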
   
   [1] https://issues.apache.org/jira/browse/SPARK-36527
   [2] https://docs.cloudera.com/cdw-runtime/cloud/impala-reference/topics/impala-lazy-materialization.html
   
   The links above describe this technique. Note that I suspect it does not always improve CPU performance, e.g.:
   
   For a filter output like `0 1 0 1...`, it's not easy to use the filter to save CPU time, because the selected rows still touch essentially every data page, so nothing can be skipped before decompression and decoding.
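   A small self-contained illustration of that point (plain C++, not Arrow code): turning a selection bitmap into contiguous row ranges shows that a clustered filter collapses into a few skippable ranges, while an alternating `0 1 0 1...` selection degenerates into one range per selected row:

   ```cpp
   #include <cstdint>
   #include <iostream>
   #include <utility>
   #include <vector>

   // Convert a row-level selection bitmap into [begin, end) ranges of selected rows.
   std::vector<std::pair<uint32_t, uint32_t>> ToRanges(const std::vector<bool>& selected) {
     std::vector<std::pair<uint32_t, uint32_t>> ranges;
     for (uint32_t i = 0; i < selected.size(); ++i) {
       if (!selected[i]) continue;
       if (!ranges.empty() && ranges.back().second == i) {
         ++ranges.back().second;        // extend the current run of selected rows
       } else {
         ranges.push_back({i, i + 1});  // start a new run
       }
     }
     return ranges;
   }

   int main() {
     const uint32_t kRows = 1000;
     std::vector<bool> alternating(kRows), clustered(kRows, false);
     for (uint32_t i = 0; i < kRows; ++i) alternating[i] = (i % 2 == 1);  // 0 1 0 1 ...
     for (uint32_t i = 0; i < kRows / 2; ++i) clustered[i] = true;        // first half only

     // Alternating: ~kRows/2 single-row ranges -> nothing can be skipped.
     // Clustered:   1 range                    -> the second half can be skipped entirely.
     std::cout << "alternating ranges: " << ToRanges(alternating).size() << "\n";
     std::cout << "clustered ranges:   " << ToRanges(clustered).size() << "\n";
     return 0;
   }
   ```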

