baibaichen commented on issue #37559: URL: https://github.com/apache/arrow/issues/37559#issuecomment-1711044056
From the paper, IIUC, they copy **the selected encode value** out, and then decode them as if all records are decoded. The following cited from section 4.1 > The framework is built upon a simple yet crucial observation: when performing a `filter` or `project` operation, records failing to meet prior predicates can be bypassed directly. While this observation is undeniably obvious, previous approaches have not leveraged it effectively. Indeed, in the case of filter operations, previous work tends to perform predicate evaluation on **all** values [[29](#_bookmark74), [34](#_bookmark80)], intentionally ignoring the fact that some values might have been filtered by prior filters. **This is primarily because the additional cost associated with the select operator often outweighs the potential savings in predicate evaluation**. However, given the fast select operator that operates on encoded values (Section [3](#_bookmark6)), it has become more favorable to select values upfront, even for filter operations. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
