alamb commented on issue #14816:
URL: https://github.com/apache/datafusion/issues/14816#issuecomment-2676188552

   > We are using DataFusion to query Parquet files and wondering if the result 
of the query can be represented as a bit set of the document position (example 
below). Bit sets from the different engines can be intersected to identify the 
documents which meets the criteria. The resulting bit set then can be used to 
fetch the relevant documents from Parquet.
   
   I think there are two parts to your question:
   
   1. Representing the results as a bitset: I think you would have to implement 
a custom "pivot" type operation that somehow takes row ids and creates a bitset 
from them
   
   2. Fetching only relevant documents from parquet: the current reader is 
set up to efficiently fetch large contiguous blocks of values 
([`RowSelection`](https://docs.rs/parquet/latest/parquet/arrow/arrow_reader/struct.RowSelection.html)).
 @XiangpengHao has been thinking about a bitset representation for selected 
rows recently, so perhaps you can help contribute to making that happen in the 
parquet reader
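
To make the two parts concrete, here is a rough std-only sketch, not DataFusion or parquet API. `RowBitset`, `Run`, `from_row_ids`, `intersect` and `to_runs` are hypothetical names invented for illustration. It assumes row ids are already available as zero-based positions within a single file, and it shows (1) building a bitset from row ids and intersecting two of them, and (2) collapsing the result into alternating skip/select runs, which is the shape a run-based selection such as `RowSelection` works with.

```rust
/// Hypothetical fixed-size bitset over row positions (illustration only).
#[derive(Debug, Clone, PartialEq)]
pub struct RowBitset {
    words: Vec<u64>,
    num_rows: usize,
}

/// Hypothetical run type: `Skip(n)` rows to pass over, `Select(n)` rows to read.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Run {
    Skip(usize),
    Select(usize),
}

impl RowBitset {
    /// The "pivot" step: set one bit per matching row id.
    pub fn from_row_ids(num_rows: usize, row_ids: &[usize]) -> Self {
        let mut words = vec![0u64; (num_rows + 63) / 64];
        for &id in row_ids {
            assert!(id < num_rows, "row id {} out of range", id);
            words[id / 64] |= 1u64 << (id % 64);
        }
        Self { words, num_rows }
    }

    /// Returns true if `row` is in range and its bit is set.
    pub fn contains(&self, row: usize) -> bool {
        row < self.num_rows && (self.words[row / 64] >> (row % 64)) & 1 == 1
    }

    /// Word-wise AND: rows that match in both inputs.
    pub fn intersect(&self, other: &Self) -> Self {
        assert_eq!(self.num_rows, other.num_rows, "bitsets must cover the same rows");
        let words = self
            .words
            .iter()
            .zip(&other.words)
            .map(|(a, b)| a & b)
            .collect();
        Self { words, num_rows: self.num_rows }
    }

    /// Collapse the bitset into alternating skip/select runs covering every row.
    pub fn to_runs(&self) -> Vec<Run> {
        let mut runs = Vec::new();
        let mut row = 0;
        while row < self.num_rows {
            let selected = self.contains(row);
            let start = row;
            // Extend the run while the bit value stays the same.
            while row < self.num_rows && self.contains(row) == selected {
                row += 1;
            }
            let len = row - start;
            runs.push(if selected { Run::Select(len) } else { Run::Skip(len) });
        }
        runs
    }
}
```

For example, intersecting `from_row_ids(10, &[1, 2, 3, 7])` with `from_row_ids(10, &[2, 3, 4, 7, 9])` leaves rows 2, 3 and 7, and `to_runs()` on the result yields skip 2, select 2, skip 3, select 1, skip 2. In the parquet crate, such runs would map onto `RowSelector::skip` / `RowSelector::select` values collected into a `RowSelection`. Sparse bitsets produce many short runs, which is one reason a native bitset representation in the reader could help.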
   
   
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

