[I] Advanced example for building an external index for Row Groups within parquet files [datafusion]

via GitHub Mon, 20 May 2024 08:56:26 -0700


alamb opened a new issue, #10580:
URL: https://github.com/apache/datafusion/issues/10580


   ### Is your feature request related to a problem or challenge?
   
   It is common in databases and other analytic systems to have additional external "indexes" (perhaps stored in the "metadata catalog", perhaps stored alongside the data files, perhaps embedded in the files, perhaps elsewhere).
   
   These indexes are used to speed up queries by "pruning": specifically, evaluating a predicate on the index and then reading only the portions of files that could pass the filters in the query. In https://github.com/apache/datafusion/issues/10546 we showed how to create an index for entire files.
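   As a rough illustration of the pruning step, the sketch below keeps a hypothetical min/max summary of one integer column per "container" (a whole file or a single row group) and evaluates the predicate `x > v` against it. Any container whose maximum is `<= v` cannot hold a matching row and can be skipped without being read. `MinMax` and `prune_gt` are illustrative names assumed for this sketch, not DataFusion APIs. Real pruning in DataFusion handles general predicates rather than a single hard-coded comparison.

```rust
/// Hypothetical min/max summary of one integer column for a single
/// container (a whole file, or one row group within a file).
#[derive(Debug, Clone, Copy)]
pub struct MinMax {
    pub min: i64,
    pub max: i64,
}

/// Returns the indexes of containers that *might* contain a row with
/// `x > v`. A container whose `max <= v` is pruned: no row in it can
/// satisfy the predicate, so it never needs to be read.
pub fn prune_gt(stats: &[MinMax], v: i64) -> Vec<usize> {
    stats
        .iter()
        .enumerate()
        .filter(|(_, s)| s.max > v)
        .map(|(i, _)| i)
        .collect()
}
```

   The result is conservative. A surviving container may still contain no matching rows, but a pruned one certainly contains none.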
   
   I would also like to create an example of how to create such an index for row groups within a file (showing how to read the file without re-reading its metadata each time).
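   One possible shape for such a row-group-level index is sketched below, under loudly stated assumptions. It keeps per-row-group min/max statistics for one integer column of each file in memory. The caller is assumed to have obtained those statistics once, for example when the file was written or first opened. For a query `x = v`, the index returns only the files and row groups that could match, so the Parquet footer does not have to be re-read for every query. `RowGroupIndex`, `RowGroupStats`, and `row_groups_to_scan` are hypothetical names for this sketch and are not part of DataFusion.

```rust
use std::collections::BTreeMap;

/// Hypothetical min/max statistics for one column of one row group.
#[derive(Debug, Clone, Copy)]
pub struct RowGroupStats {
    pub min: i64,
    pub max: i64,
}

/// Hypothetical external index: for each file, the per-row-group
/// statistics of a single column, stored in row group order. It is
/// built once and then consulted for every query, so answering a query
/// never requires re-reading the Parquet footer.
#[derive(Debug, Default)]
pub struct RowGroupIndex {
    files: BTreeMap<String, Vec<RowGroupStats>>,
}

impl RowGroupIndex {
    /// Record the statistics of every row group in `file`, in row group order.
    pub fn insert(&mut self, file: &str, row_groups: Vec<RowGroupStats>) {
        self.files.insert(file.to_string(), row_groups);
    }

    /// For the predicate `x = v`, return each file that has at least one
    /// candidate row group, together with the indexes of the row groups
    /// whose [min, max] range contains `v`. Files with no candidates are
    /// omitted entirely, so they need not be opened at all.
    pub fn row_groups_to_scan(&self, v: i64) -> Vec<(String, Vec<usize>)> {
        self.files
            .iter()
            .filter_map(|(file, groups)| {
                let keep: Vec<usize> = groups
                    .iter()
                    .enumerate()
                    .filter(|(_, s)| s.min <= v && v <= s.max)
                    .map(|(i, _)| i)
                    .collect();
                if keep.is_empty() {
                    None
                } else {
                    Some((file.clone(), keep))
                }
            })
            .collect()
    }
}
```

   Feeding a per-file row group list like this into DataFusion's Parquet scan is presumably the part that depends on the APIs this issue lists as prerequisites.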
   
   
   To complete this example, I think we need:
   1. The API from @NGA-TRAN in https://github.com/apache/datafusion/issues/10453
   2. The API described in https://github.com/apache/datafusion/issues/9929
   
   
   ### Describe the solution you'd like
   
   _No response_
   
   ### Describe alternatives you've considered
   
   _No response_
   
   ### Additional context
   
   This is a follow-on to https://github.com/apache/datafusion/issues/10546


