GitHub user PierreZ added a comment to the discussion: Indexing Support in DataFusion?
Hey everyone! 👋

Quick update, I've finally completed the initial implementation of the index provider we discussed: https://github.com/datafusion-contrib/datafusion-index-provider/pull/2

It implements the "Option 2" approach (APIs to pass additional knowledge about indexes) that @alamb mentioned above.

The crate provides:

- Index-based query acceleration for `TableProvider` implementations
- Automatic handling of complex predicates (AND/OR/multiple indexes)
- Clean trait-based API (`Index`, `RecordFetcher`, `IndexedTableProvider`)

This has been running at my company for a few months without issues on top of FoundationDB. The design is somewhat oriented toward small queries and low data volumes due to FoundationDB's 5s transaction timeout and 10MB transaction limits.

That said, I'd love feedback, especially on whether the approach makes sense for larger-scale scenarios. I don't work with query planners often and there are probably better ways to structure some of this.

**Since this is landing in the datafusion-contrib organization:**

- Who would be the right person(s) to review this PR?
- What are the general contribution/review guidelines for datafusion-contrib repos?

GitHub link: https://github.com/apache/datafusion/discussions/9963#discussioncomment-15011862
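
For readers unfamiliar with the idea of combining several indexes under AND/OR predicates, here is a minimal, self-contained sketch of one common way to do it: each index lookup yields a set of row ids, AND intersects the sets, and OR unions them. This is not the crate's API. `Predicate`, `ToyIndex`, and `matching_rows` are hypothetical names invented for this illustration, and the PR may resolve predicates quite differently.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Hypothetical predicate tree (illustration only, not the crate's types).
enum Predicate {
    /// Equality lookup against a named index.
    Eq { index: &'static str, value: i64 },
    /// All children must match.
    And(Vec<Predicate>),
    /// At least one child must match.
    Or(Vec<Predicate>),
}

/// Hypothetical in-memory index: indexed value -> ids of matching rows.
type ToyIndex = BTreeMap<i64, BTreeSet<u64>>;

/// Resolve a predicate to row ids: AND intersects, OR unions.
/// Assumption: an empty AND yields no rows, to keep the sketch simple.
fn matching_rows(
    pred: &Predicate,
    indexes: &BTreeMap<&'static str, ToyIndex>,
) -> BTreeSet<u64> {
    match pred {
        // Unknown index or unknown value both resolve to the empty set.
        Predicate::Eq { index, value } => indexes
            .get(index)
            .and_then(|idx| idx.get(value))
            .cloned()
            .unwrap_or_default(),
        Predicate::And(children) => {
            let mut sets = children.iter().map(|c| matching_rows(c, indexes));
            let first = sets.next().unwrap_or_default();
            sets.fold(first, |acc, s| acc.intersection(&s).cloned().collect())
        }
        Predicate::Or(children) => children
            .iter()
            .flat_map(|c| matching_rows(c, indexes))
            .collect(),
    }
}
```

In a design like this, the resulting row ids would then be handed to whatever component fetches the actual records, so the full table is never scanned.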
