alamb commented on issue #17010: URL: https://github.com/apache/datafusion/issues/17010#issuecomment-3144161211
> Thanks! I just skimmed through it quickly. I think it’d be interesting to run an experiment with plots comparing performance with and without the specific indexing to evaluate its impact. I’ll try to carve out some time for it—definitely excited to see what we could find!

Here is a crazy specific idea: I think something that would be broadly interesting (aka get a lot of clicks / likely to make Hacker News) would be an example of fast single-row lookups with Parquet. This is a common use case cited by new file formats.

The narrative could go something like:
* Parquet is often used for analytical, scan-heavy queries, and the built-in indexing structures and writer properties are optimized for this case
* However, you can combine external indexes and the appropriate writer properties (e.g. 100-row pages, so the overhead of decoding a page is low) to build a system that works well for single-row lookups
* Look, we prototyped an example using DataFusion that can do single-row lookups in under 1 ms when the data is cached locally
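
To make the second point concrete, here is a rough, pure-Python toy model of the idea. It does not use the Parquet or DataFusion APIs; `PAGE_ROWS`, `build_pages`, `build_external_index`, and `lookup` are hypothetical names, and it assumes the keys are sorted. Lists of 100 keys stand in for small data pages, and the "external index" is just the first key of each page, kept outside the data:

```python
from bisect import bisect_right

PAGE_ROWS = 100  # assumption: mirrors the "100 row pages" writer setting


def build_pages(sorted_keys, page_rows=PAGE_ROWS):
    """Split sorted keys into fixed-size pages (stand-ins for data pages)."""
    return [sorted_keys[i:i + page_rows]
            for i in range(0, len(sorted_keys), page_rows)]


def build_external_index(pages):
    """External index: the first key of every page, stored outside the data."""
    return [page[0] for page in pages]


def lookup(key, pages, index):
    """Return (page_number, offset_in_page, rows_decoded), or None if absent."""
    # Binary search the index for the last page whose first key is <= key.
    page_no = bisect_right(index, key) - 1
    if page_no < 0:
        return None
    page = pages[page_no]  # only this single page is "decoded"
    try:
        offset = page.index(key)
    except ValueError:
        return None
    return page_no, offset, len(page)


keys = list(range(0, 20_000, 2))  # 10,000 sorted even keys
pages = build_pages(keys)
index = build_external_index(pages)
print(lookup(12_346, pages, index))
```

In this model a lookup touches one index entry range plus at most `PAGE_ROWS` rows, regardless of how many rows the file holds, which is the property the small-page setting is meant to buy. Real numbers would of course depend on encoding, compression, and I/O.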