alamb commented on issue #17010:
URL: https://github.com/apache/datafusion/issues/17010#issuecomment-3144161211

   > Thanks! I just skimmed through it quickly. I think it’d be interesting to 
run an experiment with plots comparing performance with and without the 
specific indexing to evaluate its impact. I’ll try to carve out some time for 
it—definitely excited to see what we could find!
   
   Here is a crazy specific idea: 
   
   I think something that would be broadly interesting (aka get a lot of clicks 
/ likely to make Hacker News) would be an example of fast single row 
lookups with Parquet -- this is a common use case that is cited by new file 
formats
   
   The narrative could go something like
   * Parquet is often used for analytical, scan-heavy queries, and the built-in 
indexing structures and writer properties are optimized for this case
   * However, you can combine external indexes and the appropriate writer 
properties (e.g. 100 row pages so the overhead of decoding a page is low) to 
build a system that works well for single row lookups
   * Look, we prototyped an example using DataFusion that can do single row 
lookups in under 1ms when the data is locally in cache
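
   To make the second bullet concrete, here is a toy sketch in plain Python of why small pages plus an external index help point lookups. It does not use Parquet or DataFusion at all: the "file" is just a list of in-memory pages, and `write_pages`, `build_external_index`, and `lookup` are hypothetical names made up for illustration. The only thing it models is that the index points straight at one page, so the work per lookup is bounded by the page size rather than the file size.

```python
ROWS_PER_PAGE = 100  # mirrors the "100 row pages" suggestion above


def write_pages(rows, rows_per_page=ROWS_PER_PAGE):
    """Split rows into fixed-size pages (a stand-in for a writer's page size limit)."""
    return [rows[i:i + rows_per_page] for i in range(0, len(rows), rows_per_page)]


def build_external_index(pages):
    """Map each key to (page number, offset in page); stored outside the 'file'."""
    index = {}
    for page_no, page in enumerate(pages):
        for offset, (key, _value) in enumerate(page):
            index[key] = (page_no, offset)
    return index


def lookup(key, pages, index):
    """Return (value, rows_decoded) for key, touching at most one page."""
    location = index.get(key)
    if location is None:
        return None, 0
    page_no, offset = location
    page = pages[page_no]  # in a real file, this is the page that gets decoded
    return page[offset][1], len(page)


# Hypothetical data: 10,000 (key, value) rows.
rows = [(i, f"value-{i}") for i in range(10_000)]
pages = write_pages(rows)
index = build_external_index(pages)

value, rows_decoded = lookup(4242, pages, index)
```

   With 100-row pages, a lookup here touches 100 rows out of 10,000; with a single large page it would touch all of them. Whether a real Parquet reader sees a similar win is exactly what the proposed experiment would have to measure.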
   
   
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

