[GitHub] [arrow-datafusion] Dandandan commented on pull request #380: Support statistics pruning for formats other than parquet

GitBox Fri, 21 May 2021 14:42:08 -0700


Dandandan commented on pull request #380:
URL: https://github.com/apache/arrow-datafusion/pull/380#issuecomment-846277480



   > > I think this is really cool. I think it would also be great to have this 
for in-memory tables.
   > 
   > I agree -- I think all that is needed is to calculate the min/max 
statistics for each partition (or maybe even each record batch), though we might 
have to be careful not to slow down queries where it wouldn't help. Maybe it 
could be opt-in. Or perhaps we could compute the statistics "on demand" (after 
we have created a PruningPredicate).
   
   Yes, I think we might also want to look into supporting something like 
`analyze ...`, besides having an option when loading. Together with sorting of 
the data, I think that could become very interesting for in-memory analytics.
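
To make the idea concrete, here is a rough sketch of per-batch min/max pruning for in-memory data. It is not DataFusion code: `BatchStats`, `Op`, `compute_stats`, and `prune_batches` are hypothetical names, a "batch" is simplified to a `Vec<i64>` for a single column, and the statistics are computed "on demand" when the predicate is evaluated rather than at load time.

```rust
/// Min/max statistics for one in-memory batch of a single i64 column.
/// Hypothetical type for illustration; not a DataFusion API.
#[derive(Debug, Clone, Copy, PartialEq)]
struct BatchStats {
    min: i64,
    max: i64,
}

/// Comparison used by the simplified predicate `column <op> literal`.
#[derive(Debug, Clone, Copy)]
enum Op {
    Gt,
    Lt,
}

/// Compute min/max for one batch; returns None for an empty batch.
fn compute_stats(batch: &[i64]) -> Option<BatchStats> {
    let min = *batch.iter().min()?;
    let max = *batch.iter().max()?;
    Some(BatchStats { min, max })
}

/// Return the indices of batches that may contain a matching row.
/// A batch is skipped only when its statistics prove that no row can match.
fn prune_batches(batches: &[Vec<i64>], op: Op, literal: i64) -> Vec<usize> {
    batches
        .iter()
        .enumerate()
        .filter(|(_, batch)| match compute_stats(batch) {
            // An empty batch has no rows, so it can always be skipped.
            None => false,
            Some(stats) => match op {
                // `col > literal` can only match if the largest value does.
                Op::Gt => stats.max > literal,
                // `col < literal` can only match if the smallest value does.
                Op::Lt => stats.min < literal,
            },
        })
        .map(|(i, _)| i)
        .collect()
}
```

If the data is sorted on the column before it is split into batches, the min/max ranges of the batches do not overlap, so a range predicate like this can typically skip most batches. The cost of scanning each batch for statistics is the reason an opt-in or `analyze`-style step might be preferable to always computing them.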


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]
