[GitHub] [arrow-datafusion] matthewmturner commented on pull request #1063: Optimize count agg expr with null column statistics

GitBox Thu, 21 Oct 2021 08:13:47 -0700


matthewmturner commented on pull request #1063:
URL: https://github.com/apache/arrow-datafusion/pull/1063#issuecomment-948716424



   > Sorry for the delay. If I understand correctly, you have copied all the 
existing tests, just changing the datasource for one that has nulls in it:
   > 
   >     * this is very verbose
   > 
   >     * it isn't really testing anything new
   > 
   >     * it is not testing the code you have added
   > 
   > 
   > These optimizations only kick in in corner cases. My feeling is that they 
are only worth it if we manage to write them in a clean and well-tested 
fashion, otherwise the benefit/risk ratio might quickly become bad 😃. 
Of course, @Dandandan might have another opinion as he opened the issue 
initially.
   
   @rdettai thank you for the feedback.  It seems I underestimated the task.  I 
will look into it further on my side; of course, any specific guidance is 
welcome as well :)
   
   One follow-up question on your point that these optimizations only kick in 
in corner cases: can you clarify how a count, or a count on data with nulls, 
is considered a corner case? I would have thought that would be a 
standard / common scenario.
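
   For reference, here is a minimal sketch of what I understand the idea to be: answering a count from statistics instead of scanning. The `TableStats` and `ColumnStats` types and both functions are hypothetical simplifications made up for illustration; they are not DataFusion's actual types or the code in this PR. The sketch assumes the statistics are exact and that no filter sits between the scan and the aggregate.

```rust
/// Hypothetical table-level statistics (illustration only).
#[derive(Debug, Clone, Copy)]
struct TableStats {
    /// Exact number of rows, if the source can report it.
    num_rows: Option<usize>,
}

/// Hypothetical per-column statistics (illustration only).
#[derive(Debug, Clone, Copy)]
struct ColumnStats {
    /// Exact number of null values in the column, if known.
    null_count: Option<usize>,
}

/// COUNT(*) counts every row, so the exact row count is enough.
fn count_star(table: &TableStats) -> Option<usize> {
    table.num_rows
}

/// COUNT(col) skips nulls, so it needs both the row count and the null count.
/// Returns None when either statistic is missing or the two are inconsistent,
/// in which case the query would have to be executed normally.
fn count_column(table: &TableStats, col: &ColumnStats) -> Option<usize> {
    let rows = table.num_rows?;
    let nulls = col.null_count?;
    rows.checked_sub(nulls)
}
```

   With 10 rows and 3 known nulls, `count_column` yields `Some(7)`; if the null count is unknown it yields `None` and the shortcut does not apply.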
   
   Thx again.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
