zabetak commented on PR #6359:
URL: https://github.com/apache/hive/pull/6359#issuecomment-4153794528

   > I was unable to find a way to distinguish between "const NULL" 
ColStatistics objects and objects that truly have an "unknown NDV" without 
changing the ColStatistics class or refactoring the StatEstimator interface, 
and making...
   
   @konstantinb Since the previous version of the combiner didn't bother much 
about unknown NDV values, I am not too opinionated about handling the case 
where countDistinct is zero. Personally, I would be fine even with something 
simplistic like the following:
   
   ```java
       if (stat.getCountDistint() >= 0 && result.getCountDistint() >= 0) {
         result.setCountDistint(
             StatsUtils.safeAdd(result.getCountDistint(), stat.getCountDistint()));
       }
   ```
   From my perspective, missing NDVs are an edge case and should not appear 
often during optimization.
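   
   For illustration only, here is a self-contained sketch of how that guard 
behaves on plain `long` values. The class `NdvCombineSketch` and the method 
`combineNdv` are hypothetical names, and the local `safeAdd` is a stand-in that 
assumes `StatsUtils.safeAdd` saturates at `Long.MAX_VALUE` on overflow; none of 
this is the actual combiner code.

```java
final class NdvCombineSketch {

  // Stand-in for StatsUtils.safeAdd (assumption: saturate at Long.MAX_VALUE on overflow).
  static long safeAdd(long a, long b) {
    try {
      return Math.addExact(a, b);
    } catch (ArithmeticException e) {
      return Long.MAX_VALUE;
    }
  }

  // Mirrors the guard above: add only when both NDVs are non-negative,
  // otherwise leave the running result unchanged.
  static long combineNdv(long resultNdv, long statNdv) {
    if (statNdv >= 0 && resultNdv >= 0) {
      return safeAdd(resultNdv, statNdv);
    }
    return resultNdv;
  }

  public static void main(String[] args) {
    System.out.println(combineNdv(10, 5));             // both known: plain sum
    System.out.println(combineNdv(10, 0));             // zero NDV contributes nothing
    System.out.println(combineNdv(Long.MAX_VALUE, 1)); // overflow saturates
    System.out.println(combineNdv(-1, 5));             // negative result is left as-is
  }
}
```

   With this guard a zero NDV passes the check and simply adds nothing, so the 
combined value stays whatever the other branches contributed.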
   
   Having said that, I am not against a more elaborate solution like modifying 
`ColStatistics` or the `StatEstimator` interface as in previous commits, so I 
will defer the final decision to @konstantinb.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.


