zabetak commented on PR #6359:
URL: https://github.com/apache/hive/pull/6359#issuecomment-4153794528
> I was unable to find a way to distinguish between "const NULL"
> ColStatistics objects and objects that truly have an "unknown NDV" without
> changing the ColStatistics class or refactoring the StatEstimator interface,
> and making...
@konstantinb Since the previous version of the combiner didn't bother much
about unknown NDV values, I am not too opinionated about handling the case
where countDistinct is zero. Personally, I would be fine even with something
simplistic like the following:
```java
if (stat.getCountDistint() >= 0 && result.getCountDistint() >= 0) {
  result.setCountDistint(
      StatsUtils.safeAdd(result.getCountDistint(), stat.getCountDistint()));
}
```
From my perspective, missing NDVs are an edge case and not something that
should appear too often during optimization.
Having said that, I am not against a more elaborate solution, such as modifying
`ColStatistics` or the `StatEstimator` interface as per previous commits, so I
will defer the final decision to @konstantinb.
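
For anyone who wants to try the simplistic guard outside of Hive, here is a minimal, self-contained sketch. It is not the PR's code: `NdvCombineSketch`, `ColStat`, and its accessors are hypothetical stand-ins for `ColStatistics`, and the local `safeAdd` only assumes that `StatsUtils.safeAdd` saturates at `Long.MAX_VALUE` on overflow.

```java
// Illustrative sketch only; none of these types come from Hive.
final class NdvCombineSketch {

    /** Hypothetical stand-in for ColStatistics; only the NDV field is modeled. */
    static final class ColStat {
        private long countDistinct;

        ColStat(long countDistinct) {
            this.countDistinct = countDistinct;
        }

        long getCountDistinct() {
            return countDistinct;
        }

        void setCountDistinct(long countDistinct) {
            this.countDistinct = countDistinct;
        }
    }

    /**
     * Stand-in for StatsUtils.safeAdd. Assumption: overflow saturates at
     * Long.MAX_VALUE. Only called here with non-negative operands.
     */
    static long safeAdd(long a, long b) {
        try {
            return Math.addExact(a, b);
        } catch (ArithmeticException e) {
            return Long.MAX_VALUE;
        }
    }

    /** Adds stat's NDV into result only when both values are non-negative. */
    static void combine(ColStat result, ColStat stat) {
        if (stat.getCountDistinct() >= 0 && result.getCountDistinct() >= 0) {
            result.setCountDistinct(
                safeAdd(result.getCountDistinct(), stat.getCountDistinct()));
        }
    }

    public static void main(String[] args) {
        ColStat result = new ColStat(10);
        combine(result, new ColStat(5));
        System.out.println(result.getCountDistinct()); // 15

        // A zero NDV passes the guard and leaves the running sum unchanged.
        combine(result, new ColStat(0));
        System.out.println(result.getCountDistinct()); // 15

        // A negative NDV fails the guard, so the result is left untouched.
        combine(result, new ColStat(-1));
        System.out.println(result.getCountDistinct()); // 15
    }
}
```

Note that with a `>= 0` guard, a zero NDV is still added, so a branch with a missing NDV contributes nothing and the combined value becomes a lower bound rather than being flagged as unknown.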