zabetak commented on code in PR #6382:
URL: https://github.com/apache/hive/pull/6382#discussion_r3026635324


##########
ql/src/java/org/apache/hadoop/hive/ql/stats/StatsUtils.java:
##########
@@ -2030,34 +2034,41 @@ public static void updateStats(Statistics stats, long newNumRows,
 
     if (useColStats) {
       List<ColStatistics> colStats = stats.getColumnStats();
-      for (ColStatistics cs : colStats) {
-        long oldDV = cs.getCountDistint();
-        if (affectedColumns.contains(cs.getColumnName())) {
-          long newDV = oldDV;
-
-          // if ratio is greater than 1, then number of rows increases. This can happen
-          // when some operators like GROUPBY duplicates the input rows in which case
-          // number of distincts should not change. Update the distinct count only when
-          // the output number of rows is less than input number of rows.
-          if (ratio <= 1.0) {
-            newDV = (long) Math.ceil(ratio * oldDV);
+      if (colStats != null && !colStats.isEmpty()) {

Review Comment:
   As mentioned elsewhere, let's avoid the changes in StatsUtils altogether and
just focus on fixing the problem in `removeSemijoinOptimizationByBenefit`. We can
follow up with other enhancements afterwards if it's really necessary.
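For context on what the removed lines in the hunk do, here is a minimal, self-contained sketch of the distinct-value (DV) scaling rule described by the code comment in `updateStats`. It assumes `ratio` is the new-to-old row-count ratio, which is what the comment implies. `DvScalingSketch`, `ColStat`, `scaleDistinctCount`, and `updateDistinctCounts` are hypothetical names used only for illustration. They are not Hive APIs, and the sketch deliberately leaves out the null/empty guard the PR proposes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

public class DvScalingSketch {

    /** Hypothetical stand-in for Hive's ColStatistics: only a column name and a distinct count. */
    public static final class ColStat {
        public final String name;
        public long countDistinct;

        public ColStat(String name, long countDistinct) {
            this.name = name;
            this.countDistinct = countDistinct;
        }
    }

    /**
     * Scales a distinct count by the row-count ratio (assumed to be newNumRows / oldNumRows).
     * If ratio > 1.0, rows were duplicated (e.g. by GROUPBY), so the distinct count is kept.
     * Otherwise it shrinks proportionally, rounded up.
     */
    public static long scaleDistinctCount(long oldDV, double ratio) {
        if (ratio <= 1.0) {
            return (long) Math.ceil(ratio * oldDV);
        }
        return oldDV;
    }

    /** Applies the scaling only to columns named in 'affected', mirroring the loop in the hunk. */
    public static void updateDistinctCounts(List<ColStat> colStats, Set<String> affected, double ratio) {
        for (ColStat cs : colStats) {
            if (affected.contains(cs.name)) {
                cs.countDistinct = scaleDistinctCount(cs.countDistinct, ratio);
            }
        }
    }

    public static void main(String[] args) {
        List<ColStat> stats = new ArrayList<>();
        stats.add(new ColStat("a", 10));
        stats.add(new ColStat("b", 10));
        // Halving the row count scales only column "a"; "b" is not affected.
        updateDistinctCounts(stats, Set.of("a"), 0.5);
        for (ColStat cs : stats) {
            System.out.println(cs.name + " -> " + cs.countDistinct);
        }
    }
}
```

Rounding up with `Math.ceil` keeps any non-zero distinct count at least 1 after scaling. Skipping the update when the ratio exceeds 1 reflects the comment's point that duplicated rows add no new distinct values.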



-- 
This is an automated message from the Apache Git Service.