shubhluck commented on code in PR #6382:
URL: https://github.com/apache/hive/pull/6382#discussion_r3012523355


##########
ql/src/test/queries/clientpositive/semijoin_stats_missing_colstats.q:
##########
@@ -0,0 +1,45 @@
+-- HIVE-29516: Test that semijoin optimization handles missing column statistics gracefully

Review Comment:
   1. **Added .q.out file** - The expected output file is now included in the PR.
   2. **Test not failing on master** - You're correct that the test doesn't reproduce
      the exact NPE on master. The NPE occurs only under specific conditions seen in
      production (observed with TPC-DS at scale 10000):
      - Tables have basic statistics but no column statistics
      - The semijoin optimization threshold is met (large row count ratios)
      - The `removeSemijoinOptimizationByBenefit` code path is triggered

      The .q test serves as a regression test to verify that:
      - Compilation succeeds when column stats are missing
      - The fix doesn't break the normal semijoin optimization flow

      The actual bug fix is validated by the unit tests in `TestStatsUtils`, which
      verify that `updateStats` throws `IllegalArgumentException` when called with
      `useColStats=true` but no column stats are available.
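
      For readers who want the guard spelled out, here is a minimal, self-contained
      sketch of the behaviour the unit tests check. It is not the code from the PR:
      `ColStatsGuardSketch`, `SketchStats`, and the three-argument `updateStats`
      signature are hypothetical stand-ins, and the real `StatsUtils.updateStats`
      in Hive may take different parameters. The sketch assumes only what is stated
      above: with `useColStats=true` and no column statistics, the call fails fast
      with `IllegalArgumentException` instead of hitting an NPE later.

```java
import java.util.List;

/** Illustrative sketch only; none of these types are Hive classes. */
public class ColStatsGuardSketch {

    /** Hypothetical stand-in for a statistics holder. */
    public static final class SketchStats {
        public long numRows;
        // null or empty means "basic stats only, no column statistics".
        public final List<String> columnStats;

        public SketchStats(long numRows, List<String> columnStats) {
            this.numRows = numRows;
            this.columnStats = columnStats;
        }
    }

    /**
     * Hypothetical analogue of the guarded update: reject the call up front
     * when column statistics are requested but absent, rather than letting a
     * null dereference surface later in the optimizer.
     */
    public static void updateStats(SketchStats stats, long newNumRows, boolean useColStats) {
        boolean missing = stats.columnStats == null || stats.columnStats.isEmpty();
        if (useColStats && missing) {
            throw new IllegalArgumentException(
                "useColStats=true but no column statistics are available");
        }
        stats.numRows = newNumRows;
    }

    public static void main(String[] args) {
        SketchStats basicOnly = new SketchStats(1_000_000L, null);

        // Without column stats requested, the row count is simply updated.
        updateStats(basicOnly, 500L, false);

        // With column stats requested but missing, the guard rejects the call.
        try {
            updateStats(basicOnly, 100L, true);
            System.out.println("unexpected: no exception");
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

      In this sketch a caller that cannot guarantee column statistics would pass
      `useColStats=false` or check for them first; how the PR's callers handle
      that case is not shown here.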



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

