ngsg commented on code in PR #5717:
URL: https://github.com/apache/hive/pull/5717#discussion_r2071492206
##########
ql/src/test/queries/clientpositive/sharedwork_mapjoin_datasize_check.q:
##########
@@ -0,0 +1,66 @@
+--! qt:dataset:src
+--! qt:dataset:src1
+
+set hive.auto.convert.join=true;
+set hive.llap.mapjoin.memory.oversubscribe.factor=0;
+set hive.auto.convert.join.noconditionaltask.size=500;
+
+-- The InMemoryDataSize of MapJoin is 280. Therefore, SWO should not merge 2 TSs reading src
+-- as the sum of InMemoryDataSize of 2 unmerged MapJoin exceeds 500.

Review Comment:
   Before the patch, `SharedWorkOptimizer` merged the two TableScan operators, which resulted in fewer Map vertices in the EXPLAIN plan.
   For reference, here is the Tez vertex dependency from the original qfile output:

   ```
         Edges:
           Map 1 <- Map 4 (BROADCAST_EDGE), Reducer 5 (BROADCAST_EDGE)
           Reducer 2 <- Map 1 (SIMPLE_EDGE), Reducer 3 (BROADCAST_EDGE)
           Reducer 3 <- Map 1 (SIMPLE_EDGE)
           Reducer 5 <- Map 4 (SIMPLE_EDGE)
   #### A masked pattern was here ####
   ```
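   To spell out the arithmetic behind the qfile comment, here is a minimal sketch. It is only an illustration and not the actual `SharedWorkOptimizer` logic: the helper `can_merge` and the constant names are hypothetical, and treating the limit as inclusive (`<=`) is an assumption. The numbers 280 and 500 come from the qfile itself.

   ```python
   # Hypothetical illustration only; this is NOT Hive's SharedWorkOptimizer code.
   # Value of hive.auto.convert.join.noconditionaltask.size set in the qfile.
   NO_CONDITIONAL_TASK_SIZE = 500
   # InMemoryDataSize of each MapJoin, as stated in the qfile comment.
   MAPJOIN_IN_MEMORY_SIZE = 280


   def can_merge(mapjoin_sizes, threshold):
       """Return True if the MapJoins that would end up together fit under the threshold.

       Assumption: the limit is inclusive. The real check in Hive may differ.
       """
       return sum(mapjoin_sizes) <= threshold


   # Merging the two TableScans would put both MapJoins under the same budget.
   sizes = [MAPJOIN_IN_MEMORY_SIZE, MAPJOIN_IN_MEMORY_SIZE]
   combined = sum(sizes)  # 280 + 280 = 560
   merge_allowed = can_merge(sizes, NO_CONDITIONAL_TASK_SIZE)

   print(combined, merge_allowed)  # prints: 560 False
   ```

   Each MapJoin fits on its own (280 <= 500), but the pair does not (560 > 500), which is why the test expects the two TableScans on `src` to stay unmerged.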