zabetak commented on code in PR #5196: URL: https://github.com/apache/hive/pull/5196#discussion_r1908879493
########## ql/src/test/queries/clientpositive/sharedwork.q: ########## @@ -20,6 +20,24 @@ create table MY_TABLE_0001_01 ( col_1 string, col_100 string); +explain cbo SELECT + Table__323.col_7, + CAST(Table__323.col_3 AS DATE) col_3, + Table__323.col_20, + Table__1232.col_21 col_21_1232, + Table__323.col_1, + Table__133.col_22, + Table__879.col_21 col_21_879 + ,Table__133.col_23 +FROM MY_TABLE_0001 Table__323 +LEFT OUTER JOIN MY_TABLE_0003 Table__1232 ON (Table__323.col_20=Table__1232.col_24) +LEFT OUTER JOIN MY_TABLE_0001_00 Table__133 ON (Table__323.col_1=Table__133.col_1) +LEFT OUTER JOIN MY_TABLE_0003 Table__879 ON (Table__133.col_22=Table__879.col_24) +LEFT OUTER JOIN MY_TABLE_0001_01 Table__1215 ON (Table__323.col_1=Table__1215.col_1 and Table__1215.col_100 = 210) +WHERE 1=1 +AND (cast(Table__323.col_7 AS DOUBLE) IS NOT NULL OR Table__323.col_7 IS NULL) +AND CAST(Table__323.col_3 AS DATE) BETWEEN '2018-07-01' AND '2019-01-23' +AND Table__323.col_20 IN ('part1','part2','part3'); Review Comment: It's unclear what is the purpose of the new EXPLAIN CBO statements. If they are just for debugging purposes they should be removed. If they are supposed to guard against some regression then they should be moved into another .q file with a descriptive name that explains their purpose. ########## ql/src/test/results/clientpositive/llap/explainuser_4.q.out: ########## @@ -119,24 +119,24 @@ Stage-0 PARTITION_ONLY_SHUFFLE [RS_11] Group By Operator [GBY_10] (rows=1 width=8) Output:["_col0"],aggregations:["count()"] - Merge Join Operator [MERGEJOIN_29] (rows=4626 width=8) + Merge Join Operator [MERGEJOIN_29] (rows=3449 width=8) Conds:RS_32._col0=RS_35._col0(Inner) <-Map 1 [SIMPLE_EDGE] vectorized, llap SHUFFLE [RS_32] PartitionCols:_col0 - Select Operator [SEL_31] (rows=3078 width=2) + Select Operator [SEL_31] (rows=2297 width=2) Output:["_col0"] - Filter Operator [FIL_30] (rows=3078 width=2) - predicate:cint BETWEEN 1000000 AND 3000000 + Filter Operator [FIL_30] (rows=2297 width=2) + predicate:(cint is not null and cint BETWEEN 1000000 AND 3000000) TableScan [TS_0] (rows=12288 width=2) default@alltypesorc,a,Tbl:COMPLETE,Col:COMPLETE,Output:["cint"] <-Map 4 [SIMPLE_EDGE] vectorized, llap SHUFFLE [RS_35] PartitionCols:_col0 - Select Operator [SEL_34] (rows=2298 width=2) + Select Operator [SEL_34] (rows=1715 width=2) Output:["_col0"] - Filter Operator [FIL_33] (rows=2298 width=8) - predicate:(cint BETWEEN 1000000 AND 3000000 and cbigint is not null) + Filter Operator [FIL_33] (rows=1715 width=8) + predicate:(cint is not null and cint BETWEEN 1000000 AND 3000000 and cbigint is not null) Review Comment: It is interesting that the seemingly redundant `IS NOT NULL` predicate appears alongside the `BETWEEN` clause. It’s probably not gonna harm the performance much but it would be nice to see if it is something caused by Hive changes or due to Calcite upgrade. I assume that it has something to do with the SEARCH operator. @kasakrisz Is this the same redundancy that you have been investigating? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org For additional commands, e-mail: gitbox-h...@hive.apache.org