[jira] [Comment Edited] (HIVE-26653) Wrong results when (map) joining multiple tables on partition column

Soumyakanti Das (Jira) Thu, 09 Oct 2025 09:59:22 -0700


    [ 
https://issues.apache.org/jira/browse/HIVE-26653?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18028824#comment-18028824
 ]


Soumyakanti Das edited comment on HIVE-26653 at 10/9/25 4:57 PM:
-----------------------------------------------------------------

For some reason EXPLAIN VECTORIZED doesn't work - it's not present in the 
HiveParser.g as well, but I do see some tests that uses it. I will look into it 
further, but I was able to run EXPLAIN VECTORIZATION DETAIL

 

Old plan:
{noformat}
PLAN VECTORIZATION:
  enabled: true
  enabledConditionsMet: [hive.vectorized.execution.enabled IS true]STAGE 
DEPENDENCIES:
  Stage-1 is a root stage
  Stage-0 depends on stages: Stage-1STAGE PLANS:
  Stage: Stage-1
    Tez
#### A masked pattern was here ####
      Edges:
        Reducer 2 <- Map 1 (SIMPLE_EDGE), Reducer 3 (BROADCAST_EDGE), Reducer 5 
(BROADCAST_EDGE)
        Reducer 3 <- Map 1 (SIMPLE_EDGE)
        Reducer 5 <- Map 4 (SIMPLE_EDGE)
#### A masked pattern was here ####
      Vertices:
        Map 1 
            Map Operator Tree:
                TableScan
                  alias: table_a
                  Statistics: Num rows: 31 Data size: 341 Basic stats: COMPLETE 
Column stats: COMPLETE
                  TableScan Vectorization:
                      native: true
                      vectorizationSchemaColumns: [0:aid:string, 1:p_dt:string, 
2:ROW__ID:struct<writeid:bigint,bucketid:int,rowid:bigint>, 
3:ROW__IS__DELETED:boolean]
                  Group By Operator
                    Group By Vectorization:
                        className: VectorGroupByOperator
                        groupByMode: HASH
                        keyExpressions: ConstantVectorExpression(val 20220731) 
-> 4:string
                        native: false
                        vectorProcessingMode: HASH
                        projectedOutputColumnNums: []
                    keys: '20220731' (type: string)
                    minReductionHashAggr: 0.96774197
                    mode: hash
                    outputColumnNames: _col0
                    Statistics: Num rows: 1 Data size: 92 Basic stats: COMPLETE 
Column stats: COMPLETE
                    Reduce Output Operator
                      key expressions: _col0 (type: string)
                      null sort order: z
                      sort order: +
                      Map-reduce partition columns: _col0 (type: string)
                      Reduce Sink Vectorization:
                          className: VectorReduceSinkStringOperator
                          keyColumns: 0:string
                          native: true
                          nativeConditionsMet: 
hive.vectorized.execution.reducesink.new.enabled IS true, hive.execution.engine 
tez IN [tez] IS true, No PTF TopN IS true, No DISTINCT columns IS true, 
BinarySortableSerDe for keys IS true, LazyBinarySerDe for values IS true
                      Statistics: Num rows: 1 Data size: 92 Basic stats: 
COMPLETE Column stats: COMPLETE
                    Reduce Output Operator
                      key expressions: _col0 (type: string)
                      null sort order: z
                      sort order: +
                      Map-reduce partition columns: _col0 (type: string)
                      Reduce Sink Vectorization:
                          className: VectorReduceSinkStringOperator
                          keyColumns: 0:string
                          native: true
                          nativeConditionsMet: 
hive.vectorized.execution.reducesink.new.enabled IS true, hive.execution.engine 
tez IN [tez] IS true, No PTF TopN IS true, No DISTINCT columns IS true, 
BinarySortableSerDe for keys IS true, LazyBinarySerDe for values IS true
                      Statistics: Num rows: 1 Data size: 92 Basic stats: 
COMPLETE Column stats: COMPLETE
            Execution mode: vectorized, llap
            LLAP IO: all inputs
            Map Vectorization:
                enabled: true
                enabledConditionsMet: 
hive.vectorized.use.vector.serde.deserialize IS true
                inputFormatFeatureSupport: [DECIMAL_64]
                featureSupportInUse: [DECIMAL_64]
                inputFileFormats: org.apache.hadoop.mapred.TextInputFormat
                allNative: false
                usesVectorUDFAdaptor: false
                vectorized: true
                rowBatchContext:
                    dataColumnCount: 1
                    includeColumns: []
                    dataColumns: aid:string
                    partitionColumnCount: 1
                    partitionColumns: p_dt:string
                    scratchColumnTypeNames: [string]
        Map 4 
            Map Operator Tree:
                TableScan
                  alias: table_b
                  Statistics: Num rows: 29 Data size: 319 Basic stats: COMPLETE 
Column stats: COMPLETE
                  TableScan Vectorization:
                      native: true
                      vectorizationSchemaColumns: [0:bid:string, 1:p_dt:string, 
2:ROW__ID:struct<writeid:bigint,bucketid:int,rowid:bigint>, 
3:ROW__IS__DELETED:boolean]
                  Group By Operator
                    Group By Vectorization:
                        className: VectorGroupByOperator
                        groupByMode: HASH
                        keyExpressions: ConstantVectorExpression(val 20220731) 
-> 4:string
                        native: false
                        vectorProcessingMode: HASH
                        projectedOutputColumnNums: []
                    keys: '20220731' (type: string)
                    minReductionHashAggr: 0.9655172
                    mode: hash
                    outputColumnNames: _col0
                    Statistics: Num rows: 1 Data size: 92 Basic stats: COMPLETE 
Column stats: COMPLETE
                    Reduce Output Operator
                      key expressions: _col0 (type: string)
                      null sort order: z
                      sort order: +
                      Map-reduce partition columns: _col0 (type: string)
                      Reduce Sink Vectorization:
                          className: VectorReduceSinkStringOperator
                          keyColumns: 0:string
                          native: true
                          nativeConditionsMet: 
hive.vectorized.execution.reducesink.new.enabled IS true, hive.execution.engine 
tez IN [tez] IS true, No PTF TopN IS true, No DISTINCT columns IS true, 
BinarySortableSerDe for keys IS true, LazyBinarySerDe for values IS true
                      Statistics: Num rows: 1 Data size: 92 Basic stats: 
COMPLETE Column stats: COMPLETE
            Execution mode: vectorized, llap
            LLAP IO: all inputs
            Map Vectorization:
                enabled: true
                enabledConditionsMet: 
hive.vectorized.use.vector.serde.deserialize IS true
                inputFormatFeatureSupport: [DECIMAL_64]
                featureSupportInUse: [DECIMAL_64]
                inputFileFormats: org.apache.hadoop.mapred.TextInputFormat
                allNative: false
                usesVectorUDFAdaptor: false
                vectorized: true
                rowBatchContext:
                    dataColumnCount: 1
                    includeColumns: []
                    dataColumns: bid:string
                    partitionColumnCount: 1
                    partitionColumns: p_dt:string
                    scratchColumnTypeNames: [string]
        Reducer 2 
            Execution mode: vectorized, llap
            Reduce Vectorization:
                enabled: true
                enableConditionsMet: hive.vectorized.execution.reduce.enabled 
IS true, hive.execution.engine tez IN [tez] IS true
                reduceColumnNullOrder: z
                reduceColumnSortOrder: +
                allNative: false
                usesVectorUDFAdaptor: false
                vectorized: true
                rowBatchContext:
                    dataColumnCount: 1
                    dataColumns: KEY._col0:string
                    partitionColumnCount: 0
                    scratchColumnTypeNames: []
            Reduce Operator Tree:
              Group By Operator
                Group By Vectorization:
                    className: VectorGroupByOperator
                    groupByMode: MERGEPARTIAL
                    keyExpressions: col 0:string
                    native: false
                    vectorProcessingMode: MERGE_PARTIAL
                    projectedOutputColumnNums: []
                keys: KEY._col0 (type: string)
                mode: mergepartial
                outputColumnNames: _col0
                Statistics: Num rows: 1 Data size: 92 Basic stats: COMPLETE 
Column stats: COMPLETE
                Map Join Operator
                  condition map:
                       Inner Join 0 to 1
                  keys:
                    0 _col0 (type: string)
                    1 _col0 (type: string)
                  Map Join Vectorization:
                      bigTableKeyColumns: 0:string
                      bigTableRetainColumnNums: []
                      className: VectorMapJoinInnerStringOperator
                      native: true
                      nativeConditionsMet: hive.mapjoin.optimized.hashtable IS 
true, hive.vectorized.execution.mapjoin.native.enabled IS true, 
hive.execution.engine tez IN [tez] IS true, One MapJoin Condition IS true, No 
nullsafe IS true, Small table vectorizes IS true, Optimized Table and Supports 
Key Types IS true
                      nonOuterSmallTableKeyMapping: []
                      projectedOutput: 1:string
                      smallTableValueMapping: 1:string
                      hashTableImplementationType: OPTIMIZED
                  outputColumnNames: _col1
                  input vertices:
                    1 Reducer 5
                  Statistics: Num rows: 1 Data size: 92 Basic stats: COMPLETE 
Column stats: COMPLETE
                  Map Join Operator
                    condition map:
                         Inner Join 0 to 1
                    keys:
                      0 _col1 (type: string)
                      1 _col0 (type: string)
                    Map Join Vectorization:
                        bigTableKeyColumns: 1:string
                        bigTableRetainColumnNums: [2]
                        bigTableValueColumns: 2:string
                        bigTableValueExpressions: ConstantVectorExpression(val 
20220731) -> 2:string
                        className: VectorMapJoinInnerBigOnlyStringOperator
                        native: true
                        nativeConditionsMet: hive.mapjoin.optimized.hashtable 
IS true, hive.vectorized.execution.mapjoin.native.enabled IS true, 
hive.execution.engine tez IN [tez] IS true, One MapJoin Condition IS true, No 
nullsafe IS true, Small table vectorizes IS true, Optimized Table and Supports 
Key Types IS true
                        nonOuterSmallTableKeyMapping: []
                        projectedOutput: 2:string
                        hashTableImplementationType: OPTIMIZED
                    outputColumnNames: _col1
                    input vertices:
                      1 Reducer 3
                    Statistics: Num rows: 1 Data size: 16 Basic stats: COMPLETE 
Column stats: COMPLETE
                    Select Operator
                      expressions: _col1 (type: string)
                      outputColumnNames: _col0
                      Select Vectorization:
                          className: VectorSelectOperator
                          native: true
                          projectedOutputColumnNums: [2]
                      Statistics: Num rows: 1 Data size: 8 Basic stats: 
COMPLETE Column stats: COMPLETE
                      File Output Operator
                        compressed: false
                        File Sink Vectorization:
                            className: VectorFileSinkOperator
                            native: false
                        Statistics: Num rows: 1 Data size: 8 Basic stats: 
COMPLETE Column stats: COMPLETE
                        table:
                            input format: 
org.apache.hadoop.mapred.SequenceFileInputFormat
                            output format: 
org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
                            serde: 
org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
        Reducer 3 
            Execution mode: vectorized, llap
            Reduce Vectorization:
                enabled: true
                enableConditionsMet: hive.vectorized.execution.reduce.enabled 
IS true, hive.execution.engine tez IN [tez] IS true
                reduceColumnNullOrder: z
                reduceColumnSortOrder: +
                allNative: false
                usesVectorUDFAdaptor: false
                vectorized: true
                rowBatchContext:
                    dataColumnCount: 1
                    dataColumns: KEY._col0:string
                    partitionColumnCount: 0
                    scratchColumnTypeNames: []
            Reduce Operator Tree:
              Group By Operator
                Group By Vectorization:
                    className: VectorGroupByOperator
                    groupByMode: MERGEPARTIAL
                    keyExpressions: col 0:string
                    native: false
                    vectorProcessingMode: MERGE_PARTIAL
                    projectedOutputColumnNums: []
                keys: KEY._col0 (type: string)
                mode: mergepartial
                outputColumnNames: _col0
                Statistics: Num rows: 1 Data size: 92 Basic stats: COMPLETE 
Column stats: COMPLETE
                Reduce Output Operator
                  key expressions: _col0 (type: string)
                  null sort order: z
                  sort order: +
                  Map-reduce partition columns: _col0 (type: string)
                  Reduce Sink Vectorization:
                      className: VectorReduceSinkStringOperator
                      keyColumns: 0:string
                      native: true
                      nativeConditionsMet: 
hive.vectorized.execution.reducesink.new.enabled IS true, hive.execution.engine 
tez IN [tez] IS true, No PTF TopN IS true, No DISTINCT columns IS true, 
BinarySortableSerDe for keys IS true, LazyBinarySerDe for values IS true
                  Statistics: Num rows: 1 Data size: 92 Basic stats: COMPLETE 
Column stats: COMPLETE
        Reducer 5 
            Execution mode: vectorized, llap
            Reduce Vectorization:
                enabled: true
                enableConditionsMet: hive.vectorized.execution.reduce.enabled 
IS true, hive.execution.engine tez IN [tez] IS true
                reduceColumnNullOrder: z
                reduceColumnSortOrder: +
                allNative: false
                usesVectorUDFAdaptor: false
                vectorized: true
                rowBatchContext:
                    dataColumnCount: 1
                    dataColumns: KEY._col0:string
                    partitionColumnCount: 0
                    scratchColumnTypeNames: []
            Reduce Operator Tree:
              Group By Operator
                Group By Vectorization:
                    className: VectorGroupByOperator
                    groupByMode: MERGEPARTIAL
                    keyExpressions: col 0:string
                    native: false
                    vectorProcessingMode: MERGE_PARTIAL
                    projectedOutputColumnNums: []
                keys: KEY._col0 (type: string)
                mode: mergepartial
                outputColumnNames: _col0
                Statistics: Num rows: 1 Data size: 92 Basic stats: COMPLETE 
Column stats: COMPLETE
                Reduce Output Operator
                  key expressions: _col0 (type: string)
                  null sort order: z
                  sort order: +
                  Map-reduce partition columns: _col0 (type: string)
                  Reduce Sink Vectorization:
                      className: VectorReduceSinkStringOperator
                      keyColumns: 0:string
                      native: true
                      nativeConditionsMet: 
hive.vectorized.execution.reducesink.new.enabled IS true, hive.execution.engine 
tez IN [tez] IS true, No PTF TopN IS true, No DISTINCT columns IS true, 
BinarySortableSerDe for keys IS true, LazyBinarySerDe for values IS true
                  Statistics: Num rows: 1 Data size: 92 Basic stats: COMPLETE 
Column stats: COMPLETE  Stage: Stage-0
    Fetch Operator
      limit: -1
      Processor Tree:
        ListSink{noformat}
 

New plan:
{noformat}
PLAN VECTORIZATION:
  enabled: true
  enabledConditionsMet: [hive.vectorized.execution.enabled IS true]STAGE 
DEPENDENCIES:
  Stage-1 is a root stage
  Stage-0 depends on stages: Stage-1STAGE PLANS:
  Stage: Stage-1
    Tez
#### A masked pattern was here ####
      Edges:
        Reducer 2 <- Map 1 (SIMPLE_EDGE)
        Reducer 4 <- Map 3 (SIMPLE_EDGE)
        Reducer 5 <- Map 3 (SIMPLE_EDGE), Reducer 2 (BROADCAST_EDGE), Reducer 4 
(BROADCAST_EDGE)
#### A masked pattern was here ####
      Vertices:
        Map 1 
            Map Operator Tree:
                TableScan
                  alias: table_b
                  filterExpr: (p_dt = '20220731') (type: boolean)
                  Statistics: Num rows: 29 Data size: 319 Basic stats: COMPLETE 
Column stats: COMPLETE
                  TableScan Vectorization:
                      native: true
                      vectorizationSchemaColumns: [0:bid:string, 1:p_dt:string, 
2:ROW__ID:struct<writeid:bigint,bucketid:int,rowid:bigint>, 
3:ROW__IS__DELETED:boolean]
                  Select Operator
                    Select Vectorization:
                        className: VectorSelectOperator
                        native: true
                        projectedOutputColumnNums: []
                    Statistics: Num rows: 29 Data size: 319 Basic stats: 
COMPLETE Column stats: COMPLETE
                    Group By Operator
                      Group By Vectorization:
                          className: VectorGroupByOperator
                          groupByMode: HASH
                          keyExpressions: ConstantVectorExpression(val 1) -> 
4:boolean
                          native: false
                          vectorProcessingMode: HASH
                          projectedOutputColumnNums: []
                      keys: true (type: boolean)
                      minReductionHashAggr: 0.9655172
                      mode: hash
                      outputColumnNames: _col0
                      Statistics: Num rows: 1 Data size: 4 Basic stats: 
COMPLETE Column stats: COMPLETE
                      Reduce Output Operator
                        key expressions: _col0 (type: boolean)
                        null sort order: z
                        sort order: +
                        Map-reduce partition columns: _col0 (type: boolean)
                        Reduce Sink Vectorization:
                            className: VectorReduceSinkLongOperator
                            keyColumns: 0:boolean
                            native: true
                            nativeConditionsMet: 
hive.vectorized.execution.reducesink.new.enabled IS true, hive.execution.engine 
tez IN [tez] IS true, No PTF TopN IS true, No DISTINCT columns IS true, 
BinarySortableSerDe for keys IS true, LazyBinarySerDe for values IS true
                        Statistics: Num rows: 1 Data size: 4 Basic stats: 
COMPLETE Column stats: COMPLETE
            Execution mode: vectorized, llap
            LLAP IO: all inputs
            Map Vectorization:
                enabled: true
                enabledConditionsMet: 
hive.vectorized.use.vector.serde.deserialize IS true
                inputFormatFeatureSupport: [DECIMAL_64]
                featureSupportInUse: [DECIMAL_64]
                inputFileFormats: org.apache.hadoop.mapred.TextInputFormat
                allNative: false
                usesVectorUDFAdaptor: false
                vectorized: true
                rowBatchContext:
                    dataColumnCount: 1
                    includeColumns: []
                    dataColumns: bid:string
                    partitionColumnCount: 1
                    partitionColumns: p_dt:string
                    scratchColumnTypeNames: [bigint]
        Map 3 
            Map Operator Tree:
                TableScan
                  alias: table_a
                  filterExpr: (p_dt = '20220731') (type: boolean)
                  Statistics: Num rows: 31 Data size: 341 Basic stats: COMPLETE 
Column stats: COMPLETE
                  TableScan Vectorization:
                      native: true
                      vectorizationSchemaColumns: [0:aid:string, 1:p_dt:string, 
2:ROW__ID:struct<writeid:bigint,bucketid:int,rowid:bigint>, 
3:ROW__IS__DELETED:boolean]
                  Select Operator
                    Select Vectorization:
                        className: VectorSelectOperator
                        native: true
                        projectedOutputColumnNums: []
                    Statistics: Num rows: 31 Data size: 341 Basic stats: 
COMPLETE Column stats: COMPLETE
                    Group By Operator
                      Group By Vectorization:
                          className: VectorGroupByOperator
                          groupByMode: HASH
                          keyExpressions: ConstantVectorExpression(val 1) -> 
4:boolean
                          native: false
                          vectorProcessingMode: HASH
                          projectedOutputColumnNums: []
                      keys: true (type: boolean)
                      minReductionHashAggr: 0.96774197
                      mode: hash
                      outputColumnNames: _col0
                      Statistics: Num rows: 1 Data size: 4 Basic stats: 
COMPLETE Column stats: COMPLETE
                      Reduce Output Operator
                        key expressions: _col0 (type: boolean)
                        null sort order: z
                        sort order: +
                        Map-reduce partition columns: _col0 (type: boolean)
                        Reduce Sink Vectorization:
                            className: VectorReduceSinkLongOperator
                            keyColumns: 0:boolean
                            native: true
                            nativeConditionsMet: 
hive.vectorized.execution.reducesink.new.enabled IS true, hive.execution.engine 
tez IN [tez] IS true, No PTF TopN IS true, No DISTINCT columns IS true, 
BinarySortableSerDe for keys IS true, LazyBinarySerDe for values IS true
                        Statistics: Num rows: 1 Data size: 4 Basic stats: 
COMPLETE Column stats: COMPLETE
                      Reduce Output Operator
                        key expressions: _col0 (type: boolean)
                        null sort order: z
                        sort order: +
                        Map-reduce partition columns: _col0 (type: boolean)
                        Reduce Sink Vectorization:
                            className: VectorReduceSinkLongOperator
                            keyColumns: 0:boolean
                            native: true
                            nativeConditionsMet: 
hive.vectorized.execution.reducesink.new.enabled IS true, hive.execution.engine 
tez IN [tez] IS true, No PTF TopN IS true, No DISTINCT columns IS true, 
BinarySortableSerDe for keys IS true, LazyBinarySerDe for values IS true
                        Statistics: Num rows: 1 Data size: 4 Basic stats: 
COMPLETE Column stats: COMPLETE
            Execution mode: vectorized, llap
            LLAP IO: all inputs
            Map Vectorization:
                enabled: true
                enabledConditionsMet: 
hive.vectorized.use.vector.serde.deserialize IS true
                inputFormatFeatureSupport: [DECIMAL_64]
                featureSupportInUse: [DECIMAL_64]
                inputFileFormats: org.apache.hadoop.mapred.TextInputFormat
                allNative: false
                usesVectorUDFAdaptor: false
                vectorized: true
                rowBatchContext:
                    dataColumnCount: 1
                    includeColumns: []
                    dataColumns: aid:string
                    partitionColumnCount: 1
                    partitionColumns: p_dt:string
                    scratchColumnTypeNames: [bigint]
        Reducer 2 
            Execution mode: vectorized, llap
            Reduce Vectorization:
                enabled: true
                enableConditionsMet: hive.vectorized.execution.reduce.enabled 
IS true, hive.execution.engine tez IN [tez] IS true
                reduceColumnNullOrder: z
                reduceColumnSortOrder: +
                allNative: false
                usesVectorUDFAdaptor: false
                vectorized: true
                rowBatchContext:
                    dataColumnCount: 1
                    dataColumns: KEY._col0:boolean
                    partitionColumnCount: 0
                    scratchColumnTypeNames: []
            Reduce Operator Tree:
              Group By Operator
                Group By Vectorization:
                    className: VectorGroupByOperator
                    groupByMode: MERGEPARTIAL
                    keyExpressions: col 0:boolean
                    native: false
                    vectorProcessingMode: MERGE_PARTIAL
                    projectedOutputColumnNums: []
                keys: KEY._col0 (type: boolean)
                mode: mergepartial
                outputColumnNames: _col0
                Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE 
Column stats: COMPLETE
                Select Operator
                  Select Vectorization:
                      className: VectorSelectOperator
                      native: true
                      projectedOutputColumnNums: []
                  Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE 
Column stats: COMPLETE
                  Reduce Output Operator
                    null sort order: 
                    sort order: 
                    Reduce Sink Vectorization:
                        className: VectorReduceSinkEmptyKeyOperator
                        native: true
                        nativeConditionsMet: 
hive.vectorized.execution.reducesink.new.enabled IS true, hive.execution.engine 
tez IN [tez] IS true, No PTF TopN IS true, No DISTINCT columns IS true, 
BinarySortableSerDe for keys IS true, LazyBinarySerDe for values IS true
                    Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE 
Column stats: COMPLETE
        Reducer 4 
            Execution mode: vectorized, llap
            Reduce Vectorization:
                enabled: true
                enableConditionsMet: hive.vectorized.execution.reduce.enabled 
IS true, hive.execution.engine tez IN [tez] IS true
                reduceColumnNullOrder: z
                reduceColumnSortOrder: +
                allNative: false
                usesVectorUDFAdaptor: false
                vectorized: true
                rowBatchContext:
                    dataColumnCount: 1
                    dataColumns: KEY._col0:boolean
                    partitionColumnCount: 0
                    scratchColumnTypeNames: []
            Reduce Operator Tree:
              Group By Operator
                Group By Vectorization:
                    className: VectorGroupByOperator
                    groupByMode: MERGEPARTIAL
                    keyExpressions: col 0:boolean
                    native: false
                    vectorProcessingMode: MERGE_PARTIAL
                    projectedOutputColumnNums: []
                keys: KEY._col0 (type: boolean)
                mode: mergepartial
                outputColumnNames: _col0
                Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE 
Column stats: COMPLETE
                Select Operator
                  Select Vectorization:
                      className: VectorSelectOperator
                      native: true
                      projectedOutputColumnNums: []
                  Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE 
Column stats: COMPLETE
                  Reduce Output Operator
                    null sort order: 
                    sort order: 
                    Reduce Sink Vectorization:
                        className: VectorReduceSinkEmptyKeyOperator
                        native: true
                        nativeConditionsMet: 
hive.vectorized.execution.reducesink.new.enabled IS true, hive.execution.engine 
tez IN [tez] IS true, No PTF TopN IS true, No DISTINCT columns IS true, 
BinarySortableSerDe for keys IS true, LazyBinarySerDe for values IS true
                    Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE 
Column stats: COMPLETE
        Reducer 5 
            Execution mode: vectorized, llap
            Reduce Vectorization:
                enabled: true
                enableConditionsMet: hive.vectorized.execution.reduce.enabled 
IS true, hive.execution.engine tez IN [tez] IS true
                reduceColumnNullOrder: z
                reduceColumnSortOrder: +
                allNative: false
                usesVectorUDFAdaptor: false
                vectorized: true
                rowBatchContext:
                    dataColumnCount: 1
                    dataColumns: KEY._col0:boolean
                    partitionColumnCount: 0
                    scratchColumnTypeNames: []
            Reduce Operator Tree:
              Group By Operator
                Group By Vectorization:
                    className: VectorGroupByOperator
                    groupByMode: MERGEPARTIAL
                    keyExpressions: col 0:boolean
                    native: false
                    vectorProcessingMode: MERGE_PARTIAL
                    projectedOutputColumnNums: []
                keys: KEY._col0 (type: boolean)
                mode: mergepartial
                outputColumnNames: _col0
                Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE 
Column stats: COMPLETE
                Select Operator
                  Select Vectorization:
                      className: VectorSelectOperator
                      native: true
                      projectedOutputColumnNums: []
                  Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE 
Column stats: COMPLETE
                  Map Join Operator
                    condition map:
                         Inner Join 0 to 1
                    keys:
                      0 
                      1 
                    Map Join Vectorization:
                        bigTableRetainColumnNums: []
                        className: VectorMapJoinInnerBigOnlyMultiKeyOperator
                        native: true
                        nativeConditionsMet: hive.mapjoin.optimized.hashtable 
IS true, hive.vectorized.execution.mapjoin.native.enabled IS true, 
hive.execution.engine tez IN [tez] IS true, One MapJoin Condition IS true, No 
nullsafe IS true, Small table vectorizes IS true, Optimized Table and Supports 
Key Types IS true
                        nonOuterSmallTableKeyMapping: []
                        hashTableImplementationType: OPTIMIZED
                    input vertices:
                      0 Reducer 2
                    Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE 
Column stats: COMPLETE
                    Map Join Operator
                      condition map:
                           Inner Join 0 to 1
                      keys:
                        0 
                        1 
                      Map Join Vectorization:
                          bigTableRetainColumnNums: []
                          className: VectorMapJoinInnerBigOnlyMultiKeyOperator
                          native: true
                          nativeConditionsMet: hive.mapjoin.optimized.hashtable 
IS true, hive.vectorized.execution.mapjoin.native.enabled IS true, 
hive.execution.engine tez IN [tez] IS true, One MapJoin Condition IS true, No 
nullsafe IS true, Small table vectorizes IS true, Optimized Table and Supports 
Key Types IS true
                          nonOuterSmallTableKeyMapping: []
                          hashTableImplementationType: OPTIMIZED
                      input vertices:
                        1 Reducer 4
                      Statistics: Num rows: 1 Data size: 8 Basic stats: 
COMPLETE Column stats: COMPLETE
                      Select Operator
                        expressions: '20220731' (type: string)
                        outputColumnNames: _col0
                        Select Vectorization:
                            className: VectorSelectOperator
                            native: true
                            projectedOutputColumnNums: [1]
                            selectExpressions: ConstantVectorExpression(val 
20220731) -> 1:string
                        Statistics: Num rows: 1 Data size: 92 Basic stats: 
COMPLETE Column stats: COMPLETE
                        File Output Operator
                          compressed: false
                          File Sink Vectorization:
                              className: VectorFileSinkOperator
                              native: false
                          Statistics: Num rows: 1 Data size: 92 Basic stats: 
COMPLETE Column stats: COMPLETE
                          table:
                              input format: 
org.apache.hadoop.mapred.SequenceFileInputFormat
                              output format: 
org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
                              serde: 
org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe  Stage: Stage-0
    Fetch Operator
      limit: -1
      Processor Tree:
        ListSink{noformat}
 

It looks like in the old plan we are using VectorMapJoinInnerStringOperator and 
VectorMapJoinInnerBigOnlyStringOperator, while in the new plan we only use 
VectorMapJoinInnerBigOnlyMultiKeyOperator


was (Author: soumyakanti.das):
For some reason EXPLAIN VECTORIZED doesn't work - it's not present in the 
HiveParser.g as well, but I do see some tests that uses it. I will look into it 
further, but I was able to run EXPLAIN VECTORIZATION DETAIL

 

Old plan:
{noformat}
PLAN VECTORIZATION:
  enabled: true
  enabledConditionsMet: [hive.vectorized.execution.enabled IS true]STAGE 
DEPENDENCIES:
  Stage-1 is a root stage
  Stage-0 depends on stages: Stage-1STAGE PLANS:
  Stage: Stage-1
    Tez
#### A masked pattern was here ####
      Edges:
        Reducer 2 <- Map 1 (SIMPLE_EDGE), Reducer 3 (BROADCAST_EDGE), Reducer 5 
(BROADCAST_EDGE)
        Reducer 3 <- Map 1 (SIMPLE_EDGE)
        Reducer 5 <- Map 4 (SIMPLE_EDGE)
#### A masked pattern was here ####
      Vertices:
        Map 1 
            Map Operator Tree:
                TableScan
                  alias: table_a
                  Statistics: Num rows: 31 Data size: 341 Basic stats: COMPLETE 
Column stats: COMPLETE
                  TableScan Vectorization:
                      native: true
                      vectorizationSchemaColumns: [0:aid:string, 1:p_dt:string, 
2:ROW__ID:struct<writeid:bigint,bucketid:int,rowid:bigint>, 
3:ROW__IS__DELETED:boolean]
                  Group By Operator
                    Group By Vectorization:
                        className: VectorGroupByOperator
                        groupByMode: HASH
                        keyExpressions: ConstantVectorExpression(val 20220731) 
-> 4:string
                        native: false
                        vectorProcessingMode: HASH
                        projectedOutputColumnNums: []
                    keys: '20220731' (type: string)
                    minReductionHashAggr: 0.96774197
                    mode: hash
                    outputColumnNames: _col0
                    Statistics: Num rows: 1 Data size: 92 Basic stats: COMPLETE 
Column stats: COMPLETE
                    Reduce Output Operator
                      key expressions: _col0 (type: string)
                      null sort order: z
                      sort order: +
                      Map-reduce partition columns: _col0 (type: string)
                      Reduce Sink Vectorization:
                          className: VectorReduceSinkStringOperator
                          keyColumns: 0:string
                          native: true
                          nativeConditionsMet: 
hive.vectorized.execution.reducesink.new.enabled IS true, hive.execution.engine 
tez IN [tez] IS true, No PTF TopN IS true, No DISTINCT columns IS true, 
BinarySortableSerDe for keys IS true, LazyBinarySerDe for values IS true
                      Statistics: Num rows: 1 Data size: 92 Basic stats: 
COMPLETE Column stats: COMPLETE
                    Reduce Output Operator
                      key expressions: _col0 (type: string)
                      null sort order: z
                      sort order: +
                      Map-reduce partition columns: _col0 (type: string)
                      Reduce Sink Vectorization:
                          className: VectorReduceSinkStringOperator
                          keyColumns: 0:string
                          native: true
                          nativeConditionsMet: 
hive.vectorized.execution.reducesink.new.enabled IS true, hive.execution.engine 
tez IN [tez] IS true, No PTF TopN IS true, No DISTINCT columns IS true, 
BinarySortableSerDe for keys IS true, LazyBinarySerDe for values IS true
                      Statistics: Num rows: 1 Data size: 92 Basic stats: 
COMPLETE Column stats: COMPLETE
            Execution mode: vectorized, llap
            LLAP IO: all inputs
            Map Vectorization:
                enabled: true
                enabledConditionsMet: 
hive.vectorized.use.vector.serde.deserialize IS true
                inputFormatFeatureSupport: [DECIMAL_64]
                featureSupportInUse: [DECIMAL_64]
                inputFileFormats: org.apache.hadoop.mapred.TextInputFormat
                allNative: false
                usesVectorUDFAdaptor: false
                vectorized: true
                rowBatchContext:
                    dataColumnCount: 1
                    includeColumns: []
                    dataColumns: aid:string
                    partitionColumnCount: 1
                    partitionColumns: p_dt:string
                    scratchColumnTypeNames: [string]
        Map 4 
            Map Operator Tree:
                TableScan
                  alias: table_b
                  Statistics: Num rows: 29 Data size: 319 Basic stats: COMPLETE 
Column stats: COMPLETE
                  TableScan Vectorization:
                      native: true
                      vectorizationSchemaColumns: [0:bid:string, 1:p_dt:string, 
2:ROW__ID:struct<writeid:bigint,bucketid:int,rowid:bigint>, 
3:ROW__IS__DELETED:boolean]
                  Group By Operator
                    Group By Vectorization:
                        className: VectorGroupByOperator
                        groupByMode: HASH
                        keyExpressions: ConstantVectorExpression(val 20220731) 
-> 4:string
                        native: false
                        vectorProcessingMode: HASH
                        projectedOutputColumnNums: []
                    keys: '20220731' (type: string)
                    minReductionHashAggr: 0.9655172
                    mode: hash
                    outputColumnNames: _col0
                    Statistics: Num rows: 1 Data size: 92 Basic stats: COMPLETE 
Column stats: COMPLETE
                    Reduce Output Operator
                      key expressions: _col0 (type: string)
                      null sort order: z
                      sort order: +
                      Map-reduce partition columns: _col0 (type: string)
                      Reduce Sink Vectorization:
                          className: VectorReduceSinkStringOperator
                          keyColumns: 0:string
                          native: true
                          nativeConditionsMet: 
hive.vectorized.execution.reducesink.new.enabled IS true, hive.execution.engine 
tez IN [tez] IS true, No PTF TopN IS true, No DISTINCT columns IS true, 
BinarySortableSerDe for keys IS true, LazyBinarySerDe for values IS true
                      Statistics: Num rows: 1 Data size: 92 Basic stats: 
COMPLETE Column stats: COMPLETE
            Execution mode: vectorized, llap
            LLAP IO: all inputs
            Map Vectorization:
                enabled: true
                enabledConditionsMet: 
hive.vectorized.use.vector.serde.deserialize IS true
                inputFormatFeatureSupport: [DECIMAL_64]
                featureSupportInUse: [DECIMAL_64]
                inputFileFormats: org.apache.hadoop.mapred.TextInputFormat
                allNative: false
                usesVectorUDFAdaptor: false
                vectorized: true
                rowBatchContext:
                    dataColumnCount: 1
                    includeColumns: []
                    dataColumns: bid:string
                    partitionColumnCount: 1
                    partitionColumns: p_dt:string
                    scratchColumnTypeNames: [string]
        Reducer 2 
            Execution mode: vectorized, llap
            Reduce Vectorization:
                enabled: true
                enableConditionsMet: hive.vectorized.execution.reduce.enabled 
IS true, hive.execution.engine tez IN [tez] IS true
                reduceColumnNullOrder: z
                reduceColumnSortOrder: +
                allNative: false
                usesVectorUDFAdaptor: false
                vectorized: true
                rowBatchContext:
                    dataColumnCount: 1
                    dataColumns: KEY._col0:string
                    partitionColumnCount: 0
                    scratchColumnTypeNames: []
            Reduce Operator Tree:
              Group By Operator
                Group By Vectorization:
                    className: VectorGroupByOperator
                    groupByMode: MERGEPARTIAL
                    keyExpressions: col 0:string
                    native: false
                    vectorProcessingMode: MERGE_PARTIAL
                    projectedOutputColumnNums: []
                keys: KEY._col0 (type: string)
                mode: mergepartial
                outputColumnNames: _col0
                Statistics: Num rows: 1 Data size: 92 Basic stats: COMPLETE 
Column stats: COMPLETE
                Map Join Operator
                  condition map:
                       Inner Join 0 to 1
                  keys:
                    0 _col0 (type: string)
                    1 _col0 (type: string)
                  Map Join Vectorization:
                      bigTableKeyColumns: 0:string
                      bigTableRetainColumnNums: []
                      className: VectorMapJoinInnerStringOperator
                      native: true
                      nativeConditionsMet: hive.mapjoin.optimized.hashtable IS 
true, hive.vectorized.execution.mapjoin.native.enabled IS true, 
hive.execution.engine tez IN [tez] IS true, One MapJoin Condition IS true, No 
nullsafe IS true, Small table vectorizes IS true, Optimized Table and Supports 
Key Types IS true
                      nonOuterSmallTableKeyMapping: []
                      projectedOutput: 1:string
                      smallTableValueMapping: 1:string
                      hashTableImplementationType: OPTIMIZED
                  outputColumnNames: _col1
                  input vertices:
                    1 Reducer 5
                  Statistics: Num rows: 1 Data size: 92 Basic stats: COMPLETE 
Column stats: COMPLETE
                  Map Join Operator
                    condition map:
                         Inner Join 0 to 1
                    keys:
                      0 _col1 (type: string)
                      1 _col0 (type: string)
                    Map Join Vectorization:
                        bigTableKeyColumns: 1:string
                        bigTableRetainColumnNums: [2]
                        bigTableValueColumns: 2:string
                        bigTableValueExpressions: ConstantVectorExpression(val 
20220731) -> 2:string
                        className: VectorMapJoinInnerBigOnlyStringOperator
                        native: true
                        nativeConditionsMet: hive.mapjoin.optimized.hashtable 
IS true, hive.vectorized.execution.mapjoin.native.enabled IS true, 
hive.execution.engine tez IN [tez] IS true, One MapJoin Condition IS true, No 
nullsafe IS true, Small table vectorizes IS true, Optimized Table and Supports 
Key Types IS true
                        nonOuterSmallTableKeyMapping: []
                        projectedOutput: 2:string
                        hashTableImplementationType: OPTIMIZED
                    outputColumnNames: _col1
                    input vertices:
                      1 Reducer 3
                    Statistics: Num rows: 1 Data size: 16 Basic stats: COMPLETE 
Column stats: COMPLETE
                    Select Operator
                      expressions: _col1 (type: string)
                      outputColumnNames: _col0
                      Select Vectorization:
                          className: VectorSelectOperator
                          native: true
                          projectedOutputColumnNums: [2]
                      Statistics: Num rows: 1 Data size: 8 Basic stats: 
COMPLETE Column stats: COMPLETE
                      File Output Operator
                        compressed: false
                        File Sink Vectorization:
                            className: VectorFileSinkOperator
                            native: false
                        Statistics: Num rows: 1 Data size: 8 Basic stats: 
COMPLETE Column stats: COMPLETE
                        table:
                            input format: 
org.apache.hadoop.mapred.SequenceFileInputFormat
                            output format: 
org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
                            serde: 
org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
        Reducer 3 
            Execution mode: vectorized, llap
            Reduce Vectorization:
                enabled: true
                enableConditionsMet: hive.vectorized.execution.reduce.enabled 
IS true, hive.execution.engine tez IN [tez] IS true
                reduceColumnNullOrder: z
                reduceColumnSortOrder: +
                allNative: false
                usesVectorUDFAdaptor: false
                vectorized: true
                rowBatchContext:
                    dataColumnCount: 1
                    dataColumns: KEY._col0:string
                    partitionColumnCount: 0
                    scratchColumnTypeNames: []
            Reduce Operator Tree:
              Group By Operator
                Group By Vectorization:
                    className: VectorGroupByOperator
                    groupByMode: MERGEPARTIAL
                    keyExpressions: col 0:string
                    native: false
                    vectorProcessingMode: MERGE_PARTIAL
                    projectedOutputColumnNums: []
                keys: KEY._col0 (type: string)
                mode: mergepartial
                outputColumnNames: _col0
                Statistics: Num rows: 1 Data size: 92 Basic stats: COMPLETE 
Column stats: COMPLETE
                Reduce Output Operator
                  key expressions: _col0 (type: string)
                  null sort order: z
                  sort order: +
                  Map-reduce partition columns: _col0 (type: string)
                  Reduce Sink Vectorization:
                      className: VectorReduceSinkStringOperator
                      keyColumns: 0:string
                      native: true
                      nativeConditionsMet: 
hive.vectorized.execution.reducesink.new.enabled IS true, hive.execution.engine 
tez IN [tez] IS true, No PTF TopN IS true, No DISTINCT columns IS true, 
BinarySortableSerDe for keys IS true, LazyBinarySerDe for values IS true
                  Statistics: Num rows: 1 Data size: 92 Basic stats: COMPLETE 
Column stats: COMPLETE
        Reducer 5 
            Execution mode: vectorized, llap
            Reduce Vectorization:
                enabled: true
                enableConditionsMet: hive.vectorized.execution.reduce.enabled 
IS true, hive.execution.engine tez IN [tez] IS true
                reduceColumnNullOrder: z
                reduceColumnSortOrder: +
                allNative: false
                usesVectorUDFAdaptor: false
                vectorized: true
                rowBatchContext:
                    dataColumnCount: 1
                    dataColumns: KEY._col0:string
                    partitionColumnCount: 0
                    scratchColumnTypeNames: []
            Reduce Operator Tree:
              Group By Operator
                Group By Vectorization:
                    className: VectorGroupByOperator
                    groupByMode: MERGEPARTIAL
                    keyExpressions: col 0:string
                    native: false
                    vectorProcessingMode: MERGE_PARTIAL
                    projectedOutputColumnNums: []
                keys: KEY._col0 (type: string)
                mode: mergepartial
                outputColumnNames: _col0
                Statistics: Num rows: 1 Data size: 92 Basic stats: COMPLETE 
Column stats: COMPLETE
                Reduce Output Operator
                  key expressions: _col0 (type: string)
                  null sort order: z
                  sort order: +
                  Map-reduce partition columns: _col0 (type: string)
                  Reduce Sink Vectorization:
                      className: VectorReduceSinkStringOperator
                      keyColumns: 0:string
                      native: true
                      nativeConditionsMet: 
hive.vectorized.execution.reducesink.new.enabled IS true, hive.execution.engine 
tez IN [tez] IS true, No PTF TopN IS true, No DISTINCT columns IS true, 
BinarySortableSerDe for keys IS true, LazyBinarySerDe for values IS true
                  Statistics: Num rows: 1 Data size: 92 Basic stats: COMPLETE 
Column stats: COMPLETE  Stage: Stage-0
    Fetch Operator
      limit: -1
      Processor Tree:
        ListSink{noformat}
 

New plan:
{noformat}
PLAN VECTORIZATION:
  enabled: true
  enabledConditionsMet: [hive.vectorized.execution.enabled IS true]STAGE 
DEPENDENCIES:
  Stage-1 is a root stage
  Stage-0 depends on stages: Stage-1STAGE PLANS:
  Stage: Stage-1
    Tez
#### A masked pattern was here ####
      Edges:
        Reducer 2 <- Map 1 (SIMPLE_EDGE)
        Reducer 4 <- Map 3 (SIMPLE_EDGE)
        Reducer 5 <- Map 3 (SIMPLE_EDGE), Reducer 2 (BROADCAST_EDGE), Reducer 4 
(BROADCAST_EDGE)
#### A masked pattern was here ####
      Vertices:
        Map 1 
            Map Operator Tree:
                TableScan
                  alias: table_b
                  filterExpr: (p_dt = '20220731') (type: boolean)
                  Statistics: Num rows: 29 Data size: 319 Basic stats: COMPLETE 
Column stats: COMPLETE
                  TableScan Vectorization:
                      native: true
                      vectorizationSchemaColumns: [0:bid:string, 1:p_dt:string, 
2:ROW__ID:struct<writeid:bigint,bucketid:int,rowid:bigint>, 
3:ROW__IS__DELETED:boolean]
                  Select Operator
                    Select Vectorization:
                        className: VectorSelectOperator
                        native: true
                        projectedOutputColumnNums: []
                    Statistics: Num rows: 29 Data size: 319 Basic stats: 
COMPLETE Column stats: COMPLETE
                    Group By Operator
                      Group By Vectorization:
                          className: VectorGroupByOperator
                          groupByMode: HASH
                          keyExpressions: ConstantVectorExpression(val 1) -> 
4:boolean
                          native: false
                          vectorProcessingMode: HASH
                          projectedOutputColumnNums: []
                      keys: true (type: boolean)
                      minReductionHashAggr: 0.9655172
                      mode: hash
                      outputColumnNames: _col0
                      Statistics: Num rows: 1 Data size: 4 Basic stats: 
COMPLETE Column stats: COMPLETE
                      Reduce Output Operator
                        key expressions: _col0 (type: boolean)
                        null sort order: z
                        sort order: +
                        Map-reduce partition columns: _col0 (type: boolean)
                        Reduce Sink Vectorization:
                            className: VectorReduceSinkLongOperator
                            keyColumns: 0:boolean
                            native: true
                            nativeConditionsMet: 
hive.vectorized.execution.reducesink.new.enabled IS true, hive.execution.engine 
tez IN [tez] IS true, No PTF TopN IS true, No DISTINCT columns IS true, 
BinarySortableSerDe for keys IS true, LazyBinarySerDe for values IS true
                        Statistics: Num rows: 1 Data size: 4 Basic stats: 
COMPLETE Column stats: COMPLETE
            Execution mode: vectorized, llap
            LLAP IO: all inputs
            Map Vectorization:
                enabled: true
                enabledConditionsMet: 
hive.vectorized.use.vector.serde.deserialize IS true
                inputFormatFeatureSupport: [DECIMAL_64]
                featureSupportInUse: [DECIMAL_64]
                inputFileFormats: org.apache.hadoop.mapred.TextInputFormat
                allNative: false
                usesVectorUDFAdaptor: false
                vectorized: true
                rowBatchContext:
                    dataColumnCount: 1
                    includeColumns: []
                    dataColumns: bid:string
                    partitionColumnCount: 1
                    partitionColumns: p_dt:string
                    scratchColumnTypeNames: [bigint]
        Map 3 
            Map Operator Tree:
                TableScan
                  alias: table_a
                  filterExpr: (p_dt = '20220731') (type: boolean)
                  Statistics: Num rows: 31 Data size: 341 Basic stats: COMPLETE 
Column stats: COMPLETE
                  TableScan Vectorization:
                      native: true
                      vectorizationSchemaColumns: [0:aid:string, 1:p_dt:string, 
2:ROW__ID:struct<writeid:bigint,bucketid:int,rowid:bigint>, 
3:ROW__IS__DELETED:boolean]
                  Select Operator
                    Select Vectorization:
                        className: VectorSelectOperator
                        native: true
                        projectedOutputColumnNums: []
                    Statistics: Num rows: 31 Data size: 341 Basic stats: 
COMPLETE Column stats: COMPLETE
                    Group By Operator
                      Group By Vectorization:
                          className: VectorGroupByOperator
                          groupByMode: HASH
                          keyExpressions: ConstantVectorExpression(val 1) -> 
4:boolean
                          native: false
                          vectorProcessingMode: HASH
                          projectedOutputColumnNums: []
                      keys: true (type: boolean)
                      minReductionHashAggr: 0.96774197
                      mode: hash
                      outputColumnNames: _col0
                      Statistics: Num rows: 1 Data size: 4 Basic stats: 
COMPLETE Column stats: COMPLETE
                      Reduce Output Operator
                        key expressions: _col0 (type: boolean)
                        null sort order: z
                        sort order: +
                        Map-reduce partition columns: _col0 (type: boolean)
                        Reduce Sink Vectorization:
                            className: VectorReduceSinkLongOperator
                            keyColumns: 0:boolean
                            native: true
                            nativeConditionsMet: 
hive.vectorized.execution.reducesink.new.enabled IS true, hive.execution.engine 
tez IN [tez] IS true, No PTF TopN IS true, No DISTINCT columns IS true, 
BinarySortableSerDe for keys IS true, LazyBinarySerDe for values IS true
                        Statistics: Num rows: 1 Data size: 4 Basic stats: 
COMPLETE Column stats: COMPLETE
                      Reduce Output Operator
                        key expressions: _col0 (type: boolean)
                        null sort order: z
                        sort order: +
                        Map-reduce partition columns: _col0 (type: boolean)
                        Reduce Sink Vectorization:
                            className: VectorReduceSinkLongOperator
                            keyColumns: 0:boolean
                            native: true
                            nativeConditionsMet: 
hive.vectorized.execution.reducesink.new.enabled IS true, hive.execution.engine 
tez IN [tez] IS true, No PTF TopN IS true, No DISTINCT columns IS true, 
BinarySortableSerDe for keys IS true, LazyBinarySerDe for values IS true
                        Statistics: Num rows: 1 Data size: 4 Basic stats: 
COMPLETE Column stats: COMPLETE
            Execution mode: vectorized, llap
            LLAP IO: all inputs
            Map Vectorization:
                enabled: true
                enabledConditionsMet: 
hive.vectorized.use.vector.serde.deserialize IS true
                inputFormatFeatureSupport: [DECIMAL_64]
                featureSupportInUse: [DECIMAL_64]
                inputFileFormats: org.apache.hadoop.mapred.TextInputFormat
                allNative: false
                usesVectorUDFAdaptor: false
                vectorized: true
                rowBatchContext:
                    dataColumnCount: 1
                    includeColumns: []
                    dataColumns: aid:string
                    partitionColumnCount: 1
                    partitionColumns: p_dt:string
                    scratchColumnTypeNames: [bigint]
        Reducer 2 
            Execution mode: vectorized, llap
            Reduce Vectorization:
                enabled: true
                enableConditionsMet: hive.vectorized.execution.reduce.enabled 
IS true, hive.execution.engine tez IN [tez] IS true
                reduceColumnNullOrder: z
                reduceColumnSortOrder: +
                allNative: false
                usesVectorUDFAdaptor: false
                vectorized: true
                rowBatchContext:
                    dataColumnCount: 1
                    dataColumns: KEY._col0:boolean
                    partitionColumnCount: 0
                    scratchColumnTypeNames: []
            Reduce Operator Tree:
              Group By Operator
                Group By Vectorization:
                    className: VectorGroupByOperator
                    groupByMode: MERGEPARTIAL
                    keyExpressions: col 0:boolean
                    native: false
                    vectorProcessingMode: MERGE_PARTIAL
                    projectedOutputColumnNums: []
                keys: KEY._col0 (type: boolean)
                mode: mergepartial
                outputColumnNames: _col0
                Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE 
Column stats: COMPLETE
                Select Operator
                  Select Vectorization:
                      className: VectorSelectOperator
                      native: true
                      projectedOutputColumnNums: []
                  Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE 
Column stats: COMPLETE
                  Reduce Output Operator
                    null sort order: 
                    sort order: 
                    Reduce Sink Vectorization:
                        className: VectorReduceSinkEmptyKeyOperator
                        native: true
                        nativeConditionsMet: 
hive.vectorized.execution.reducesink.new.enabled IS true, hive.execution.engine 
tez IN [tez] IS true, No PTF TopN IS true, No DISTINCT columns IS true, 
BinarySortableSerDe for keys IS true, LazyBinarySerDe for values IS true
                    Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE 
Column stats: COMPLETE
        Reducer 4 
            Execution mode: vectorized, llap
            Reduce Vectorization:
                enabled: true
                enableConditionsMet: hive.vectorized.execution.reduce.enabled 
IS true, hive.execution.engine tez IN [tez] IS true
                reduceColumnNullOrder: z
                reduceColumnSortOrder: +
                allNative: false
                usesVectorUDFAdaptor: false
                vectorized: true
                rowBatchContext:
                    dataColumnCount: 1
                    dataColumns: KEY._col0:boolean
                    partitionColumnCount: 0
                    scratchColumnTypeNames: []
            Reduce Operator Tree:
              Group By Operator
                Group By Vectorization:
                    className: VectorGroupByOperator
                    groupByMode: MERGEPARTIAL
                    keyExpressions: col 0:boolean
                    native: false
                    vectorProcessingMode: MERGE_PARTIAL
                    projectedOutputColumnNums: []
                keys: KEY._col0 (type: boolean)
                mode: mergepartial
                outputColumnNames: _col0
                Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE 
Column stats: COMPLETE
                Select Operator
                  Select Vectorization:
                      className: VectorSelectOperator
                      native: true
                      projectedOutputColumnNums: []
                  Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE 
Column stats: COMPLETE
                  Reduce Output Operator
                    null sort order: 
                    sort order: 
                    Reduce Sink Vectorization:
                        className: VectorReduceSinkEmptyKeyOperator
                        native: true
                        nativeConditionsMet: 
hive.vectorized.execution.reducesink.new.enabled IS true, hive.execution.engine 
tez IN [tez] IS true, No PTF TopN IS true, No DISTINCT columns IS true, 
BinarySortableSerDe for keys IS true, LazyBinarySerDe for values IS true
                    Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE 
Column stats: COMPLETE
        Reducer 5 
            Execution mode: vectorized, llap
            Reduce Vectorization:
                enabled: true
                enableConditionsMet: hive.vectorized.execution.reduce.enabled 
IS true, hive.execution.engine tez IN [tez] IS true
                reduceColumnNullOrder: z
                reduceColumnSortOrder: +
                allNative: false
                usesVectorUDFAdaptor: false
                vectorized: true
                rowBatchContext:
                    dataColumnCount: 1
                    dataColumns: KEY._col0:boolean
                    partitionColumnCount: 0
                    scratchColumnTypeNames: []
            Reduce Operator Tree:
              Group By Operator
                Group By Vectorization:
                    className: VectorGroupByOperator
                    groupByMode: MERGEPARTIAL
                    keyExpressions: col 0:boolean
                    native: false
                    vectorProcessingMode: MERGE_PARTIAL
                    projectedOutputColumnNums: []
                keys: KEY._col0 (type: boolean)
                mode: mergepartial
                outputColumnNames: _col0
                Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE 
Column stats: COMPLETE
                Select Operator
                  Select Vectorization:
                      className: VectorSelectOperator
                      native: true
                      projectedOutputColumnNums: []
                  Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE 
Column stats: COMPLETE
                  Map Join Operator
                    condition map:
                         Inner Join 0 to 1
                    keys:
                      0 
                      1 
                    Map Join Vectorization:
                        bigTableRetainColumnNums: []
                        className: VectorMapJoinInnerBigOnlyMultiKeyOperator
                        native: true
                        nativeConditionsMet: hive.mapjoin.optimized.hashtable 
IS true, hive.vectorized.execution.mapjoin.native.enabled IS true, 
hive.execution.engine tez IN [tez] IS true, One MapJoin Condition IS true, No 
nullsafe IS true, Small table vectorizes IS true, Optimized Table and Supports 
Key Types IS true
                        nonOuterSmallTableKeyMapping: []
                        hashTableImplementationType: OPTIMIZED
                    input vertices:
                      0 Reducer 2
                    Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE 
Column stats: COMPLETE
                    Map Join Operator
                      condition map:
                           Inner Join 0 to 1
                      keys:
                        0 
                        1 
                      Map Join Vectorization:
                          bigTableRetainColumnNums: []
                          className: VectorMapJoinInnerBigOnlyMultiKeyOperator
                          native: true
                          nativeConditionsMet: hive.mapjoin.optimized.hashtable 
IS true, hive.vectorized.execution.mapjoin.native.enabled IS true, 
hive.execution.engine tez IN [tez] IS true, One MapJoin Condition IS true, No 
nullsafe IS true, Small table vectorizes IS true, Optimized Table and Supports 
Key Types IS true
                          nonOuterSmallTableKeyMapping: []
                          hashTableImplementationType: OPTIMIZED
                      input vertices:
                        1 Reducer 4
                      Statistics: Num rows: 1 Data size: 8 Basic stats: 
COMPLETE Column stats: COMPLETE
                      Select Operator
                        expressions: '20220731' (type: string)
                        outputColumnNames: _col0
                        Select Vectorization:
                            className: VectorSelectOperator
                            native: true
                            projectedOutputColumnNums: [1]
                            selectExpressions: ConstantVectorExpression(val 
20220731) -> 1:string
                        Statistics: Num rows: 1 Data size: 92 Basic stats: 
COMPLETE Column stats: COMPLETE
                        File Output Operator
                          compressed: false
                          File Sink Vectorization:
                              className: VectorFileSinkOperator
                              native: false
                          Statistics: Num rows: 1 Data size: 92 Basic stats: 
COMPLETE Column stats: COMPLETE
                          table:
                              input format: 
org.apache.hadoop.mapred.SequenceFileInputFormat
                              output format: 
org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
                              serde: 
org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe  Stage: Stage-0
    Fetch Operator
      limit: -1
      Processor Tree:
        ListSink{noformat}

> Wrong results when (map) joining multiple tables on partition column
> --------------------------------------------------------------------
>
>                 Key: HIVE-26653
>                 URL: https://issues.apache.org/jira/browse/HIVE-26653
>             Project: Hive
>          Issue Type: Bug
>          Components: HiveServer2
>            Reporter: Stamatis Zampetakis
>            Assignee: Stamatis Zampetakis
>            Priority: Major
>         Attachments: hive_26653.q, hive_26653_explain.txt, 
> hive_26653_explain_cbo.txt, table_a.csv, table_b.csv
>
>
> The result of the query must have exactly one row matching the date specified 
> in the WHERE clause but the query returns nothing.
> {code:sql}
> CREATE TABLE table_a (`aid` string ) PARTITIONED BY (`p_dt` string)
> row format delimited fields terminated by ',' stored as textfile;
> LOAD DATA LOCAL INPATH '../../data/files/_tbla.csv' into TABLE table_a;
> CREATE TABLE table_b (`bid` string) PARTITIONED BY (`p_dt` string)
> row format delimited fields terminated by ',' stored as textfile;
> LOAD DATA LOCAL INPATH '../../data/files/_tblb.csv' into TABLE table_b;
> set hive.auto.convert.join=true;
> set hive.optimize.semijoin.conversion=false;
> SELECT a.p_dt
> FROM ((SELECT p_dt
>        FROM table_b
>        GROUP BY p_dt) a
>          JOIN
>      (SELECT p_dt
>       FROM table_a
>       GROUP BY p_dt) b ON a.p_dt = b.p_dt
>          JOIN
>      (SELECT p_dt
>       FROM table_a
>       GROUP BY p_dt) c ON a.p_dt = c.p_dt)
> WHERE a.p_dt =  translate(cast(to_date(date_sub('2022-08-01', 1)) AS string), 
> '-', '');
> {code}
> +Expected result+
> 20220731
> +Actual result+
> Empty
> To reproduce the problem the tables need to have some data. Values in aid and 
> bid columns are not important. For p_dt column use one of the following 
> values 20220731, 20220630.
> I will attach some sample data with which the problem can be reproduced. The 
> tables look like below.
> ||aid|pdt||
> |611|20220731|
> |239|20220630|
> |...|...|
> The problem can be reproduced via qtest in current master 
> (commit 
> [6b05d64ce8c7161415d97a7896ea50025322e30a|https://github.com/apache/hive/commit/6b05d64ce8c7161415d97a7896ea50025322e30a])
>  by running the TestMiniLlapLocalCliDriver.
> There is specific query plan (will attach shortly) for which the problem 
> shows up so if the plan changes slightly the problem may not appear anymore; 
> this is why we need to set explicitly hive.optimize.semijoin.conversion and 
> hive.auto.convert.join to trigger the problem.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

[jira] [Comment Edited] (HIVE-26653) Wrong results when (map) joining multiple tables on partition column

Reply via email to