hive git commit: HIVE-20510 : Vectorization : Support loading bucketed tables using sorted dynamic partition optimizer (Deepak Jaiswal, reviewed by Gopal Vijayarahavan, Matt Mccline, and Thejas Nair)

djaiswal Sun, 09 Sep 2018 00:09:51 -0700

Repository: hive
Updated Branches:
  refs/heads/master 99b8c370f -> 29332fbf8



HIVE-20510 : Vectorization : Support loading bucketed tables using sorted 
dynamic partition optimizer (Deepak Jaiswal, reviewed by Gopal Vijayarahavan, 
Matt Mccline, and Thejas Nair)


Project: http://git-wip-us.apache.org/repos/asf/hive/repo
Commit: http://git-wip-us.apache.org/repos/asf/hive/commit/29332fbf
Tree: http://git-wip-us.apache.org/repos/asf/hive/tree/29332fbf
Diff: http://git-wip-us.apache.org/repos/asf/hive/diff/29332fbf

Branch: refs/heads/master
Commit: 29332fbf895ebb02840b1bc43b80e58f2f9f56fd
Parents: 99b8c37
Author: Deepak Jaiswal <[email protected]>
Authored: Sun Sep 9 00:06:39 2018 -0700
Committer: Deepak Jaiswal <[email protected]>
Committed: Sun Sep 9 00:06:39 2018 -0700

----------------------------------------------------------------------
 .../insert_into_dynamic_partitions.q.out        |  10 +-
 .../insert_overwrite_dynamic_partitions.q.out   |  10 +-
 .../hadoop/hive/ql/exec/FunctionRegistry.java   |   2 +
 .../hadoop/hive/ql/exec/ReduceSinkOperator.java |   8 +-
 .../ql/exec/vector/VectorizationContext.java    |  20 +-
 .../vector/expressions/BucketNumExpression.java |  70 +++++
 .../VectorReduceSinkObjectHashOperator.java     |  85 +++---
 .../optimizer/SortedDynPartitionOptimizer.java  |  15 +-
 .../ql/udf/generic/GenericUDFBucketNumber.java  |  43 ++++
 .../dynpart_sort_opt_vectorization.q            |  28 ++
 .../dynpart_sort_optimization_acid2.q.out       |   6 +-
 .../llap/dynpart_sort_opt_vectorization.q.out   | 256 +++++++++++++++----
 .../llap/dynpart_sort_optimization.q.out        |  30 +--
 .../llap/dynpart_sort_optimization_acid.q.out   |  50 ++--
 .../results/clientpositive/show_functions.q.out |   1 +
 15 files changed, 491 insertions(+), 143 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/hive/blob/29332fbf/itests/hive-blobstore/src/test/results/clientpositive/insert_into_dynamic_partitions.q.out
----------------------------------------------------------------------
diff --git a/itests/hive-blobstore/src/test/results/clientpositive/insert_into_dynamic_partitions.q.out b/itests/hive-blobstore/src/test/results/clientpositive/insert_into_dynamic_partitions.q.out
index 74a9a56..44e4b67 100644
--- a/itests/hive-blobstore/src/test/results/clientpositive/insert_into_dynamic_partitions.q.out
+++ b/itests/hive-blobstore/src/test/results/clientpositive/insert_into_dynamic_partitions.q.out
@@ -104,7 +104,7 @@ STAGE PLANS:
                   outputColumnNames: _col0, _col1
                   Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: COMPLETE
                   Reduce Output Operator
-                    key expressions: _col1 (type: string), '_bucket_number' (type: string)
+                    key expressions: _col1 (type: string), _bucket_number (type: string)
                     null sort order: aa
                     sort order: ++
                     Map-reduce partition columns: _col1 (type: string)
@@ -156,16 +156,16 @@ STAGE PLANS:
       Needs Tagging: false
       Reduce Operator Tree:
         Select Operator
-          expressions: VALUE._col0 (type: int), KEY._col1 (type: string), KEY.'_bucket_number' (type: string)
-          outputColumnNames: _col0, _col1, '_bucket_number'
-          Statistics: Num rows: 1 Data size: 98 Basic stats: COMPLETE Column stats: COMPLETE
+          expressions: VALUE._col0 (type: int), KEY._col1 (type: string), KEY._bucket_number (type: string)
+          outputColumnNames: _col0, _col1, _bucket_number
+          Statistics: Num rows: 1 Data size: 184 Basic stats: COMPLETE Column stats: COMPLETE
           File Output Operator
             compressed: false
             GlobalTableId: 1
             directory: ### BLOBSTORE_STAGING_PATH ###
             Dp Sort State: PARTITION_BUCKET_SORTED
             NumFilesPerFileSink: 1
-            Statistics: Num rows: 1 Data size: 98 Basic stats: COMPLETE Column stats: COMPLETE
+            Statistics: Num rows: 1 Data size: 184 Basic stats: COMPLETE Column stats: COMPLETE
             Stats Publishing Key Prefix: ### BLOBSTORE_STAGING_PATH ###
             table:
                 input format: org.apache.hadoop.mapred.TextInputFormat

http://git-wip-us.apache.org/repos/asf/hive/blob/29332fbf/itests/hive-blobstore/src/test/results/clientpositive/insert_overwrite_dynamic_partitions.q.out
----------------------------------------------------------------------
diff --git a/itests/hive-blobstore/src/test/results/clientpositive/insert_overwrite_dynamic_partitions.q.out b/itests/hive-blobstore/src/test/results/clientpositive/insert_overwrite_dynamic_partitions.q.out
index ee02c36..593d9aa 100644
--- a/itests/hive-blobstore/src/test/results/clientpositive/insert_overwrite_dynamic_partitions.q.out
+++ b/itests/hive-blobstore/src/test/results/clientpositive/insert_overwrite_dynamic_partitions.q.out
@@ -122,7 +122,7 @@ STAGE PLANS:
                   outputColumnNames: _col0, _col1
                   Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: COMPLETE
                   Reduce Output Operator
-                    key expressions: _col1 (type: string), '_bucket_number' (type: string)
+                    key expressions: _col1 (type: string), _bucket_number (type: string)
                     null sort order: aa
                     sort order: ++
                     Map-reduce partition columns: _col1 (type: string)
@@ -174,16 +174,16 @@ STAGE PLANS:
       Needs Tagging: false
       Reduce Operator Tree:
         Select Operator
-          expressions: VALUE._col0 (type: int), KEY._col1 (type: string), KEY.'_bucket_number' (type: string)
-          outputColumnNames: _col0, _col1, '_bucket_number'
-          Statistics: Num rows: 1 Data size: 98 Basic stats: COMPLETE Column stats: COMPLETE
+          expressions: VALUE._col0 (type: int), KEY._col1 (type: string), KEY._bucket_number (type: string)
+          outputColumnNames: _col0, _col1, _bucket_number
+          Statistics: Num rows: 1 Data size: 184 Basic stats: COMPLETE Column stats: COMPLETE
           File Output Operator
             compressed: false
             GlobalTableId: 1
             directory: ### BLOBSTORE_STAGING_PATH ###
             Dp Sort State: PARTITION_BUCKET_SORTED
             NumFilesPerFileSink: 1
-            Statistics: Num rows: 1 Data size: 98 Basic stats: COMPLETE Column stats: COMPLETE
+            Statistics: Num rows: 1 Data size: 184 Basic stats: COMPLETE Column stats: COMPLETE
             Stats Publishing Key Prefix: ### BLOBSTORE_STAGING_PATH ###
             table:
                 input format: org.apache.hadoop.mapred.TextInputFormat

http://git-wip-us.apache.org/repos/asf/hive/blob/29332fbf/ql/src/java/org/apache/hadoop/hive/ql/exec/FunctionRegistry.java
----------------------------------------------------------------------
diff --git a/ql/src/java/org/apache/hadoop/hive/ql/exec/FunctionRegistry.java b/ql/src/java/org/apache/hadoop/hive/ql/exec/FunctionRegistry.java
index 8bf0a9c..3f538b3 100644
--- a/ql/src/java/org/apache/hadoop/hive/ql/exec/FunctionRegistry.java
+++ b/ql/src/java/org/apache/hadoop/hive/ql/exec/FunctionRegistry.java
@@ -514,6 +514,8 @@ public final class FunctionRegistry {
     system.registerGenericUDF("internal_interval", GenericUDFInternalInterval.class);
 
     system.registerGenericUDF("to_epoch_milli", GenericUDFEpochMilli.class);
+    system.registerGenericUDF("bucket_number", GenericUDFBucketNumber.class);
+
     // Generic UDTF's
     system.registerGenericUDTF("explode", GenericUDTFExplode.class);
     system.registerGenericUDTF("replicate_rows", GenericUDTFReplicateRows.class);

http://git-wip-us.apache.org/repos/asf/hive/blob/29332fbf/ql/src/java/org/apache/hadoop/hive/ql/exec/ReduceSinkOperator.java
----------------------------------------------------------------------
diff --git a/ql/src/java/org/apache/hadoop/hive/ql/exec/ReduceSinkOperator.java b/ql/src/java/org/apache/hadoop/hive/ql/exec/ReduceSinkOperator.java
index a2a9c84..ce0f08d 100644
--- a/ql/src/java/org/apache/hadoop/hive/ql/exec/ReduceSinkOperator.java
+++ b/ql/src/java/org/apache/hadoop/hive/ql/exec/ReduceSinkOperator.java
@@ -18,7 +18,6 @@
 
 package org.apache.hadoop.hive.ql.exec;
 
-import static org.apache.hadoop.hive.ql.optimizer.SortedDynPartitionOptimizer.BUCKET_NUMBER_COL_NAME;
 import static org.apache.hadoop.hive.ql.plan.ReduceSinkDesc.ReducerTraits.UNIFORM;
 
 import java.io.IOException;
@@ -35,16 +34,15 @@ import org.apache.hadoop.hive.ql.CompilationOpContext;
 import org.apache.hadoop.hive.ql.io.AcidUtils;
 import org.apache.hadoop.hive.ql.io.HiveKey;
 import org.apache.hadoop.hive.ql.metadata.HiveException;
-import org.apache.hadoop.hive.ql.plan.ExprNodeConstantDesc;
 import org.apache.hadoop.hive.ql.plan.ExprNodeDesc;
 import org.apache.hadoop.hive.ql.plan.ExprNodeDescUtils;
+import org.apache.hadoop.hive.ql.plan.ExprNodeGenericFuncDesc;
 import org.apache.hadoop.hive.ql.plan.ReduceSinkDesc;
 import org.apache.hadoop.hive.ql.plan.TableDesc;
 import org.apache.hadoop.hive.ql.plan.api.OperatorType;
-import org.apache.hadoop.hive.serde2.ByteStream;
+import org.apache.hadoop.hive.ql.udf.generic.GenericUDFBucketNumber;
 import org.apache.hadoop.hive.serde2.SerDeException;
 import org.apache.hadoop.hive.serde2.Serializer;
-import org.apache.hadoop.hive.serde2.binarysortable.BinarySortableSerDe;
 import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector;
 import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorFactory;
 import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorUtils;
@@ -166,7 +164,7 @@ public class ReduceSinkOperator extends TerminalOperator<ReduceSinkDesc>
       keyEval = new ExprNodeEvaluator[keys.size()];
       int i = 0;
       for (ExprNodeDesc e : keys) {
-        if (e instanceof ExprNodeConstantDesc && (BUCKET_NUMBER_COL_NAME).equals(((ExprNodeConstantDesc)e).getValue())) {
+        if (e instanceof ExprNodeGenericFuncDesc && ((ExprNodeGenericFuncDesc) e).getGenericUDF() instanceof GenericUDFBucketNumber) {
           buckColIdxInKeyForSdpo = i;
         }
         keyEval[i++] = ExprNodeEvaluatorFactory.get(e);
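
The hunk above stops matching the bucket-number key by its constant string value and instead recognises it by the type of the function node. A minimal sketch of that lookup follows, assuming hypothetical stand-in classes (`Expr`, `ColumnExpr`, `BucketNumberExpr`) in place of Hive's `ExprNodeDesc` hierarchy; it illustrates the idea only and is not the operator's actual code.

```java
import java.util.Arrays;
import java.util.List;

public class BucketKeyIndexSketch {
  // Hypothetical stand-ins for plan expression nodes; these are not Hive classes.
  interface Expr { }

  static final class ColumnExpr implements Expr {
    final String name;
    ColumnExpr(String name) { this.name = name; }
  }

  // Marker node playing the role of the bucket-number function call.
  static final class BucketNumberExpr implements Expr { }

  // Returns the position of the bucket-number key, or -1 if none is present.
  // Like the loop in the diff, a later match overwrites an earlier one.
  static int findBucketKeyIndex(List<Expr> keys) {
    int index = -1;
    for (int i = 0; i < keys.size(); i++) {
      if (keys.get(i) instanceof BucketNumberExpr) {
        index = i;
      }
    }
    return index;
  }

  public static void main(String[] args) {
    List<Expr> keys = Arrays.asList(new ColumnExpr("_col1"), new BucketNumberExpr());
    System.out.println(findBucketKeyIndex(keys));
  }
}
```

Matching on the node type means a user column that happens to hold the literal text `_bucket_number` can no longer be mistaken for the marker.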

http://git-wip-us.apache.org/repos/asf/hive/blob/29332fbf/ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorizationContext.java
----------------------------------------------------------------------
diff --git a/ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorizationContext.java b/ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorizationContext.java
index 57f7c01..55d2a16 100644
--- a/ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorizationContext.java
+++ b/ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorizationContext.java
@@ -35,6 +35,7 @@ import java.util.regex.Pattern;
 
 import org.apache.commons.lang.ArrayUtils;
 import org.apache.hadoop.hive.common.type.Date;
+import org.apache.hadoop.hive.ql.optimizer.SortedDynPartitionOptimizer;
 import org.slf4j.Logger;
 import org.slf4j.LoggerFactory;
 import org.apache.hadoop.hive.common.type.DataTypePhysicalVariation;
@@ -839,11 +840,22 @@ public class VectorizationContext {
             }
             break;
           case ALL:
-            if (LOG.isDebugEnabled()) {
-              LOG.debug("We will try to use the VectorUDFAdaptor for " + exprDesc.toString()
+            // Check if this is UDF for _bucket_number
+            if (expr.getGenericUDF() instanceof GenericUDFBucketNumber) {
+              if (LOG.isDebugEnabled()) {
+                LOG.debug("UDF to handle _bucket_number : Create BucketNumExpression");
+              }
+              int outCol = ocm.allocateOutputColumn(exprDesc.getTypeInfo());
+              ve = new BucketNumExpression(outCol);
+              ve.setInputTypeInfos(exprDesc.getTypeInfo());
+              ve.setOutputTypeInfo(exprDesc.getTypeInfo());
+            } else {
+              if (LOG.isDebugEnabled()) {
+                LOG.debug("We will try to use the VectorUDFAdaptor for " + exprDesc.toString()
                   + " because hive.vectorized.adaptor.usage.mode=all");
+              }
+              ve = getCustomUDFExpression(expr, mode);
             }
-            ve = getCustomUDFExpression(expr, mode);
             break;
           default:
             throw new RuntimeException("Unknown hive vector adaptor usage mode " +
@@ -858,7 +870,7 @@ public class VectorizationContext {
       }
     } else if (exprDesc instanceof ExprNodeConstantDesc) {
       ve = getConstantVectorExpression(((ExprNodeConstantDesc) exprDesc).getValue(), exprDesc.getTypeInfo(),
-          mode);
+        mode);
     } else if (exprDesc instanceof ExprNodeDynamicValueDesc) {
       ve = getDynamicValueVectorExpression((ExprNodeDynamicValueDesc) exprDesc, mode);
     } else if (exprDesc instanceof ExprNodeFieldDesc) {

http://git-wip-us.apache.org/repos/asf/hive/blob/29332fbf/ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/BucketNumExpression.java
----------------------------------------------------------------------
diff --git a/ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/BucketNumExpression.java b/ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/BucketNumExpression.java
new file mode 100644
index 0000000..d8c696c
--- /dev/null
+++ b/ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/BucketNumExpression.java
@@ -0,0 +1,70 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.hive.ql.exec.vector.expressions;
+
+import org.apache.hadoop.hive.ql.exec.vector.BytesColumnVector;
+import org.apache.hadoop.hive.ql.exec.vector.VectorExpressionDescriptor;
+import org.apache.hadoop.hive.ql.exec.vector.VectorizedRowBatch;
+import org.apache.hadoop.hive.ql.metadata.HiveException;
+
+import java.nio.ByteBuffer;
+
+/**
+ * An expression representing _bucket_number.
+ */
+public class BucketNumExpression extends VectorExpression {
+  private static final long serialVersionUID = 1L;
+  private int rowNum = -1;
+  private int bucketNum = -1;
+
+  public BucketNumExpression(int outputColNum) {
+    super(outputColNum);
+  }
+
+  public void initBuffer(VectorizedRowBatch batch) {
+    BytesColumnVector cv = (BytesColumnVector) batch.cols[outputColumnNum];
+    cv.isRepeating = false;
+    cv.initBuffer();
+  }
+
+  public void setRowNum(final int rowNum) {
+    this.rowNum = rowNum;
+  }
+
+  public void setBucketNum(final int bucketNum) {
+    this.bucketNum = bucketNum;
+  }
+
+  @Override
+  public void evaluate(VectorizedRowBatch batch) throws HiveException {
+    BytesColumnVector cv = (BytesColumnVector) batch.cols[outputColumnNum];
+    String bucketNumStr = String.valueOf(bucketNum);
+    cv.setVal(rowNum, bucketNumStr.getBytes(), 0, bucketNumStr.length());
+  }
+
+  @Override
+  public String vectorExpressionParameters() {
+    return "col : _bucket_number";
+  }
+
+  @Override
+  public VectorExpressionDescriptor.Descriptor getDescriptor() {
+    return (new VectorExpressionDescriptor.Builder()).build();
+  }
+}
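
The new expression writes the bucket number into a string column one row at a time, as decimal text. The sketch below shows that step in isolation, assuming a plain `byte[][]` as a hypothetical stand-in for `BytesColumnVector`. It also names the charset explicitly, whereas the committed code calls `getBytes()` with the platform default and passes `length()` as the byte count; for ASCII digits the two should agree.

```java
import java.nio.charset.StandardCharsets;

public class BucketNumberBytesSketch {
  // Hypothetical stand-in for a bytes column: one byte[] slot per row.
  static byte[][] newColumn(int rows) {
    return new byte[rows][];
  }

  // Stores the decimal text of bucketNum in the slot for rowNum.
  static void writeBucketNumber(byte[][] column, int rowNum, int bucketNum) {
    byte[] encoded = String.valueOf(bucketNum).getBytes(StandardCharsets.UTF_8);
    column[rowNum] = encoded;
  }

  public static void main(String[] args) {
    byte[][] column = newColumn(4);
    // Only the addressed row is written; other rows are left untouched.
    writeBucketNumber(column, 2, 1);
    System.out.println(new String(column[2], StandardCharsets.UTF_8));
  }
}
```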

http://git-wip-us.apache.org/repos/asf/hive/blob/29332fbf/ql/src/java/org/apache/hadoop/hive/ql/exec/vector/reducesink/VectorReduceSinkObjectHashOperator.java
----------------------------------------------------------------------
diff --git a/ql/src/java/org/apache/hadoop/hive/ql/exec/vector/reducesink/VectorReduceSinkObjectHashOperator.java b/ql/src/java/org/apache/hadoop/hive/ql/exec/vector/reducesink/VectorReduceSinkObjectHashOperator.java
index 5ab59c9..1a8395a 100644
--- a/ql/src/java/org/apache/hadoop/hive/ql/exec/vector/reducesink/VectorReduceSinkObjectHashOperator.java
+++ b/ql/src/java/org/apache/hadoop/hive/ql/exec/vector/reducesink/VectorReduceSinkObjectHashOperator.java
@@ -18,7 +18,9 @@
 
 package org.apache.hadoop.hive.ql.exec.vector.reducesink;
 
+import java.lang.reflect.Method;
 import java.util.Random;
+import java.util.function.BiFunction;
 
 import org.apache.hadoop.conf.Configuration;
 import org.apache.hadoop.hive.ql.CompilationOpContext;
@@ -26,6 +28,7 @@ import org.apache.hadoop.hive.ql.exec.vector.VectorExtractRow;
 import org.apache.hadoop.hive.ql.exec.vector.VectorSerializeRow;
 import org.apache.hadoop.hive.ql.exec.vector.VectorizationContext;
 import org.apache.hadoop.hive.ql.exec.vector.VectorizedRowBatch;
+import org.apache.hadoop.hive.ql.exec.vector.expressions.BucketNumExpression;
 import org.apache.hadoop.hive.ql.exec.vector.expressions.VectorExpression;
 import org.apache.hadoop.hive.ql.metadata.HiveException;
 import org.apache.hadoop.hive.ql.plan.OperatorDesc;
@@ -73,18 +76,21 @@ public class VectorReduceSinkObjectHashOperator extends VectorReduceSinkCommonOp
   protected transient Output keyOutput;
   protected transient VectorSerializeRow<BinarySortableSerializeWrite> keyVectorSerializeRow;
 
-  private transient boolean hasBuckets;
   private transient int numBuckets;
   private transient ObjectInspector[] bucketObjectInspectors;
   private transient VectorExtractRow bucketVectorExtractRow;
   private transient Object[] bucketFieldValues;
 
-  private transient boolean isPartitioned;
   private transient ObjectInspector[] partitionObjectInspectors;
   private transient VectorExtractRow partitionVectorExtractRow;
   private transient Object[] partitionFieldValues;
   private transient Random nonPartitionRandom;
 
+  private transient BiFunction<Object[], ObjectInspector[], Integer> hashFunc;
+  private transient BucketNumExpression bucketExpr = null;
+  private transient Method buckectEvaluatorMethod;
+
+
   /** Kryo ctor. */
   protected VectorReduceSinkObjectHashOperator() {
     super();
@@ -130,6 +136,15 @@ public class VectorReduceSinkObjectHashOperator extends VectorReduceSinkCommonOp
     return objectInspectors;
   }
 
+  private void evaluateBucketExpr(VectorizedRowBatch batch, int rowNum, int bucketNum) throws HiveException{
+    bucketExpr.setRowNum(rowNum);
+    bucketExpr.setBucketNum(bucketNum);
+    bucketExpr.evaluate(batch);
+  }
+
+  private void evaluateBucketDummy(VectorizedRowBatch batch, int rowNum, int bucketNum) {
+  }
+
   @Override
   protected void initializeOp(Configuration hconf) throws HiveException {
     super.initializeOp(hconf);
@@ -169,6 +184,29 @@ public class VectorReduceSinkObjectHashOperator extends VectorReduceSinkCommonOp
       partitionVectorExtractRow.init(reduceSinkPartitionTypeInfos, reduceSinkPartitionColumnMap);
       partitionFieldValues = new Object[reduceSinkPartitionTypeInfos.length];
     }
+
+    // Set hashFunc
+    hashFunc = bucketingVersion == 2 && !vectorDesc.getIsAcidChange() ?
+      ObjectInspectorUtils::getBucketHashCode :
+      ObjectInspectorUtils::getBucketHashCodeOld;
+
+    // Set function to evaluate _bucket_number if needed.
+    try {
+      buckectEvaluatorMethod = this.getClass().getDeclaredMethod("evaluateBucketDummy",
+        VectorizedRowBatch.class, int.class, int.class);
+      if (reduceSinkKeyExpressions != null) {
+        for (VectorExpression ve : reduceSinkKeyExpressions) {
+          if (ve instanceof BucketNumExpression) {
+            bucketExpr = (BucketNumExpression) ve;
+            buckectEvaluatorMethod = this.getClass().getDeclaredMethod("evaluateBucketExpr",
+              VectorizedRowBatch.class, int.class, int.class);
+            break;
+          }
+        }
+      }
+    } catch (NoSuchMethodException e) {
+      throw new HiveException("Failed to find method to evaluate _bucket_number");
+    }
   }
 
   @Override
@@ -197,6 +235,10 @@ public class VectorReduceSinkObjectHashOperator extends VectorReduceSinkCommonOp
       // Perform any key expressions.  Results will go into scratch columns.
       if (reduceSinkKeyExpressions != null) {
         for (VectorExpression ve : reduceSinkKeyExpressions) {
+          // Handle _bucket_number
+          if (ve instanceof BucketNumExpression) {
+            continue; // Evaluate per row
+          }
           ve.evaluate(batch);
         }
       }
@@ -227,9 +269,8 @@ public class VectorReduceSinkObjectHashOperator extends VectorReduceSinkCommonOp
 
       final int size = batch.size;
 
-      // EmptyBuckets = true
-      if (isEmptyBuckets) {
-        if (isEmptyPartitions) {
+      if (isEmptyBuckets) { // EmptyBuckets = true
+        if (isEmptyPartitions) { // isEmptyPartition = true
           for (int logical = 0; logical< size; logical++) {
             final int batchIndex = (selectedInUse ? selected[logical] : logical);
             final int hashCode = nonPartitionRandom.nextInt();
@@ -239,25 +280,19 @@ public class VectorReduceSinkObjectHashOperator extends VectorReduceSinkCommonOp
           for (int logical = 0; logical< size; logical++) {
             final int batchIndex = (selectedInUse ? selected[logical] : logical);
             partitionVectorExtractRow.extractRow(batch, batchIndex, partitionFieldValues);
-            final int hashCode = bucketingVersion == 2 && !vectorDesc.getIsAcidChange() ?
-                ObjectInspectorUtils.getBucketHashCode(
-                    partitionFieldValues, partitionObjectInspectors) :
-                ObjectInspectorUtils.getBucketHashCodeOld(
-                    partitionFieldValues, partitionObjectInspectors);
+            final int hashCode = hashFunc.apply(partitionFieldValues, partitionObjectInspectors);
             postProcess(batch, batchIndex, tag, hashCode);
           }
         }
       } else { // EmptyBuckets = false
-        if (isEmptyPartitions) {
+        if (isEmptyPartitions) { // isEmptyPartition = true
           for (int logical = 0; logical< size; logical++) {
             final int batchIndex = (selectedInUse ? selected[logical] : logical);
             bucketVectorExtractRow.extractRow(batch, batchIndex, bucketFieldValues);
-            final int bucketNum = bucketingVersion == 2 ?
-                ObjectInspectorUtils.getBucketNumber(bucketFieldValues,
-                  bucketObjectInspectors, numBuckets) :
-                ObjectInspectorUtils.getBucketNumberOld(
-                  bucketFieldValues, bucketObjectInspectors, numBuckets);
+            final int bucketNum = ObjectInspectorUtils.getBucketNumber(
+              hashFunc.apply(bucketFieldValues, bucketObjectInspectors), numBuckets);
             final int hashCode = nonPartitionRandom.nextInt() * 31 + bucketNum;
+            buckectEvaluatorMethod.invoke(this, batch, batchIndex, bucketNum);
             postProcess(batch, batchIndex, tag, hashCode);
           }
         } else { // isEmptyPartition = false
@@ -265,20 +300,10 @@ public class VectorReduceSinkObjectHashOperator extends VectorReduceSinkCommonOp
             final int batchIndex = (selectedInUse ? selected[logical] : logical);
             partitionVectorExtractRow.extractRow(batch, batchIndex, partitionFieldValues);
             bucketVectorExtractRow.extractRow(batch, batchIndex, bucketFieldValues);
-            final int hashCode, bucketNum;
-            if (bucketingVersion == 2 && !vectorDesc.getIsAcidChange()) {
-              bucketNum =
-                  ObjectInspectorUtils.getBucketNumber(
-                      bucketFieldValues, bucketObjectInspectors, numBuckets);
-              hashCode = ObjectInspectorUtils.getBucketHashCode(
-                  partitionFieldValues, partitionObjectInspectors) * 31 + bucketNum;
-            } else { // old bucketing logic
-              bucketNum =
-                  ObjectInspectorUtils.getBucketNumberOld(
-                      bucketFieldValues, bucketObjectInspectors, numBuckets);
-              hashCode = ObjectInspectorUtils.getBucketHashCodeOld(
-                  partitionFieldValues, partitionObjectInspectors) * 31 + bucketNum;
-            }
+            final int bucketNum = ObjectInspectorUtils.getBucketNumber(
+              hashFunc.apply(bucketFieldValues, bucketObjectInspectors), numBuckets);
+            final int hashCode = hashFunc.apply(partitionFieldValues, partitionObjectInspectors) * 31 + bucketNum;
+            buckectEvaluatorMethod.invoke(this, batch, batchIndex, bucketNum);
             postProcess(batch, batchIndex, tag, hashCode);
           }
         }
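
The refactored loops above pick the hash function once, derive a bucket number from the hash of the bucketing columns, and fold it into the reducer hash as `partitionHash * 31 + bucketNum`. The sketch below illustrates that arithmetic. It assumes the bucket number is the sign-masked hash modulo the bucket count, which is how I understand `ObjectInspectorUtils.getBucketNumber(int, int)` to behave, and it uses `Arrays.hashCode` as a hypothetical stand-in for Hive's bucket hash functions.

```java
import java.util.Arrays;
import java.util.function.Function;

public class BucketHashSketch {
  // Maps a hash code to a bucket in [0, numBuckets). Masking the sign bit
  // keeps the result non-negative even for negative hash codes.
  // Assumption: this mirrors the two-argument getBucketNumber helper.
  static int bucketNumber(int hashCode, int numBuckets) {
    return (hashCode & Integer.MAX_VALUE) % numBuckets;
  }

  // Combines the partition hash with the bucket number, following the
  // "hash * 31 + bucketNum" expression in the diff.
  static int reducerHash(int partitionHash, int bucketNum) {
    return partitionHash * 31 + bucketNum;
  }

  public static void main(String[] args) {
    // Hypothetical hash function, chosen once up front as the diff does with hashFunc.
    Function<Object[], Integer> hashFunc = Arrays::hashCode;

    Object[] bucketFieldValues = {(short) 42};
    Object[] partitionFieldValues = {"2018-09-09"};
    int numBuckets = 2;

    int bucketNum = bucketNumber(hashFunc.apply(bucketFieldValues), numBuckets);
    int hashCode = reducerHash(hashFunc.apply(partitionFieldValues), bucketNum);
    System.out.println("bucket=" + bucketNum + " hash=" + hashCode);
  }
}
```

Selecting the function once keeps the per-row loop free of the repeated version check that the removed lines carried in every branch.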

http://git-wip-us.apache.org/repos/asf/hive/blob/29332fbf/ql/src/java/org/apache/hadoop/hive/ql/optimizer/SortedDynPartitionOptimizer.java
----------------------------------------------------------------------
diff --git a/ql/src/java/org/apache/hadoop/hive/ql/optimizer/SortedDynPartitionOptimizer.java b/ql/src/java/org/apache/hadoop/hive/ql/optimizer/SortedDynPartitionOptimizer.java
index 51010aa..2dc2351 100644
--- a/ql/src/java/org/apache/hadoop/hive/ql/optimizer/SortedDynPartitionOptimizer.java
+++ b/ql/src/java/org/apache/hadoop/hive/ql/optimizer/SortedDynPartitionOptimizer.java
@@ -31,6 +31,7 @@ import org.apache.hadoop.hive.metastore.api.FieldSchema;
 import org.apache.hadoop.hive.metastore.api.Order;
 import org.apache.hadoop.hive.ql.exec.ColumnInfo;
 import org.apache.hadoop.hive.ql.exec.FileSinkOperator;
+import org.apache.hadoop.hive.ql.exec.FunctionRegistry;
 import org.apache.hadoop.hive.ql.exec.Operator;
 import org.apache.hadoop.hive.ql.exec.OperatorFactory;
 import org.apache.hadoop.hive.ql.exec.OperatorUtils;
@@ -59,6 +60,7 @@ import org.apache.hadoop.hive.ql.plan.ExprNodeColumnDesc;
 import org.apache.hadoop.hive.ql.plan.ExprNodeConstantDesc;
 import org.apache.hadoop.hive.ql.plan.ExprNodeDesc;
 import org.apache.hadoop.hive.ql.plan.ExprNodeDescUtils;
+import org.apache.hadoop.hive.ql.plan.ExprNodeGenericFuncDesc;
 import org.apache.hadoop.hive.ql.plan.FileSinkDesc;
 import org.apache.hadoop.hive.ql.plan.ListBucketingCtx;
 import org.apache.hadoop.hive.ql.plan.OperatorDesc;
@@ -82,7 +84,7 @@ import com.google.common.collect.Sets;
  */
 public class SortedDynPartitionOptimizer extends Transform {
 
-  public static final String BUCKET_NUMBER_COL_NAME = "_bucket_number";
+  private static final String BUCKET_NUMBER_COL_NAME = "_bucket_number";
   @Override
   public ParseContext transform(ParseContext pCtx) throws SemanticException {
 
@@ -262,8 +264,8 @@ public class SortedDynPartitionOptimizer extends Transform {
       }
       RowSchema selRS = new RowSchema(fsParent.getSchema());
       if (!bucketColumns.isEmpty()) {
-        descs.add(new ExprNodeColumnDesc(TypeInfoFactory.stringTypeInfo, ReduceField.KEY.toString()+".'"+BUCKET_NUMBER_COL_NAME+"'", null, false));
-        colNames.add("'"+BUCKET_NUMBER_COL_NAME+"'");
+        descs.add(new ExprNodeColumnDesc(TypeInfoFactory.stringTypeInfo, ReduceField.KEY.toString()+"."+BUCKET_NUMBER_COL_NAME, null, false));
+        colNames.add(BUCKET_NUMBER_COL_NAME);
         ColumnInfo ci = new ColumnInfo(BUCKET_NUMBER_COL_NAME, TypeInfoFactory.stringTypeInfo, selRS.getSignature().get(0).getTabAlias(), true, true);
         selRS.getSignature().add(ci);
         fsParent.getSchema().getSignature().add(ci);
@@ -513,9 +515,10 @@ public class SortedDynPartitionOptimizer extends Transform {
       // corresponding with bucket number and hence their OIs
       for (Integer idx : keyColsPosInVal) {
         if (idx < 0) {
-          ExprNodeConstantDesc bucketNumCol = new ExprNodeConstantDesc(TypeInfoFactory.stringTypeInfo, BUCKET_NUMBER_COL_NAME);
-          keyCols.add(bucketNumCol);
-          colExprMap.put(Utilities.ReduceField.KEY + ".'" +BUCKET_NUMBER_COL_NAME+"'", bucketNumCol);
+          ExprNodeDesc bucketNumColUDF = ExprNodeGenericFuncDesc.newInstance(
+            FunctionRegistry.getFunctionInfo("bucket_number").getGenericUDF(), new ArrayList<>());
+          keyCols.add(bucketNumColUDF);
+          colExprMap.put(Utilities.ReduceField.KEY + "." +BUCKET_NUMBER_COL_NAME, bucketNumColUDF);
         } else {
           keyCols.add(allCols.get(idx).clone());
         }

http://git-wip-us.apache.org/repos/asf/hive/blob/29332fbf/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFBucketNumber.java
----------------------------------------------------------------------
diff --git a/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFBucketNumber.java b/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFBucketNumber.java
new file mode 100644
index 0000000..472cc85
--- /dev/null
+++ b/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFBucketNumber.java
@@ -0,0 +1,43 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.hive.ql.udf.generic;
+
+import org.apache.hadoop.hive.ql.exec.UDFArgumentException;
+import org.apache.hadoop.hive.ql.metadata.HiveException;
+import org.apache.hadoop.hive.ql.udf.UDFType;
+import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector;
+import org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorFactory;
+
+@UDFType(deterministic = false)
+public class GenericUDFBucketNumber extends GenericUDF{
+  @Override
+  public ObjectInspector initialize(ObjectInspector[] arguments) throws UDFArgumentException {
+    return PrimitiveObjectInspectorFactory.writableStringObjectInspector;
+  }
+
+  @Override
+  public String getDisplayString(String[] children) {
+    return "_bucket_number";
+  }
+
+  @Override
+  public Object evaluate(DeferredObject[] arguments) throws HiveException {
+    return null;
+  }
+}

http://git-wip-us.apache.org/repos/asf/hive/blob/29332fbf/ql/src/test/queries/clientpositive/dynpart_sort_opt_vectorization.q
----------------------------------------------------------------------
diff --git a/ql/src/test/queries/clientpositive/dynpart_sort_opt_vectorization.q b/ql/src/test/queries/clientpositive/dynpart_sort_opt_vectorization.q
index 435cdad..3c2918f 100644
--- a/ql/src/test/queries/clientpositive/dynpart_sort_opt_vectorization.q
+++ b/ql/src/test/queries/clientpositive/dynpart_sort_opt_vectorization.q
@@ -1,3 +1,4 @@
+--! qt:dataset:alltypesorc
 set hive.compute.query.using.stats=false;
 set hive.mapred.mode=nonstrict;
 set hive.explain.user=false;
@@ -175,3 +176,30 @@ explain select * from over1k_part_buck_sort2_orc;
 select * from over1k_part_buck_sort2_orc;
 explain select count(*) from over1k_part_buck_sort2_orc;
 select count(*) from over1k_part_buck_sort2_orc;
+
+set hive.mapred.mode=nonstrict;
+set hive.optimize.ppd=true;
+set hive.optimize.index.filter=true;
+set hive.tez.bucket.pruning=true;
+set hive.explain.user=false;
+set hive.fetch.task.conversion=none;
+set hive.support.concurrency=true;
+set hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager;
+set hive.vectorized.execution.reduce.enabled=true;
+set hive.exec.dynamic.partition.mode=nonstrict;
+
+create table addcolumns_vectorization_true_disallowincompatible_true_fileformat_orc_tinyint
+(i int,si smallint)
+partitioned by (s string)
+clustered by (si) into 2 buckets
+stored as orc tblproperties ('transactional'='true');
+
+set hive.optimize.sort.dynamic.partition=true;
+explain insert into table addcolumns_vectorization_true_disallowincompatible_true_fileformat_orc_tinyint partition (s)
+  select cint,csmallint, cstring1 from alltypesorc limit 10;
+
+insert into table addcolumns_vectorization_true_disallowincompatible_true_fileformat_orc_tinyint partition (s)
+  select cint,csmallint, cstring1 from alltypesorc limit 10;
+
+select cint, csmallint, cstring1 from alltypesorc limit 10;
+select * from addcolumns_vectorization_true_disallowincompatible_true_fileformat_orc_tinyint;

http://git-wip-us.apache.org/repos/asf/hive/blob/29332fbf/ql/src/test/results/clientpositive/dynpart_sort_optimization_acid2.q.out
----------------------------------------------------------------------
diff --git a/ql/src/test/results/clientpositive/dynpart_sort_optimization_acid2.q.out b/ql/src/test/results/clientpositive/dynpart_sort_optimization_acid2.q.out
index aea7572..c192a24 100644
--- a/ql/src/test/results/clientpositive/dynpart_sort_optimization_acid2.q.out
+++ b/ql/src/test/results/clientpositive/dynpart_sort_optimization_acid2.q.out
@@ -35,7 +35,7 @@ STAGE PLANS:
               outputColumnNames: _col0, _col1, _col2, _col3
               Statistics: Num rows: 2000 Data size: 21248 Basic stats: COMPLETE Column stats: NONE
               Reduce Output Operator
-                key expressions: _col2 (type: string), _col3 (type: string), '_bucket_number' (type: string), _col1 (type: string)
+                key expressions: _col2 (type: string), _col3 (type: string), _bucket_number (type: string), _col1 (type: string)
                 sort order: ++++
                 Map-reduce partition columns: _col2 (type: string), _col3 (type: string)
                 Statistics: Num rows: 2000 Data size: 21248 Basic stats: COMPLETE Column stats: NONE
@@ -43,8 +43,8 @@ STAGE PLANS:
       Execution mode: vectorized
       Reduce Operator Tree:
         Select Operator
-          expressions: VALUE._col0 (type: string), KEY._col1 (type: string), KEY._col2 (type: string), KEY._col3 (type: string), KEY.'_bucket_number' (type: string)
-          outputColumnNames: _col0, _col1, _col2, _col3, '_bucket_number'
+          expressions: VALUE._col0 (type: string), KEY._col1 (type: string), KEY._col2 (type: string), KEY._col3 (type: string), KEY._bucket_number (type: string)
+          outputColumnNames: _col0, _col1, _col2, _col3, _bucket_number
           Statistics: Num rows: 2000 Data size: 21248 Basic stats: COMPLETE Column stats: NONE
           File Output Operator
             compressed: false

http://git-wip-us.apache.org/repos/asf/hive/blob/29332fbf/ql/src/test/results/clientpositive/llap/dynpart_sort_opt_vectorization.q.out
----------------------------------------------------------------------
diff --git a/ql/src/test/results/clientpositive/llap/dynpart_sort_opt_vectorization.q.out b/ql/src/test/results/clientpositive/llap/dynpart_sort_opt_vectorization.q.out
index 22f0a31..dec761e 100644
--- a/ql/src/test/results/clientpositive/llap/dynpart_sort_opt_vectorization.q.out
+++ b/ql/src/test/results/clientpositive/llap/dynpart_sort_opt_vectorization.q.out
@@ -362,7 +362,7 @@ STAGE PLANS:
                       outputColumnNames: _col0, _col1, _col2, _col3, _col4
                       Statistics: Num rows: 11 Data size: 264 Basic stats: COMPLETE Column stats: COMPLETE
                       Reduce Output Operator
-                        key expressions: _col4 (type: tinyint), '_bucket_number' (type: string)
+                        key expressions: _col4 (type: tinyint), _bucket_number (type: string)
                         sort order: ++
                         Map-reduce partition columns: _col4 (type: tinyint)
                         Statistics: Num rows: 11 Data size: 264 Basic stats: COMPLETE Column stats: COMPLETE
@@ -373,13 +373,13 @@ STAGE PLANS:
             Execution mode: vectorized, llap
             Reduce Operator Tree:
               Select Operator
-                expressions: VALUE._col0 (type: smallint), VALUE._col1 (type: int), VALUE._col2 (type: bigint), VALUE._col3 (type: float), KEY._col4 (type: tinyint), KEY.'_bucket_number' (type: string)
-                outputColumnNames: _col0, _col1, _col2, _col3, _col4, '_bucket_number'
-                Statistics: Num rows: 11 Data size: 1342 Basic stats: COMPLETE Column stats: COMPLETE
+                expressions: VALUE._col0 (type: smallint), VALUE._col1 (type: int), VALUE._col2 (type: bigint), VALUE._col3 (type: float), KEY._col4 (type: tinyint), KEY._bucket_number (type: string)
+                outputColumnNames: _col0, _col1, _col2, _col3, _col4, _bucket_number
+                Statistics: Num rows: 11 Data size: 2288 Basic stats: COMPLETE Column stats: COMPLETE
                 File Output Operator
                   compressed: false
                   Dp Sort State: PARTITION_BUCKET_SORTED
-                  Statistics: Num rows: 11 Data size: 1342 Basic stats: COMPLETE Column stats: COMPLETE
+                  Statistics: Num rows: 11 Data size: 2288 Basic stats: COMPLETE Column stats: COMPLETE
                   table:
                       input format: org.apache.hadoop.hive.ql.io.orc.OrcInputFormat
                       output format: org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat
@@ -441,7 +441,7 @@ STAGE PLANS:
                       outputColumnNames: _col0, _col1, _col2, _col3, _col4
                       Statistics: Num rows: 11 Data size: 264 Basic stats: COMPLETE Column stats: COMPLETE
                       Reduce Output Operator
-                        key expressions: _col4 (type: tinyint), '_bucket_number' (type: string), _col3 (type: float)
+                        key expressions: _col4 (type: tinyint), _bucket_number (type: string), _col3 (type: float)
                         sort order: +++
                         Map-reduce partition columns: _col4 (type: tinyint)
                         Statistics: Num rows: 11 Data size: 264 Basic stats: COMPLETE Column stats: COMPLETE
@@ -452,13 +452,13 @@ STAGE PLANS:
             Execution mode: vectorized, llap
             Reduce Operator Tree:
               Select Operator
-                expressions: VALUE._col0 (type: smallint), VALUE._col1 (type: int), VALUE._col2 (type: bigint), KEY._col3 (type: float), KEY._col4 (type: tinyint), KEY.'_bucket_number' (type: string)
-                outputColumnNames: _col0, _col1, _col2, _col3, _col4, '_bucket_number'
-                Statistics: Num rows: 11 Data size: 1342 Basic stats: COMPLETE Column stats: COMPLETE
+                expressions: VALUE._col0 (type: smallint), VALUE._col1 (type: int), VALUE._col2 (type: bigint), KEY._col3 (type: float), KEY._col4 (type: tinyint), KEY._bucket_number (type: string)
+                outputColumnNames: _col0, _col1, _col2, _col3, _col4, _bucket_number
+                Statistics: Num rows: 11 Data size: 2288 Basic stats: COMPLETE Column stats: COMPLETE
                 File Output Operator
                   compressed: false
                   Dp Sort State: PARTITION_BUCKET_SORTED
-                  Statistics: Num rows: 11 Data size: 1342 Basic stats: COMPLETE Column stats: COMPLETE
+                  Statistics: Num rows: 11 Data size: 2288 Basic stats: COMPLETE Column stats: COMPLETE
                   table:
                       input format: org.apache.hadoop.hive.ql.io.orc.OrcInputFormat
                       output format: org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat
@@ -767,7 +767,7 @@ STAGE PLANS:
                       outputColumnNames: _col0, _col1, _col2, _col3, _col4
                       Statistics: Num rows: 11 Data size: 264 Basic stats: COMPLETE Column stats: COMPLETE
                       Reduce Output Operator
-                        key expressions: _col4 (type: tinyint), '_bucket_number' (type: string)
+                        key expressions: _col4 (type: tinyint), _bucket_number (type: string)
                         sort order: ++
                         Map-reduce partition columns: _col4 (type: tinyint)
                         Statistics: Num rows: 11 Data size: 264 Basic stats: COMPLETE Column stats: COMPLETE
@@ -778,13 +778,13 @@ STAGE PLANS:
             Execution mode: vectorized, llap
             Reduce Operator Tree:
               Select Operator
-                expressions: VALUE._col0 (type: smallint), VALUE._col1 (type: int), VALUE._col2 (type: bigint), VALUE._col3 (type: float), KEY._col4 (type: tinyint), KEY.'_bucket_number' (type: string)
-                outputColumnNames: _col0, _col1, _col2, _col3, _col4, '_bucket_number'
-                Statistics: Num rows: 11 Data size: 1342 Basic stats: COMPLETE Column stats: COMPLETE
+                expressions: VALUE._col0 (type: smallint), VALUE._col1 (type: int), VALUE._col2 (type: bigint), VALUE._col3 (type: float), KEY._col4 (type: tinyint), KEY._bucket_number (type: string)
+                outputColumnNames: _col0, _col1, _col2, _col3, _col4, _bucket_number
+                Statistics: Num rows: 11 Data size: 2288 Basic stats: COMPLETE Column stats: COMPLETE
                 File Output Operator
                   compressed: false
                   Dp Sort State: PARTITION_BUCKET_SORTED
-                  Statistics: Num rows: 11 Data size: 1342 Basic stats: COMPLETE Column stats: COMPLETE
+                  Statistics: Num rows: 11 Data size: 2288 Basic stats: COMPLETE Column stats: COMPLETE
                   table:
                       input format: org.apache.hadoop.hive.ql.io.orc.OrcInputFormat
                       output format: org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat
@@ -846,7 +846,7 @@ STAGE PLANS:
                       outputColumnNames: _col0, _col1, _col2, _col3, _col4
                       Statistics: Num rows: 11 Data size: 264 Basic stats: COMPLETE Column stats: COMPLETE
                       Reduce Output Operator
-                        key expressions: _col4 (type: tinyint), '_bucket_number' (type: string), _col3 (type: float)
+                        key expressions: _col4 (type: tinyint), _bucket_number (type: string), _col3 (type: float)
                         sort order: +++
                         Map-reduce partition columns: _col4 (type: tinyint)
                         Statistics: Num rows: 11 Data size: 264 Basic stats: COMPLETE Column stats: COMPLETE
@@ -857,13 +857,13 @@ STAGE PLANS:
             Execution mode: vectorized, llap
             Reduce Operator Tree:
               Select Operator
-                expressions: VALUE._col0 (type: smallint), VALUE._col1 (type: int), VALUE._col2 (type: bigint), KEY._col3 (type: float), KEY._col4 (type: tinyint), KEY.'_bucket_number' (type: string)
-                outputColumnNames: _col0, _col1, _col2, _col3, _col4, '_bucket_number'
-                Statistics: Num rows: 11 Data size: 1342 Basic stats: COMPLETE Column stats: COMPLETE
+                expressions: VALUE._col0 (type: smallint), VALUE._col1 (type: int), VALUE._col2 (type: bigint), KEY._col3 (type: float), KEY._col4 (type: tinyint), KEY._bucket_number (type: string)
+                outputColumnNames: _col0, _col1, _col2, _col3, _col4, _bucket_number
+                Statistics: Num rows: 11 Data size: 2288 Basic stats: COMPLETE Column stats: COMPLETE
                 File Output Operator
                   compressed: false
                   Dp Sort State: PARTITION_BUCKET_SORTED
-                  Statistics: Num rows: 11 Data size: 1342 Basic stats: COMPLETE Column stats: COMPLETE
+                  Statistics: Num rows: 11 Data size: 2288 Basic stats: COMPLETE Column stats: COMPLETE
                   table:
                       input format: org.apache.hadoop.hive.ql.io.orc.OrcInputFormat
                       output format: org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat
@@ -1144,10 +1144,10 @@ Table:                  over1k_part_buck_orc
 #### A masked pattern was here ####
 Partition Parameters:           
        COLUMN_STATS_ACCURATE   {\"BASIC_STATS\":\"true\",\"COLUMN_STATS\":{\"b\":\"true\",\"f\":\"true\",\"i\":\"true\",\"si\":\"true\"}}
-       numFiles                2                   
+       numFiles                8                   
        numRows                 32                  
        rawDataSize             640                 
-       totalSize               1460                
+       totalSize               4648                
 #### A masked pattern was here ####
                 
 # Storage Information           
@@ -1183,10 +1183,10 @@ Table:                  over1k_part_buck_orc
 #### A masked pattern was here ####
 Partition Parameters:           
        COLUMN_STATS_ACCURATE   {\"BASIC_STATS\":\"true\",\"COLUMN_STATS\":{\"b\":\"true\",\"f\":\"true\",\"i\":\"true\",\"si\":\"true\"}}
-       numFiles                2                   
-       numRows                 4                   
-       rawDataSize             80                  
-       totalSize               968                 
+       numFiles                4                   
+       numRows                 6                   
+       rawDataSize             120                 
+       totalSize               2074                
 #### A masked pattern was here ####
                 
 # Storage Information           
@@ -1222,10 +1222,10 @@ Table:                  over1k_part_buck_sort_orc
 #### A masked pattern was here ####
 Partition Parameters:           
        COLUMN_STATS_ACCURATE   {\"BASIC_STATS\":\"true\",\"COLUMN_STATS\":{\"b\":\"true\",\"f\":\"true\",\"i\":\"true\",\"si\":\"true\"}}
-       numFiles                2                   
+       numFiles                8                   
        numRows                 32                  
        rawDataSize             640                 
-       totalSize               1444                
+       totalSize               4658                
 #### A masked pattern was here ####
                 
 # Storage Information           
@@ -1261,10 +1261,10 @@ Table:                  over1k_part_buck_sort_orc
 #### A masked pattern was here ####
 Partition Parameters:           
        COLUMN_STATS_ACCURATE   {\"BASIC_STATS\":\"true\",\"COLUMN_STATS\":{\"b\":\"true\",\"f\":\"true\",\"i\":\"true\",\"si\":\"true\"}}
-       numFiles                2                   
-       numRows                 4                   
-       rawDataSize             80                  
-       totalSize               978                 
+       numFiles                4                   
+       numRows                 6                   
+       rawDataSize             120                 
+       totalSize               2074                
 #### A masked pattern was here ####
                 
 # Storage Information           
@@ -1315,7 +1315,7 @@ POSTHOOK: Input: default@over1k_part_buck_orc
 POSTHOOK: Input: default@over1k_part_buck_orc@t=27
 POSTHOOK: Input: default@over1k_part_buck_orc@t=__HIVE_DEFAULT_PARTITION__
 #### A masked pattern was here ####
-34
+38
 PREHOOK: query: select count(*) from over1k_part_buck_sort_orc
 PREHOOK: type: QUERY
 PREHOOK: Input: default@over1k_part_buck_sort_orc
@@ -1328,7 +1328,7 @@ POSTHOOK: Input: default@over1k_part_buck_sort_orc
 POSTHOOK: Input: default@over1k_part_buck_sort_orc@t=27
 POSTHOOK: Input: default@over1k_part_buck_sort_orc@t=__HIVE_DEFAULT_PARTITION__
 #### A masked pattern was here ####
-34
+38
 PREHOOK: query: create table over1k_part2_orc(
            si smallint,
            i int,
@@ -2299,7 +2299,7 @@ STAGE PLANS:
                       outputColumnNames: _col0, _col1, _col2, _col3, _col4
                       Statistics: Num rows: 11 Data size: 264 Basic stats: COMPLETE Column stats: COMPLETE
                       Reduce Output Operator
-                        key expressions: _col4 (type: tinyint), '_bucket_number' (type: string), _col3 (type: float)
+                        key expressions: _col4 (type: tinyint), _bucket_number (type: string), _col3 (type: float)
                         sort order: +++
                         Map-reduce partition columns: _col4 (type: tinyint)
                         Statistics: Num rows: 11 Data size: 264 Basic stats: COMPLETE Column stats: COMPLETE
@@ -2310,13 +2310,13 @@ STAGE PLANS:
             Execution mode: vectorized, llap
             Reduce Operator Tree:
               Select Operator
-                expressions: VALUE._col0 (type: smallint), VALUE._col1 (type: int), VALUE._col2 (type: bigint), KEY._col3 (type: float), KEY._col4 (type: tinyint), KEY.'_bucket_number' (type: string)
-                outputColumnNames: _col0, _col1, _col2, _col3, _col4, '_bucket_number'
-                Statistics: Num rows: 11 Data size: 1342 Basic stats: COMPLETE Column stats: COMPLETE
+                expressions: VALUE._col0 (type: smallint), VALUE._col1 (type: int), VALUE._col2 (type: bigint), KEY._col3 (type: float), KEY._col4 (type: tinyint), KEY._bucket_number (type: string)
+                outputColumnNames: _col0, _col1, _col2, _col3, _col4, _bucket_number
+                Statistics: Num rows: 11 Data size: 2288 Basic stats: COMPLETE Column stats: COMPLETE
                 File Output Operator
                   compressed: false
                   Dp Sort State: PARTITION_BUCKET_SORTED
-                  Statistics: Num rows: 11 Data size: 1342 Basic stats: COMPLETE Column stats: COMPLETE
+                  Statistics: Num rows: 11 Data size: 2288 Basic stats: COMPLETE Column stats: COMPLETE
                   table:
                       input format: org.apache.hadoop.mapred.TextInputFormat
                       output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
@@ -2640,9 +2640,9 @@ Table:                    over1k_part_buck_sort2_orc
 Partition Parameters:           
        COLUMN_STATS_ACCURATE   {\"BASIC_STATS\":\"true\",\"COLUMN_STATS\":{\"b\":\"true\",\"f\":\"true\",\"i\":\"true\",\"si\":\"true\"}}
        numFiles                1                   
-       numRows                 2                   
-       rawDataSize             52                  
-       totalSize               27                  
+       numRows                 3                   
+       rawDataSize             78                  
+       totalSize               81                  
 #### A masked pattern was here ####
                 
 # Storage Information           
@@ -2702,6 +2702,8 @@ POSTHOOK: Input: default@over1k_part_buck_sort2_orc@t=__HIVE_DEFAULT_PARTITION__
 503    65628   4294967371      95.07   27
 401    65779   4294967402      97.39   27
 340    65677   4294967461      98.96   27
+409    65536   4294967490      46.97   NULL
+374    65560   4294967516      65.43   NULL
 473    65720   4294967324      80.74   NULL
 PREHOOK: query: explain select count(*) from over1k_part_buck_sort2_orc
 PREHOOK: type: QUERY
@@ -2723,9 +2725,9 @@ STAGE PLANS:
             Map Operator Tree:
                 TableScan
                   alias: over1k_part_buck_sort2_orc
-                  Statistics: Num rows: 18 Data size: 611 Basic stats: COMPLETE Column stats: COMPLETE
+                  Statistics: Num rows: 19 Data size: 645 Basic stats: COMPLETE Column stats: COMPLETE
                   Select Operator
-                    Statistics: Num rows: 18 Data size: 611 Basic stats: COMPLETE Column stats: COMPLETE
+                    Statistics: Num rows: 19 Data size: 645 Basic stats: COMPLETE Column stats: COMPLETE
                     Group By Operator
                       aggregations: count()
                       mode: hash
@@ -2771,4 +2773,168 @@ POSTHOOK: Input: default@over1k_part_buck_sort2_orc
 POSTHOOK: Input: default@over1k_part_buck_sort2_orc@t=27
 POSTHOOK: Input: default@over1k_part_buck_sort2_orc@t=__HIVE_DEFAULT_PARTITION__
 #### A masked pattern was here ####
-17
+19
+PREHOOK: query: create table addcolumns_vectorization_true_disallowincompatible_true_fileformat_orc_tinyint
+(i int,si smallint)
+partitioned by (s string)
+clustered by (si) into 2 buckets
+stored as orc tblproperties ('transactional'='true')
+PREHOOK: type: CREATETABLE
+PREHOOK: Output: database:default
+PREHOOK: Output: default@addcolumns_vectorization_true_disallowincompatible_true_fileformat_orc_tinyint
+POSTHOOK: query: create table addcolumns_vectorization_true_disallowincompatible_true_fileformat_orc_tinyint
+(i int,si smallint)
+partitioned by (s string)
+clustered by (si) into 2 buckets
+stored as orc tblproperties ('transactional'='true')
+POSTHOOK: type: CREATETABLE
+POSTHOOK: Output: database:default
+POSTHOOK: Output: default@addcolumns_vectorization_true_disallowincompatible_true_fileformat_orc_tinyint
+PREHOOK: query: explain insert into table addcolumns_vectorization_true_disallowincompatible_true_fileformat_orc_tinyint partition (s)
+  select cint,csmallint, cstring1 from alltypesorc limit 10
+PREHOOK: type: QUERY
+POSTHOOK: query: explain insert into table addcolumns_vectorization_true_disallowincompatible_true_fileformat_orc_tinyint partition (s)
+  select cint,csmallint, cstring1 from alltypesorc limit 10
+POSTHOOK: type: QUERY
+STAGE DEPENDENCIES:
+  Stage-1 is a root stage
+  Stage-2 depends on stages: Stage-1
+  Stage-0 depends on stages: Stage-2
+  Stage-3 depends on stages: Stage-0
+
+STAGE PLANS:
+  Stage: Stage-1
+    Tez
+#### A masked pattern was here ####
+      Edges:
+        Reducer 2 <- Map 1 (CUSTOM_SIMPLE_EDGE)
+        Reducer 3 <- Reducer 2 (SIMPLE_EDGE)
+#### A masked pattern was here ####
+      Vertices:
+        Map 1 
+            Map Operator Tree:
+                TableScan
+                  alias: alltypesorc
+                  Statistics: Num rows: 12288 Data size: 935846 Basic stats: COMPLETE Column stats: COMPLETE
+                  Select Operator
+                    expressions: cint (type: int), csmallint (type: smallint), cstring1 (type: string)
+                    outputColumnNames: _col0, _col1, _col2
+                    Statistics: Num rows: 12288 Data size: 935846 Basic stats: COMPLETE Column stats: COMPLETE
+                    Limit
+                      Number of rows: 10
+                      Statistics: Num rows: 10 Data size: 816 Basic stats: COMPLETE Column stats: COMPLETE
+                      Reduce Output Operator
+                        sort order: 
+                        Statistics: Num rows: 10 Data size: 816 Basic stats: COMPLETE Column stats: COMPLETE
+                        TopN Hash Memory Usage: 0.1
+                        value expressions: _col0 (type: int), _col1 (type: smallint), _col2 (type: string)
+            Execution mode: vectorized, llap
+            LLAP IO: all inputs
+        Reducer 2 
+            Execution mode: vectorized, llap
+            Reduce Operator Tree:
+              Select Operator
+                expressions: VALUE._col0 (type: int), VALUE._col1 (type: smallint), VALUE._col2 (type: string)
+                outputColumnNames: _col0, _col1, _col2
+                Statistics: Num rows: 10 Data size: 816 Basic stats: COMPLETE Column stats: COMPLETE
+                Limit
+                  Number of rows: 10
+                  Statistics: Num rows: 10 Data size: 816 Basic stats: COMPLETE Column stats: COMPLETE
+                  Reduce Output Operator
+                    key expressions: _col2 (type: string), _bucket_number (type: string)
+                    sort order: ++
+                    Map-reduce partition columns: _col2 (type: string)
+                    Statistics: Num rows: 10 Data size: 816 Basic stats: COMPLETE Column stats: COMPLETE
+                    value expressions: _col0 (type: int), _col1 (type: smallint)
+        Reducer 3 
+            Execution mode: vectorized, llap
+            Reduce Operator Tree:
+              Select Operator
+                expressions: VALUE._col0 (type: int), VALUE._col1 (type: smallint), KEY._col2 (type: string), KEY._bucket_number (type: string)
+                outputColumnNames: _col0, _col1, _col2, _bucket_number
+                Statistics: Num rows: 10 Data size: 2656 Basic stats: COMPLETE Column stats: COMPLETE
+                File Output Operator
+                  compressed: false
+                  Dp Sort State: PARTITION_BUCKET_SORTED
+                  Statistics: Num rows: 10 Data size: 2656 Basic stats: COMPLETE Column stats: COMPLETE
+                  table:
+                      input format: org.apache.hadoop.hive.ql.io.orc.OrcInputFormat
+                      output format: org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat
+                      serde: org.apache.hadoop.hive.ql.io.orc.OrcSerde
+                      name: default.addcolumns_vectorization_true_disallowincompatible_true_fileformat_orc_tinyint
+                  Write Type: INSERT
+
+  Stage: Stage-2
+    Dependency Collection
+
+  Stage: Stage-0
+    Move Operator
+      tables:
+          partition:
+            s 
+          replace: false
+          table:
+              input format: org.apache.hadoop.hive.ql.io.orc.OrcInputFormat
+              output format: org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat
+              serde: org.apache.hadoop.hive.ql.io.orc.OrcSerde
+              name: default.addcolumns_vectorization_true_disallowincompatible_true_fileformat_orc_tinyint
+          Write Type: INSERT
+
+  Stage: Stage-3
+    Stats Work
+      Basic Stats Work:
+      Column Stats Desc:
+          Columns: i, si
+          Column Types: int, smallint
+          Table: default.addcolumns_vectorization_true_disallowincompatible_true_fileformat_orc_tinyint
+
+PREHOOK: query: insert into table addcolumns_vectorization_true_disallowincompatible_true_fileformat_orc_tinyint partition (s)
+  select cint,csmallint, cstring1 from alltypesorc limit 10
+PREHOOK: type: QUERY
+PREHOOK: Input: default@alltypesorc
+PREHOOK: Output: default@addcolumns_vectorization_true_disallowincompatible_true_fileformat_orc_tinyint
+POSTHOOK: query: insert into table addcolumns_vectorization_true_disallowincompatible_true_fileformat_orc_tinyint partition (s)
+  select cint,csmallint, cstring1 from alltypesorc limit 10
+POSTHOOK: type: QUERY
+POSTHOOK: Input: default@alltypesorc
+POSTHOOK: Output: default@addcolumns_vectorization_true_disallowincompatible_true_fileformat_orc_tinyint@s=cvLH6Eat2yFsyy7p
+POSTHOOK: Lineage: addcolumns_vectorization_true_disallowincompatible_true_fileformat_orc_tinyint PARTITION(s=cvLH6Eat2yFsyy7p).i SIMPLE [(alltypesorc)alltypesorc.FieldSchema(name:cint, type:int, comment:null), ]
+POSTHOOK: Lineage: addcolumns_vectorization_true_disallowincompatible_true_fileformat_orc_tinyint PARTITION(s=cvLH6Eat2yFsyy7p).si SIMPLE [(alltypesorc)alltypesorc.FieldSchema(name:csmallint, type:smallint, comment:null), ]
+PREHOOK: query: select cint, csmallint, cstring1 from alltypesorc limit 10
+PREHOOK: type: QUERY
+PREHOOK: Input: default@alltypesorc
+#### A masked pattern was here ####
+POSTHOOK: query: select cint, csmallint, cstring1 from alltypesorc limit 10
+POSTHOOK: type: QUERY
+POSTHOOK: Input: default@alltypesorc
+#### A masked pattern was here ####
+528534767      -13326  cvLH6Eat2yFsyy7p
+528534767      -4213   cvLH6Eat2yFsyy7p
+528534767      -15813  cvLH6Eat2yFsyy7p
+528534767      -9566   cvLH6Eat2yFsyy7p
+528534767      15007   cvLH6Eat2yFsyy7p
+528534767      7021    cvLH6Eat2yFsyy7p
+528534767      4963    cvLH6Eat2yFsyy7p
+528534767      -7824   cvLH6Eat2yFsyy7p
+528534767      -15431  cvLH6Eat2yFsyy7p
+528534767      -15549  cvLH6Eat2yFsyy7p
+PREHOOK: query: select * from addcolumns_vectorization_true_disallowincompatible_true_fileformat_orc_tinyint
+PREHOOK: type: QUERY
+PREHOOK: Input: default@addcolumns_vectorization_true_disallowincompatible_true_fileformat_orc_tinyint
+PREHOOK: Input: default@addcolumns_vectorization_true_disallowincompatible_true_fileformat_orc_tinyint@s=cvLH6Eat2yFsyy7p
+#### A masked pattern was here ####
+POSTHOOK: query: select * from addcolumns_vectorization_true_disallowincompatible_true_fileformat_orc_tinyint
+POSTHOOK: type: QUERY
+POSTHOOK: Input: default@addcolumns_vectorization_true_disallowincompatible_true_fileformat_orc_tinyint
+POSTHOOK: Input: default@addcolumns_vectorization_true_disallowincompatible_true_fileformat_orc_tinyint@s=cvLH6Eat2yFsyy7p
+#### A masked pattern was here ####
+528534767      -13326  cvLH6Eat2yFsyy7p
+528534767      -9566   cvLH6Eat2yFsyy7p
+528534767      7021    cvLH6Eat2yFsyy7p
+528534767      -15549  cvLH6Eat2yFsyy7p
+528534767      -4213   cvLH6Eat2yFsyy7p
+528534767      -15813  cvLH6Eat2yFsyy7p
+528534767      15007   cvLH6Eat2yFsyy7p
+528534767      4963    cvLH6Eat2yFsyy7p
+528534767      -7824   cvLH6Eat2yFsyy7p
+528534767      -15431  cvLH6Eat2yFsyy7p

http://git-wip-us.apache.org/repos/asf/hive/blob/29332fbf/ql/src/test/results/clientpositive/llap/dynpart_sort_optimization.q.out
----------------------------------------------------------------------
diff --git a/ql/src/test/results/clientpositive/llap/dynpart_sort_optimization.q.out b/ql/src/test/results/clientpositive/llap/dynpart_sort_optimization.q.out
index 21fc2c5..0c196be 100644
--- a/ql/src/test/results/clientpositive/llap/dynpart_sort_optimization.q.out
+++ b/ql/src/test/results/clientpositive/llap/dynpart_sort_optimization.q.out
@@ -319,7 +319,7 @@ STAGE PLANS:
                       outputColumnNames: _col0, _col1, _col2, _col3, _col4
                       Statistics: Num rows: 1 Data size: 24 Basic stats: COMPLETE Column stats: NONE
                       Reduce Output Operator
-                        key expressions: _col4 (type: tinyint), '_bucket_number' (type: string)
+                        key expressions: _col4 (type: tinyint), _bucket_number (type: string)
                         sort order: ++
                         Map-reduce partition columns: _col4 (type: tinyint)
                         Statistics: Num rows: 1 Data size: 24 Basic stats: COMPLETE Column stats: NONE
@@ -330,8 +330,8 @@ STAGE PLANS:
             Execution mode: llap
             Reduce Operator Tree:
               Select Operator
-                expressions: VALUE._col0 (type: smallint), VALUE._col1 (type: int), VALUE._col2 (type: bigint), VALUE._col3 (type: float), KEY._col4 (type: tinyint), KEY.'_bucket_number' (type: string)
-                outputColumnNames: _col0, _col1, _col2, _col3, _col4, '_bucket_number'
+                expressions: VALUE._col0 (type: smallint), VALUE._col1 (type: int), VALUE._col2 (type: bigint), VALUE._col3 (type: float), KEY._col4 (type: tinyint), KEY._bucket_number (type: string)
+                outputColumnNames: _col0, _col1, _col2, _col3, _col4, _bucket_number
                 Statistics: Num rows: 1 Data size: 24 Basic stats: COMPLETE Column stats: NONE
                 File Output Operator
                   compressed: false
@@ -398,7 +398,7 @@ STAGE PLANS:
                       outputColumnNames: _col0, _col1, _col2, _col3, _col4
                       Statistics: Num rows: 1 Data size: 24 Basic stats: COMPLETE Column stats: NONE
                       Reduce Output Operator
-                        key expressions: _col4 (type: tinyint), '_bucket_number' (type: string), _col3 (type: float)
+                        key expressions: _col4 (type: tinyint), _bucket_number (type: string), _col3 (type: float)
                         sort order: +++
                         Map-reduce partition columns: _col4 (type: tinyint)
                         Statistics: Num rows: 1 Data size: 24 Basic stats: COMPLETE Column stats: NONE
@@ -409,8 +409,8 @@ STAGE PLANS:
             Execution mode: llap
             Reduce Operator Tree:
               Select Operator
-                expressions: VALUE._col0 (type: smallint), VALUE._col1 (type: 
int), VALUE._col2 (type: bigint), KEY._col3 (type: float), KEY._col4 (type: 
tinyint), KEY.'_bucket_number' (type: string)
-                outputColumnNames: _col0, _col1, _col2, _col3, _col4, 
'_bucket_number'
+                expressions: VALUE._col0 (type: smallint), VALUE._col1 (type: 
int), VALUE._col2 (type: bigint), KEY._col3 (type: float), KEY._col4 (type: 
tinyint), KEY._bucket_number (type: string)
+                outputColumnNames: _col0, _col1, _col2, _col3, _col4, 
_bucket_number
                 Statistics: Num rows: 1 Data size: 24 Basic stats: COMPLETE 
Column stats: NONE
                 File Output Operator
                   compressed: false
@@ -724,7 +724,7 @@ STAGE PLANS:
                       outputColumnNames: _col0, _col1, _col2, _col3, _col4
                       Statistics: Num rows: 1 Data size: 24 Basic stats: 
COMPLETE Column stats: NONE
                       Reduce Output Operator
-                        key expressions: _col4 (type: tinyint), 
'_bucket_number' (type: string)
+                        key expressions: _col4 (type: tinyint), _bucket_number 
(type: string)
                         sort order: ++
                         Map-reduce partition columns: _col4 (type: tinyint)
                         Statistics: Num rows: 1 Data size: 24 Basic stats: 
COMPLETE Column stats: NONE
@@ -735,8 +735,8 @@ STAGE PLANS:
             Execution mode: llap
             Reduce Operator Tree:
               Select Operator
-                expressions: VALUE._col0 (type: smallint), VALUE._col1 (type: 
int), VALUE._col2 (type: bigint), VALUE._col3 (type: float), KEY._col4 (type: 
tinyint), KEY.'_bucket_number' (type: string)
-                outputColumnNames: _col0, _col1, _col2, _col3, _col4, 
'_bucket_number'
+                expressions: VALUE._col0 (type: smallint), VALUE._col1 (type: 
int), VALUE._col2 (type: bigint), VALUE._col3 (type: float), KEY._col4 (type: 
tinyint), KEY._bucket_number (type: string)
+                outputColumnNames: _col0, _col1, _col2, _col3, _col4, 
_bucket_number
                 Statistics: Num rows: 1 Data size: 24 Basic stats: COMPLETE 
Column stats: NONE
                 File Output Operator
                   compressed: false
@@ -803,7 +803,7 @@ STAGE PLANS:
                       outputColumnNames: _col0, _col1, _col2, _col3, _col4
                       Statistics: Num rows: 1 Data size: 24 Basic stats: 
COMPLETE Column stats: NONE
                       Reduce Output Operator
-                        key expressions: _col4 (type: tinyint), 
'_bucket_number' (type: string), _col3 (type: float)
+                        key expressions: _col4 (type: tinyint), _bucket_number 
(type: string), _col3 (type: float)
                         sort order: +++
                         Map-reduce partition columns: _col4 (type: tinyint)
                         Statistics: Num rows: 1 Data size: 24 Basic stats: 
COMPLETE Column stats: NONE
@@ -814,8 +814,8 @@ STAGE PLANS:
             Execution mode: llap
             Reduce Operator Tree:
               Select Operator
-                expressions: VALUE._col0 (type: smallint), VALUE._col1 (type: 
int), VALUE._col2 (type: bigint), KEY._col3 (type: float), KEY._col4 (type: 
tinyint), KEY.'_bucket_number' (type: string)
-                outputColumnNames: _col0, _col1, _col2, _col3, _col4, 
'_bucket_number'
+                expressions: VALUE._col0 (type: smallint), VALUE._col1 (type: 
int), VALUE._col2 (type: bigint), KEY._col3 (type: float), KEY._col4 (type: 
tinyint), KEY._bucket_number (type: string)
+                outputColumnNames: _col0, _col1, _col2, _col3, _col4, 
_bucket_number
                 Statistics: Num rows: 1 Data size: 24 Basic stats: COMPLETE 
Column stats: NONE
                 File Output Operator
                   compressed: false
@@ -2256,7 +2256,7 @@ STAGE PLANS:
                       outputColumnNames: _col0, _col1, _col2, _col3, _col4
                       Statistics: Num rows: 1 Data size: 24 Basic stats: 
COMPLETE Column stats: NONE
                       Reduce Output Operator
-                        key expressions: _col4 (type: tinyint), 
'_bucket_number' (type: string), _col3 (type: float)
+                        key expressions: _col4 (type: tinyint), _bucket_number 
(type: string), _col3 (type: float)
                         sort order: +++
                         Map-reduce partition columns: _col4 (type: tinyint)
                         Statistics: Num rows: 1 Data size: 24 Basic stats: 
COMPLETE Column stats: NONE
@@ -2267,8 +2267,8 @@ STAGE PLANS:
             Execution mode: llap
             Reduce Operator Tree:
               Select Operator
-                expressions: VALUE._col0 (type: smallint), VALUE._col1 (type: 
int), VALUE._col2 (type: bigint), KEY._col3 (type: float), KEY._col4 (type: 
tinyint), KEY.'_bucket_number' (type: string)
-                outputColumnNames: _col0, _col1, _col2, _col3, _col4, 
'_bucket_number'
+                expressions: VALUE._col0 (type: smallint), VALUE._col1 (type: 
int), VALUE._col2 (type: bigint), KEY._col3 (type: float), KEY._col4 (type: 
tinyint), KEY._bucket_number (type: string)
+                outputColumnNames: _col0, _col1, _col2, _col3, _col4, 
_bucket_number
                 Statistics: Num rows: 1 Data size: 24 Basic stats: COMPLETE 
Column stats: NONE
                 File Output Operator
                   compressed: false

http://git-wip-us.apache.org/repos/asf/hive/blob/29332fbf/ql/src/test/results/clientpositive/llap/dynpart_sort_optimization_acid.q.out
----------------------------------------------------------------------
diff --git 
a/ql/src/test/results/clientpositive/llap/dynpart_sort_optimization_acid.q.out 
b/ql/src/test/results/clientpositive/llap/dynpart_sort_optimization_acid.q.out
index a0a5e0c..157f96a 100644
--- 
a/ql/src/test/results/clientpositive/llap/dynpart_sort_optimization_acid.q.out
+++ 
b/ql/src/test/results/clientpositive/llap/dynpart_sort_optimization_acid.q.out
@@ -488,7 +488,7 @@ STAGE PLANS:
                       outputColumnNames: _col0, _col3
                       Statistics: Num rows: 5 Data size: 2170 Basic stats: 
COMPLETE Column stats: PARTIAL
                       Reduce Output Operator
-                        key expressions: _col3 (type: string), 
'_bucket_number' (type: string), _col0 (type: 
struct<writeid:bigint,bucketid:int,rowid:bigint>)
+                        key expressions: _col3 (type: string), _bucket_number 
(type: string), _col0 (type: struct<writeid:bigint,bucketid:int,rowid:bigint>)
                         sort order: +++
                         Map-reduce partition columns: _col3 (type: string)
                         Statistics: Num rows: 5 Data size: 2170 Basic stats: 
COMPLETE Column stats: PARTIAL
@@ -498,13 +498,13 @@ STAGE PLANS:
             Execution mode: llap
             Reduce Operator Tree:
               Select Operator
-                expressions: KEY._col0 (type: 
struct<writeid:bigint,bucketid:int,rowid:bigint>), 'foo' (type: string), 'bar' 
(type: string), KEY._col3 (type: string), KEY.'_bucket_number' (type: string)
-                outputColumnNames: _col0, _col1, _col2, _col3, '_bucket_number'
-                Statistics: Num rows: 5 Data size: 1790 Basic stats: COMPLETE 
Column stats: PARTIAL
+                expressions: KEY._col0 (type: 
struct<writeid:bigint,bucketid:int,rowid:bigint>), 'foo' (type: string), 'bar' 
(type: string), KEY._col3 (type: string), KEY._bucket_number (type: string)
+                outputColumnNames: _col0, _col1, _col2, _col3, _bucket_number
+                Statistics: Num rows: 5 Data size: 2220 Basic stats: COMPLETE 
Column stats: PARTIAL
                 File Output Operator
                   compressed: false
                   Dp Sort State: PARTITION_BUCKET_SORTED
-                  Statistics: Num rows: 5 Data size: 1790 Basic stats: 
COMPLETE Column stats: PARTIAL
+                  Statistics: Num rows: 5 Data size: 2220 Basic stats: 
COMPLETE Column stats: PARTIAL
                   table:
                       input format: 
org.apache.hadoop.hive.ql.io.orc.OrcInputFormat
                       output format: 
org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat
@@ -1209,7 +1209,7 @@ STAGE PLANS:
                       outputColumnNames: _col0, _col4
                       Statistics: Num rows: 5 Data size: 1740 Basic stats: 
COMPLETE Column stats: PARTIAL
                       Reduce Output Operator
-                        key expressions: '2008-04-08' (type: string), _col4 
(type: int), '_bucket_number' (type: string), _col0 (type: 
struct<writeid:bigint,bucketid:int,rowid:bigint>)
+                        key expressions: '2008-04-08' (type: string), _col4 
(type: int), _bucket_number (type: string), _col0 (type: 
struct<writeid:bigint,bucketid:int,rowid:bigint>)
                         sort order: ++++
                         Map-reduce partition columns: '2008-04-08' (type: 
string), _col4 (type: int)
                         Statistics: Num rows: 5 Data size: 1740 Basic stats: 
COMPLETE Column stats: PARTIAL
@@ -1219,13 +1219,13 @@ STAGE PLANS:
             Execution mode: llap
             Reduce Operator Tree:
               Select Operator
-                expressions: KEY._col0 (type: 
struct<writeid:bigint,bucketid:int,rowid:bigint>), 'foo' (type: string), 'bar' 
(type: string), '2008-04-08' (type: string), KEY._col4 (type: int), 
KEY.'_bucket_number' (type: string)
-                outputColumnNames: _col0, _col1, _col2, _col3, _col4, 
'_bucket_number'
-                Statistics: Num rows: 5 Data size: 1360 Basic stats: COMPLETE 
Column stats: PARTIAL
+                expressions: KEY._col0 (type: 
struct<writeid:bigint,bucketid:int,rowid:bigint>), 'foo' (type: string), 'bar' 
(type: string), '2008-04-08' (type: string), KEY._col4 (type: int), 
KEY._bucket_number (type: string)
+                outputColumnNames: _col0, _col1, _col2, _col3, _col4, 
_bucket_number
+                Statistics: Num rows: 5 Data size: 1790 Basic stats: COMPLETE 
Column stats: PARTIAL
                 File Output Operator
                   compressed: false
                   Dp Sort State: PARTITION_BUCKET_SORTED
-                  Statistics: Num rows: 5 Data size: 1360 Basic stats: 
COMPLETE Column stats: PARTIAL
+                  Statistics: Num rows: 5 Data size: 1790 Basic stats: 
COMPLETE Column stats: PARTIAL
                   table:
                       input format: 
org.apache.hadoop.hive.ql.io.orc.OrcInputFormat
                       output format: 
org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat
@@ -1336,7 +1336,7 @@ STAGE PLANS:
                       outputColumnNames: _col0, _col1, _col2
                       Statistics: Num rows: 5 Data size: 1320 Basic stats: 
COMPLETE Column stats: PARTIAL
                       Reduce Output Operator
-                        key expressions: _col1 (type: string), _col2 (type: 
int), '_bucket_number' (type: string), _col0 (type: 
struct<writeid:bigint,bucketid:int,rowid:bigint>)
+                        key expressions: _col1 (type: string), _col2 (type: 
int), _bucket_number (type: string), _col0 (type: 
struct<writeid:bigint,bucketid:int,rowid:bigint>)
                         sort order: ++++
                         Map-reduce partition columns: _col1 (type: string), 
_col2 (type: int)
                         Statistics: Num rows: 5 Data size: 1320 Basic stats: 
COMPLETE Column stats: PARTIAL
@@ -1346,13 +1346,13 @@ STAGE PLANS:
             Execution mode: llap
             Reduce Operator Tree:
               Select Operator
-                expressions: KEY._col0 (type: 
struct<writeid:bigint,bucketid:int,rowid:bigint>), KEY._col1 (type: string), 
KEY._col2 (type: int), KEY.'_bucket_number' (type: string)
-                outputColumnNames: _col0, _col1, _col2, '_bucket_number'
-                Statistics: Num rows: 5 Data size: 1810 Basic stats: COMPLETE 
Column stats: PARTIAL
+                expressions: KEY._col0 (type: 
struct<writeid:bigint,bucketid:int,rowid:bigint>), KEY._col1 (type: string), 
KEY._col2 (type: int), KEY._bucket_number (type: string)
+                outputColumnNames: _col0, _col1, _col2, _bucket_number
+                Statistics: Num rows: 5 Data size: 2240 Basic stats: COMPLETE 
Column stats: PARTIAL
                 File Output Operator
                   compressed: false
                   Dp Sort State: PARTITION_BUCKET_SORTED
-                  Statistics: Num rows: 5 Data size: 1810 Basic stats: 
COMPLETE Column stats: PARTIAL
+                  Statistics: Num rows: 5 Data size: 2240 Basic stats: 
COMPLETE Column stats: PARTIAL
                   table:
                       input format: 
org.apache.hadoop.hive.ql.io.orc.OrcInputFormat
                       output format: 
org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat
@@ -1535,7 +1535,7 @@ STAGE PLANS:
                       outputColumnNames: _col0, _col1, _col3, _col4
                       Statistics: Num rows: 5 Data size: 2675 Basic stats: 
COMPLETE Column stats: PARTIAL
                       Reduce Output Operator
-                        key expressions: _col3 (type: string), _col4 (type: 
int), '_bucket_number' (type: string), _col0 (type: 
struct<writeid:bigint,bucketid:int,rowid:bigint>)
+                        key expressions: _col3 (type: string), _col4 (type: 
int), _bucket_number (type: string), _col0 (type: 
struct<writeid:bigint,bucketid:int,rowid:bigint>)
                         sort order: ++++
                         Map-reduce partition columns: _col3 (type: string), 
_col4 (type: int)
                         Statistics: Num rows: 5 Data size: 2675 Basic stats: 
COMPLETE Column stats: PARTIAL
@@ -1546,13 +1546,13 @@ STAGE PLANS:
             Execution mode: llap
             Reduce Operator Tree:
               Select Operator
-                expressions: KEY._col0 (type: 
struct<writeid:bigint,bucketid:int,rowid:bigint>), VALUE._col1 (type: string), 
VALUE._col2 (type: string), KEY._col3 (type: string), KEY._col4 (type: int), 
KEY.'_bucket_number' (type: string)
-                outputColumnNames: _col0, _col1, _col2, _col3, _col4, 
'_bucket_number'
-                Statistics: Num rows: 5 Data size: 3165 Basic stats: COMPLETE 
Column stats: PARTIAL
+                expressions: KEY._col0 (type: 
struct<writeid:bigint,bucketid:int,rowid:bigint>), VALUE._col1 (type: string), 
VALUE._col2 (type: string), KEY._col3 (type: string), KEY._col4 (type: int), 
KEY._bucket_number (type: string)
+                outputColumnNames: _col0, _col1, _col2, _col3, _col4, 
_bucket_number
+                Statistics: Num rows: 5 Data size: 3595 Basic stats: COMPLETE 
Column stats: PARTIAL
                 File Output Operator
                   compressed: false
                   Dp Sort State: PARTITION_BUCKET_SORTED
-                  Statistics: Num rows: 5 Data size: 3165 Basic stats: 
COMPLETE Column stats: PARTIAL
+                  Statistics: Num rows: 5 Data size: 3595 Basic stats: 
COMPLETE Column stats: PARTIAL
                   table:
                       input format: 
org.apache.hadoop.hive.ql.io.orc.OrcInputFormat
                       output format: 
org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat
@@ -1634,7 +1634,7 @@ STAGE PLANS:
                       outputColumnNames: _col0, _col1, _col3, _col4
                       Statistics: Num rows: 5 Data size: 2675 Basic stats: 
COMPLETE Column stats: PARTIAL
                       Reduce Output Operator
-                        key expressions: _col3 (type: string), _col4 (type: 
int), '_bucket_number' (type: string), _col0 (type: 
struct<writeid:bigint,bucketid:int,rowid:bigint>)
+                        key expressions: _col3 (type: string), _col4 (type: 
int), _bucket_number (type: string), _col0 (type: 
struct<writeid:bigint,bucketid:int,rowid:bigint>)
                         sort order: ++++
                         Map-reduce partition columns: _col3 (type: string), 
_col4 (type: int)
                         Statistics: Num rows: 5 Data size: 2675 Basic stats: 
COMPLETE Column stats: PARTIAL
@@ -1645,13 +1645,13 @@ STAGE PLANS:
             Execution mode: llap
             Reduce Operator Tree:
               Select Operator
-                expressions: KEY._col0 (type: 
struct<writeid:bigint,bucketid:int,rowid:bigint>), VALUE._col1 (type: string), 
VALUE._col2 (type: string), KEY._col3 (type: string), KEY._col4 (type: int), 
KEY.'_bucket_number' (type: string)
-                outputColumnNames: _col0, _col1, _col2, _col3, _col4, 
'_bucket_number'
-                Statistics: Num rows: 5 Data size: 3165 Basic stats: COMPLETE 
Column stats: PARTIAL
+                expressions: KEY._col0 (type: 
struct<writeid:bigint,bucketid:int,rowid:bigint>), VALUE._col1 (type: string), 
VALUE._col2 (type: string), KEY._col3 (type: string), KEY._col4 (type: int), 
KEY._bucket_number (type: string)
+                outputColumnNames: _col0, _col1, _col2, _col3, _col4, 
_bucket_number
+                Statistics: Num rows: 5 Data size: 3595 Basic stats: COMPLETE 
Column stats: PARTIAL
                 File Output Operator
                   compressed: false
                   Dp Sort State: PARTITION_BUCKET_SORTED
-                  Statistics: Num rows: 5 Data size: 3165 Basic stats: 
COMPLETE Column stats: PARTIAL
+                  Statistics: Num rows: 5 Data size: 3595 Basic stats: 
COMPLETE Column stats: PARTIAL
                   table:
                       input format: 
org.apache.hadoop.hive.ql.io.orc.OrcInputFormat
                       output format: 
org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat

http://git-wip-us.apache.org/repos/asf/hive/blob/29332fbf/ql/src/test/results/clientpositive/show_functions.q.out
----------------------------------------------------------------------
diff --git a/ql/src/test/results/clientpositive/show_functions.q.out 
b/ql/src/test/results/clientpositive/show_functions.q.out
index 90608e2..8d41e78 100644
--- a/ql/src/test/results/clientpositive/show_functions.q.out
+++ b/ql/src/test/results/clientpositive/show_functions.q.out
@@ -39,6 +39,7 @@ between
 bin
 bloom_filter
 bround
+bucket_number
 cardinality_violation
 case
 cbrt
