[ https://issues.apache.org/jira/browse/HIVE-7843?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14110983#comment-14110983 ]
Venki Korukanti commented on HIVE-7843:
---------------------------------------

Looks like the assertion is wrong here.

{code}
private String getDynPartDirectory(List<String> row, List<String> dpColNames, int numDynParts) {
  assert row.size() == numDynParts && numDynParts == dpColNames.size()
      : "data length is different from num of DP columns";
  ...
}
{code}

The row always contains the values for the partition columns and the bucket, but numDynParts only counts the partition columns. So the assertion always fails when we do a dynamic partition insert into a bucketed table.

I changed the assert to account for the bucket. The test now gets past this assert, but it hits a new error.

{code}
assert numDynParts == dpColNames.size()
    && row.size() == numDynParts + (conf.getDpSortState().equals(DPSortState.PARTITION_BUCKET_SORTED) ? 1 : 0)
    : "data length is different from num of DP columns";
{code}

> orc_analyze.q fails with an assertion in FileSinkOperator [Spark Branch]
> ------------------------------------------------------------------------
>
>                 Key: HIVE-7843
>                 URL: https://issues.apache.org/jira/browse/HIVE-7843
>             Project: Hive
>          Issue Type: Sub-task
>          Components: Spark
>    Affects Versions: spark-branch
>            Reporter: Venki Korukanti
>            Assignee: Venki Korukanti
>              Labels: Spark-M1
>             Fix For: spark-branch
>
>
> {code}
> java.lang.AssertionError: data length is different from num of DP columns
> org.apache.hadoop.hive.ql.exec.FileSinkOperator.getDynPartDirectory(FileSinkOperator.java:809)
> org.apache.hadoop.hive.ql.exec.FileSinkOperator.getDynOutPaths(FileSinkOperator.java:730)
> org.apache.hadoop.hive.ql.exec.FileSinkOperator.startGroup(FileSinkOperator.java:829)
> org.apache.hadoop.hive.ql.exec.Operator.defaultStartGroup(Operator.java:502)
> org.apache.hadoop.hive.ql.exec.Operator.startGroup(Operator.java:525)
> org.apache.hadoop.hive.ql.exec.spark.SparkReduceRecordHandler.processRow(SparkReduceRecordHandler.java:198)
> org.apache.hadoop.hive.ql.exec.spark.HiveReduceFunctionResultList.processNextRecord(HiveReduceFunctionResultList.java:47)
> org.apache.hadoop.hive.ql.exec.spark.HiveReduceFunctionResultList.processNextRecord(HiveReduceFunctionResultList.java:27)
> org.apache.hadoop.hive.ql.exec.spark.HiveBaseFunctionResultList$ResultIterator.hasNext(HiveBaseFunctionResultList.java:98)
> scala.collection.convert.Wrappers$JIteratorWrapper.hasNext(Wrappers.scala:41)
> scala.collection.Iterator$class.foreach(Iterator.scala:727)
> scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
> org.apache.spark.rdd.RDD$$anonfun$foreach$1.apply(RDD.scala:759)
> org.apache.spark.rdd.RDD$$anonfun$foreach$1.apply(RDD.scala:759)
> org.apache.spark.SparkContext$$anonfun$runJob$4.apply(SparkContext.scala:1121)
> org.apache.spark.SparkContext$$anonfun$runJob$4.apply(SparkContext.scala:1121)
> org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:62)
> org.apache.spark.scheduler.Task.run(Task.scala:54)
> org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:199)
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> java.lang.Thread.run(Thread.java:744)
> {code}
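
For reference, the size check described in the comment can be tried outside Hive with a small standalone sketch. This is only an illustration under assumptions: the class `DynPartRowCheck`, the methods `expectedRowSize` and `rowMatches`, and the simplified `SortState` enum are hypothetical stand-ins (not Hive code), and the sample column names and values are made up. It assumes, as the comment states, that the row carries one extra bucket value when the sort state is PARTITION_BUCKET_SORTED.

```java
import java.util.Arrays;
import java.util.List;

public class DynPartRowCheck {
    // Hypothetical, simplified stand-in for the DPSortState referenced in the comment.
    enum SortState { NONE, PARTITION_BUCKET_SORTED }

    // Expected row length: one value per dynamic partition column,
    // plus one bucket value when rows are sorted by partition and bucket.
    static int expectedRowSize(int numDynParts, SortState state) {
        return numDynParts + (state == SortState.PARTITION_BUCKET_SORTED ? 1 : 0);
    }

    // Mirrors the condition of the adjusted assert as a plain boolean check.
    static boolean rowMatches(List<String> row, List<String> dpColNames,
                              int numDynParts, SortState state) {
        return numDynParts == dpColNames.size()
            && row.size() == expectedRowSize(numDynParts, state);
    }

    public static void main(String[] args) {
        // Made-up sample data: two partition columns, row carries an extra bucket value.
        List<String> dpColNames = Arrays.asList("state", "year");
        List<String> row = Arrays.asList("CA", "2014", "3");

        // The original condition (no bucket allowance) rejects this row.
        System.out.println(rowMatches(row, dpColNames, 2, SortState.NONE));
        // The adjusted condition accepts it.
        System.out.println(rowMatches(row, dpColNames, 2, SortState.PARTITION_BUCKET_SORTED));
    }
}
```

With the sample data, the first check fails because the row has three values for two partition columns, and the second passes once the bucket value is accounted for.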