liuchunhua opened a new issue, #8024:
URL: https://github.com/apache/incubator-gluten/issues/8024
### Backend
VL (Velox)
### Bug description
```sql
create table bigdata.tmp_test (
  bz date,
  amt double
)
partitioned by (`bz`);

create table bigdata.tmp_test2 (
  bz date,
  amt double
);

insert into bigdata.tmp_test2
select /*+ REPARTITION(1) */ *
from values
  (date'2024-11-20', cast(null as double)),
  (date'2024-11-21', cast(null as double)),
  (date'2024-11-20', cast(null as double)),
  (date'2024-11-21', cast(null as double)) as tab(`bz`, amt);

insert into bigdata.tmp_test(`bz`, `amt`)
select `bz`, `amt` from bigdata.tmp_test2;
```
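The failure appears to hinge on the all-NULL `amt` column: every row carries `cast(null as double)` and the `REPARTITION(1)` hint collapses the data into a single partition, so the native side can represent `amt` as a constant-encoded (non-flat) vector when the date-partitioned write runs. Below is a minimal, self-contained Scala sketch of the same shape, useful for checking whether the failure also reproduces without the intermediate table; the object name, app name, and output path are illustrative, and a Gluten/Velox-enabled SparkSession is assumed.

```scala
import java.sql.Date
import org.apache.spark.sql.SparkSession

// Hypothetical standalone repro; assumes Gluten/Velox is already enabled on
// this session via the usual spark.plugins / spark.gluten.* settings.
object Gluten8024Repro {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("gluten-8024-repro") // illustrative app name
      .getOrCreate()
    import spark.implicits._

    // Same shape as the SQL repro: a date column plus an all-NULL double column.
    val df = Seq(
      (Date.valueOf("2024-11-20"), null.asInstanceOf[java.lang.Double]),
      (Date.valueOf("2024-11-21"), null.asInstanceOf[java.lang.Double]),
      (Date.valueOf("2024-11-20"), null.asInstanceOf[java.lang.Double]),
      (Date.valueOf("2024-11-21"), null.asInstanceOf[java.lang.Double])
    ).toDF("bz", "amt")

    // REPARTITION(1) in the SQL hint corresponds to repartition(1) here; the
    // date-partitioned parquet write is what drives the native TableWrite
    // path that fails in the stack trace below.
    df.repartition(1)
      .write
      .mode("overwrite")
      .partitionBy("bz")
      .parquet("/tmp/tmp_test_repro") // illustrative output path
  }
}
```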
The second insert fails with:
```
org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 4.0 failed 4 times, most recent failure: Lost task 0.3 in stage 4.0 (TID 7) (emr-worker-6.cluster-193258 executor 3): org.apache.spark.SparkException: Task failed while writing rows to staging path: hdfs://emr-cluster/user/hive/warehouse/bigdata.db/tmp_test/_temporary/0/_temporary/attempt_202411221316176936724271216282837_0004_m_000000_7, output path: hdfs://emr-cluster/user/hive/warehouse/bigdata.db/tmp_test
    at org.apache.spark.sql.execution.VeloxColumnarWriteFilesRDD.compute(VeloxColumnarWriteFilesExec.scala:234)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:367)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:331)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:92)
    at org.apache.spark.TaskContext.runTaskWithListeners(TaskContext.scala:161)
    at org.apache.spark.scheduler.Task.run(Task.scala:139)
    at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:554)
    at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1529)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:557)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.gluten.exception.GlutenException: org.apache.gluten.exception.GlutenException: Exception: VeloxRuntimeError
Error Source: RUNTIME
Error Code: INVALID_STATE
Reason: An unsupported nested encoding was found.
Retriable: False
Expression: vec.valueVector() == nullptr || vec.wrappedVector()->isFlatEncoding()
Context: Operator: TableWrite[2] 2
Function: exportFlattenedVector
File: /opt/gluten/ep/build-velox/build/velox_ep/velox/vector/arrow/Bridge.cpp
Line: 1081
Stack trace:
# 0  _ZN8facebook5velox7process10StackTraceC1Ei
# 1  _ZN8facebook5velox14VeloxExceptionC1EPKcmS3_St17basic_string_viewIcSt11char_traitsIcEES7_S7_S7_bNS1_4TypeES7_
# 2  _ZN8facebook5velox6detail14veloxCheckFailINS0_17VeloxRuntimeErrorEPKcEEvRKNS1_18VeloxCheckFailArgsET0_
# 3  _ZN8facebook5velox12_GLOBAL__N_121exportFlattenedVectorERKNS0_10BaseVectorERKNS1_9SelectionERK12ArrowOptionsR10ArrowArrayPNS0_6memory10MemoryPoolERNS1_24VeloxToArrowBridgeHolderE
# 4  _ZN8facebook5velox12_GLOBAL__N_117exportToArrowImplERKNS0_10BaseVectorERKNS1_9SelectionERK12ArrowOptionsR10ArrowArrayPNS0_6memory10MemoryPoolE
# 5  _ZN8facebook5velox12_GLOBAL__N_117exportToArrowImplERKNS0_10BaseVectorERKNS1_9SelectionERK12ArrowOptionsR10ArrowArrayPNS0_6memory10MemoryPoolE
# 6  _ZN8facebook5velox13exportToArrowERKSt10shared_ptrINS0_10BaseVectorEER10ArrowArrayPNS0_6memory10MemoryPoolERK12ArrowOptions
# 7  _ZN8facebook5velox7parquet6Writer5writeERKSt10shared_ptrINS0_10BaseVectorEE
# 8  _ZN8facebook5velox9connector4hive12HiveDataSink5writeEmSt10shared_ptrINS0_9RowVectorEE
# 9  _ZN8facebook5velox9connector4hive12HiveDataSink10appendDataESt10shared_ptrINS0_9RowVectorEE
# 10 _ZN8facebook5velox4exec11TableWriter8addInputESt10shared_ptrINS0_9RowVectorEE
# 11 _ZZN8facebook5velox4exec6Driver11runInternalERSt10shared_ptrIS2_ERS3_INS1_13BlockingStateEERS3_INS0_9RowVectorEEENKUlvE4_clEv
# 12 _ZN8facebook5velox4exec6Driver11runInternalERSt10shared_ptrIS2_ERS3_INS1_13BlockingStateEERS3_INS0_9RowVectorEE
# 13 _ZN8facebook5velox4exec6Driver4nextEPN5folly10SemiFutureINS3_4UnitEEE
# 14 _ZN8facebook5velox4exec4Task4nextEPN5folly10SemiFutureINS3_4UnitEEE
# 15 _ZN6gluten24WholeStageResultIterator4nextEv
# 16 Java_org_apache_gluten_vectorized_ColumnarBatchOutIterator_nativeHasNext
# 17 0x00007f0919018427
    at org.apache.gluten.iterator.ClosableIterator.hasNext(ClosableIterator.java:41)
    at scala.collection.convert.Wrappers$JIteratorWrapper.hasNext(Wrappers.scala:45)
    at org.apache.gluten.iterator.IteratorsV1$InvocationFlowProtection.hasNext(IteratorsV1.scala:159)
    at org.apache.gluten.iterator.IteratorsV1$IteratorCompleter.hasNext(IteratorsV1.scala:71)
    at org.apache.gluten.iterator.IteratorsV1$PayloadCloser.hasNext(IteratorsV1.scala:37)
    at org.apache.gluten.iterator.IteratorsV1$LifeTimeAccumulator.hasNext(IteratorsV1.scala:100)
    at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:37)
    at org.apache.spark.sql.execution.VeloxColumnarWriteFilesRDD.$anonfun$compute$2(VeloxColumnarWriteFilesExec.scala:204)
    at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
    at org.apache.spark.util.Utils$.tryWithSafeFinallyAndFailureCallbacks(Utils.scala:1563)
    at org.apache.spark.sql.execution.VeloxColumnarWriteFilesRDD.compute(VeloxColumnarWriteFilesExec.scala:199)
    ... 11 more
```
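For context, the failing check in `exportFlattenedVector` (Bridge.cpp:1081) asserts that a vector being exported to Arrow either has no wrapped value vector or wraps a flat-encoded one; a constant-encoded all-NULL column would violate that invariant, which matches the `INVALID_STATE` / "unsupported nested encoding" message. If the native parquet writer is indeed the failing path, a possible mitigation while the bug stands might be to fall back to vanilla Spark's row-based writer for this statement. Note `spark.gluten.sql.native.writer.enabled` below is taken from older Gluten docs and should be verified against this Gluten version:

```scala
// Hypothetical mitigation sketch, not a confirmed fix: disable Gluten's
// native writer so the insert goes through vanilla Spark's writer,
// bypassing the Velox-to-Arrow bridge. Verify the config name for your
// Gluten version before relying on this.
spark.sql("SET spark.gluten.sql.native.writer.enabled=false")
spark.sql(
  """insert into bigdata.tmp_test(`bz`, `amt`)
    |select `bz`, `amt` from bigdata.tmp_test2""".stripMargin)
```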
### Spark version
Spark-3.4.x
### Spark configurations
_No response_
### System information
Backend: Velox
Backend Branch: HEAD
Backend Revision: c1c40d997551d83ccaaaa9c2e26cabbc949a425a
Backend Revision Time: 2024-11-15 21:00:05 +0000
GCC Version: GCC: (GNU) 11.2.1 20220127 (Red Hat 11.2.1-9)
Gluten Branch:
Gluten Build Time: 2024-11-17T05:43:21Z
Gluten Repo URL:
Gluten Revision:
Gluten Revision Time:
Gluten Version: 1.3.0-SNAPSHOT
Hadoop Version: 2.7.4
Java Version: 1.8
Scala Version: 2.12.15
Spark Version: 3.4.4
### Relevant logs
_No response_