jinchengchenghh opened a new issue, #8968:
URL: https://github.com/apache/incubator-gluten/issues/8968
### Description
The test `TestPartitionPrunning` falls back with:
```
Validation failed with exception from: ProjectExecTransformer, reason: Not
supported to map spark function name to substrait function name:
staticinvoke(class org.apache.iceberg.spark.functions.BucketFunction$BucketInt,
IntegerType, invoke, 3, id#0, IntegerType, IntegerType, false, true, true),
class name: StaticInvoke.
```
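For context, the `staticinvoke` expression here wraps Iceberg's `bucket(N)` partition transform, which per the Iceberg table spec is a 32-bit Murmur3 hash of the value (ints are encoded as 8-byte little-endian longs) followed by `(hash & Integer.MAX_VALUE) % N`. Below is a minimal, stdlib-only Python sketch of that computation — an illustration of the transform Gluten cannot yet map to a Substrait function, not Gluten or Iceberg code:

```python
import struct

def murmur3_x86_32(data: bytes, seed: int = 0) -> int:
    """32-bit Murmur3 hash, the hash function Iceberg's bucket transform uses."""
    c1, c2 = 0xCC9E2D51, 0x1B873593
    h = seed
    n = len(data) & ~3  # length rounded down to a multiple of 4
    for i in range(0, n, 4):
        k = int.from_bytes(data[i:i + 4], "little")
        k = (k * c1) & 0xFFFFFFFF
        k = ((k << 15) | (k >> 17)) & 0xFFFFFFFF  # rotl 15
        k = (k * c2) & 0xFFFFFFFF
        h ^= k
        h = ((h << 13) | (h >> 19)) & 0xFFFFFFFF  # rotl 13
        h = (h * 5 + 0xE6546B64) & 0xFFFFFFFF
    # tail: remaining 1-3 bytes
    k = 0
    tail = data[n:]
    if len(tail) >= 3:
        k ^= tail[2] << 16
    if len(tail) >= 2:
        k ^= tail[1] << 8
    if len(tail) >= 1:
        k ^= tail[0]
        k = (k * c1) & 0xFFFFFFFF
        k = ((k << 15) | (k >> 17)) & 0xFFFFFFFF
        k = (k * c2) & 0xFFFFFFFF
        h ^= k
    # finalization mix
    h ^= len(data)
    h ^= h >> 16
    h = (h * 0x85EBCA6B) & 0xFFFFFFFF
    h ^= h >> 13
    h = (h * 0xC2B2AE35) & 0xFFFFFFFF
    h ^= h >> 16
    return h

def iceberg_bucket_int(value: int, num_buckets: int) -> int:
    """bucket(N) for an int column: ints are hashed as little-endian longs."""
    h = murmur3_x86_32(struct.pack("<q", value))
    return (h & 0x7FFFFFFF) % num_buckets  # & Integer.MAX_VALUE, then mod N

print(iceberg_bucket_int(34, 16))  # a bucket id in [0, 16)
```

Because the transform lives in a Spark `StaticInvoke` over `org.apache.iceberg.spark.functions.BucketFunction$BucketInt` rather than a built-in Spark function, Gluten's `ProjectExecTransformer` has no Substrait name to map it to and must fall back.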
The Scan node also falls back with:
```
class org.apache.iceberg.spark.source.SparkBatchQueryScan does not support
push down filters
```
And if the file format is Parquet, the following exception is thrown:
```
E20250311 15:58:46.202656 18982389 Exceptions.h:66] Line:
/Users/chengchengjin/code/gluten/ep/build-velox/build/velox_ep/velox/common/file/FileSystems.cpp:63,
Function:getFileSystem, Expression: No registered file system matched with
file path
'TestIdentityPartitionData-279550923fs:/var/folders/63/845y6pk53dx_83hpw8ztdchw0000gn/T/junit15026807393031088749/junit10667116634079860356/data/date=2020-02-03/level=DEBUG/id_bucket=0/message_trunc=debug/timestamp_hour=2020-02-03-01/00000-2-9ea4b70b-d44b-43c5-bddc-df9b7c2acfa9-0-00004.parquet',
Source: RUNTIME, ErrorCode: INVALID_STATE
25/03/11 15:58:46 ERROR TaskResources: Task 9 failed by error:
org.apache.gluten.exception.GlutenException:
org.apache.gluten.exception.GlutenException: Exception: VeloxRuntimeError
Error Source: RUNTIME
Error Code: INVALID_STATE
Reason: No registered file system matched with file path
'TestIdentityPartitionData-279550923fs:/var/folders/63/845y6pk53dx_83hpw8ztdchw0000gn/T/junit15026807393031088749/junit10667116634079860356/data/date=2020-02-03/level=DEBUG/id_bucket=0/message_trunc=debug/timestamp_hour=2020-02-03-01/00000-2-9ea4b70b-d44b-43c5-bddc-df9b7c2acfa9-0-00004.parquet'
Retriable: False
Context: Split [Hive:
TestIdentityPartitionData-279550923fs:/var/folders/63/845y6pk53dx_83hpw8ztdchw0000gn/T/junit15026807393031088749/junit10667116634079860356/data/date=2020-02-03/level=DEBUG/id_bucket=0/message_trunc=debug/timestamp_hour=2020-02-03-01/00000-2-9ea4b70b-d44b-43c5-bddc-df9b7c2acfa9-0-00004.parquet
4 - 1512] Task Gluten_Stage_8_TID_9_VTID_6
Additional Context: Operator: TableScan[0] 0
Function: getFileSystem
File:
/Users/chengchengjin/code/gluten/ep/build-velox/build/velox_ep/velox/common/file/FileSystems.cpp
Line: 63
Stack trace:
# 0 ... # 59 (60 native stack frames, contents not captured in this report)
    at org.apache.gluten.iterator.ClosableIterator.hasNext(ClosableIterator.java:41)
    at scala.collection.convert.Wrappers$JIteratorWrapper.hasNext(Wrappers.scala:45)
    at org.apache.gluten.iterator.IteratorsV1$InvocationFlowProtection.hasNext(IteratorsV1.scala:159)
    at org.apache.gluten.iterator.IteratorsV1$IteratorCompleter.hasNext(IteratorsV1.scala:71)
    at org.apache.gluten.iterator.IteratorsV1$PayloadCloser.hasNext(IteratorsV1.scala:37)
    at org.apache.gluten.iterator.IteratorsV1$LifeTimeAccumulator.hasNext(IteratorsV1.scala:100)
    at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:37)
    at scala.collection.Iterator.isEmpty(Iterator.scala:387)
    at scala.collection.Iterator.isEmpty$(Iterator.scala:387)
    at org.apache.spark.InterruptibleIterator.isEmpty(InterruptibleIterator.scala:28)
    at org.apache.gluten.execution.VeloxColumnarToRowExec$.toRowIterator(VeloxColumnarToRowExec.scala:122)
    at org.apache.gluten.execution.VeloxColumnarToRowExec.$anonfun$doExecuteInternal$1(VeloxColumnarToRowExec.scala:78)
    at org.apache.spark.rdd.RDD.$anonfun$mapPartitions$2(RDD.scala:856)
    at org.apache.spark.rdd.RDD.$anonfun$mapPartitions$2$adapted(RDD.scala:856)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:367)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:331)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:367)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:331)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:92)
    at org.apache.spark.TaskContext.runTaskWithListeners(TaskContext.scala:161)
    at org.apache.spark.scheduler.Task.run(Task.scala:139)
```
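The Velox failure looks like a path-construction bug rather than a missing filesystem: the test name `TestIdentityPartitionData-279550923` is fused directly onto the scheme of the table location, so everything before the first `:` is treated as the filesystem scheme and no registered Velox filesystem (for example `file:` or a plain `/` local path) matches it. Python's `urlparse` illustrates how such a prefix is consumed as a scheme — this is an analogy for the prefix matching only, not Velox code, and the shortened path below is illustrative:

```python
from urllib.parse import urlparse

# Illustrative, shortened version of the path from the error message:
# the test name is fused onto the scheme, so the whole prefix up to the
# first ':' parses as the URI scheme instead of a known one like "file".
bad_path = "TestIdentityPartitionData-279550923fs:/var/folders/T/data/00000-2.parquet"
good_path = "file:/var/folders/T/data/00000-2.parquet"

print(urlparse(bad_path).scheme)   # "testidentitypartitiondata-279550923fs"
print(urlparse(good_path).scheme)  # "file"
```

With the stray prefix stripped, the remainder is an ordinary local `file:` path that Velox's local filesystem would match, which suggests fixing the location string handed to the scan rather than registering a new filesystem.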
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]