[I] deltalake partitioned write still has a C2R converter [incubator-gluten]

via GitHub Tue, 27 Jan 2026 21:54:20 -0800


FelixYBW opened a new issue, #11503:
URL: https://github.com/apache/incubator-gluten/issues/11503


   ### Description
   
   There is still a C2R (columnar-to-row) conversion in the Parquet write stage
   when we create a partitioned Delta table like this:
   
   ```python
   (df.write.mode("overwrite")
       .format("delta")
       .option("compression", "zstd")
       .partitionBy(tbl_filenum[tbl]['part'])
       .option("optimizeWrite", "True")
       .save(f"s3a://presto-workload/{databasename}/{tbl}"))
   ```
   
   <img width="663" height="362" alt="Image" src="https://github.com/user-attachments/assets/82a6080f-61aa-4a48-a897-2c6f9fb73a20" />
   
   The conversion shows up in the call stack of the write task:
   ```
   app//org.apache.gluten.vectorized.NativeColumnarToRowJniWrapper.nativeColumnarToRowConvert(Native Method)
   app//org.apache.gluten.execution.VeloxColumnarToRowExec$Converter$$anon$2.next(VeloxColumnarToRowExec.scala:193)
   app//org.apache.gluten.execution.VeloxColumnarToRowExec$Converter$$anon$2.next(VeloxColumnarToRowExec.scala:169)
   app//scala.collection.Iterator.foreach(Iterator.scala:943)
   app//scala.collection.Iterator.foreach$(Iterator.scala:943)
   app//org.apache.gluten.execution.VeloxColumnarToRowExec$Converter$$anon$2.foreach(VeloxColumnarToRowExec.scala:169)
   app//org.apache.spark.sql.delta.stats.GlutenDeltaJobStatisticsTracker$GlutenDeltaTaskStatisticsTracker.newRow(GlutenDeltaWriteJobStatsTracker.scala:73)
   app//org.apache.spark.sql.delta.files.GlutenDeltaFileFormatWriter$ColumnarDynamicPartitionDataSingleWriter.$anonfun$write$11(GlutenDeltaFileFormatWriter.scala:596)
   app//org.apache.spark.sql.delta.files.GlutenDeltaFileFormatWriter$ColumnarDynamicPartitionDataSingleWriter.$anonfun$write$11$adapted(GlutenDeltaFileFormatWriter.scala:595)
   app//org.apache.spark.sql.delta.files.GlutenDeltaFileFormatWriter$ColumnarDynamicPartitionDataSingleWriter$$Lambda$2037/0x00000008018bc318.apply(Unknown Source)
   app//scala.collection.immutable.List.foreach(List.scala:431)
   app//org.apache.spark.sql.delta.files.GlutenDeltaFileFormatWriter$ColumnarDynamicPartitionDataSingleWriter.write(GlutenDeltaFileFormatWriter.scala:595)
   app//org.apache.spark.sql.execution.datasources.FileFormatDataWriter.writeWithMetrics(FileFormatDataWriter.scala:85)
   app//org.apache.spark.sql.execution.datasources.FileFormatDataWriter.writeWithIterator(FileFormatDataWriter.scala:92)
   app//org.apache.spark.sql.delta.files.GlutenDeltaFileFormatWriter$.$anonfun$executeTask$1(GlutenDeltaFileFormatWriter.scala:478)
   app//org.apache.spark.sql.delta.files.GlutenDeltaFileFormatWriter$$$Lambda$1916/0x00000008018877f0.apply(Unknown Source)
   app//org.apache.spark.util.Utils$.tryWithSafeFinallyAndFailureCallbacks(Utils.scala:1397)
   app//org.apache.spark.sql.delta.files.GlutenDeltaFileFormatWriter$.executeTask(GlutenDeltaFileFormatWriter.scala:485)
   app//org.apache.spark.sql.delta.files.GlutenDeltaFileFormatWriter$.$anonfun$executeWrite$4(GlutenDeltaFileFormatWriter.scala:313)
   app//org.apache.spark.sql.delta.files.GlutenDeltaFileFormatWriter$$$Lambda$1553/0x00000008016bd7f0.apply(Unknown Source)
   app//org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:93)
   app//org.apache.spark.TaskContext.runTaskWithListeners(TaskContext.scala:166)
   app//org.apache.spark.scheduler.Task.run(Task.scala:141)
   app//org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$4(Executor.scala:620)
   app//org.apache.spark.executor.Executor$TaskRunner$$Lambda$936/0x00000008012177d8.apply(Unknown Source)
   app//org.apache.spark.util.SparkErrorUtils.tryWithSafeFinally(SparkErrorUtils.scala:64)
   app//org.apache.spark.util.SparkErrorUtils.tryWithSafeFinally$(SparkErrorUtils.scala:61)
   app//org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:94)
   app//org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:623)
   [email protected]/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
   [email protected]/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
   [email protected]/java.lang.Thread.run(Thread.java:833)
   ```
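
Reading the trace downward from the native call, the conversion appears to be pulled by `GlutenDeltaTaskStatisticsTracker.newRow` (the per-task statistics tracker) rather than by the Parquet writer itself. The sketch below shows one way to pick that caller out of a thread dump automatically. The names `find_c2r_caller`, `C2R_MARKER`, and `PASS_THROUGH` are hypothetical helpers for illustration, not part of Gluten, and the sample frames are copied from the trace above.

```python
# Frame that performs the native columnar-to-row conversion (from the trace above).
C2R_MARKER = "NativeColumnarToRowJniWrapper.nativeColumnarToRowConvert"

# Frames that only forward the call: the converter's iterator and Scala
# collection plumbing. Assumption: these are not the code that asked for rows.
PASS_THROUGH = ("VeloxColumnarToRowExec", "scala.collection.")


def find_c2r_caller(trace):
    """Return the first frame below the C2R call that is not plumbing, or None."""
    frames = [line.strip() for line in trace.splitlines() if line.strip()]
    for i, frame in enumerate(frames):
        if C2R_MARKER in frame:
            for caller in frames[i + 1:]:
                if not any(p in caller for p in PASS_THROUGH):
                    # Drop the "app//" class-loader prefix for readability.
                    return caller.split("//", 1)[-1]
            return None
    return None


sample = """
app//org.apache.gluten.vectorized.NativeColumnarToRowJniWrapper.nativeColumnarToRowConvert(Native Method)
app//org.apache.gluten.execution.VeloxColumnarToRowExec$Converter$$anon$2.next(VeloxColumnarToRowExec.scala:193)
app//scala.collection.Iterator.foreach(Iterator.scala:943)
app//org.apache.spark.sql.delta.stats.GlutenDeltaJobStatisticsTracker$GlutenDeltaTaskStatisticsTracker.newRow(GlutenDeltaWriteJobStatsTracker.scala:73)
"""

caller = find_c2r_caller(sample)
print(caller)
```

Running this over dumps from several executors would show whether the statistics tracker is consistently the code path that requests rows.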
   
   ### Gluten version
   
   None


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

