Re: [PR] feat: Native columnar to row conversion [datafusion-comet]

via GitHub Thu, 22 Jan 2026 18:20:24 -0800


wForget commented on code in PR #3221:
URL: https://github.com/apache/datafusion-comet/pull/3221#discussion_r2719336356



##########
spark/src/main/scala/org/apache/spark/sql/comet/CometNativeColumnarToRowExec.scala:
##########
@@ -0,0 +1,138 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.spark.sql.comet
+
+import org.apache.spark.TaskContext
+import org.apache.spark.rdd.RDD
+import org.apache.spark.sql.catalyst.InternalRow
+import org.apache.spark.sql.catalyst.expressions.{Attribute, SortOrder}
+import org.apache.spark.sql.catalyst.plans.physical.Partitioning
+import org.apache.spark.sql.execution.{ColumnarToRowTransition, SparkPlan}
+import org.apache.spark.sql.execution.metric.{SQLMetric, SQLMetrics}
+import org.apache.spark.sql.types.StructType
+import org.apache.spark.util.Utils
+
+import org.apache.comet.{CometConf, NativeColumnarToRowConverter}
+
+/**
+ * Native implementation of ColumnarToRowExec that converts Arrow columnar data to Spark UnsafeRow
+ * format using Rust.
+ *
+ * This is an experimental feature that can be enabled by setting
+ * `spark.comet.columnarToRow.native.enabled=true`.
+ *
+ * Benefits over the JVM implementation:
+ *   - Zero-copy for variable-length types (strings, binary)
+ *   - Better CPU cache utilization through vectorized processing
+ *   - Reduced GC pressure
+ *
+ * @param child
+ *   The child plan that produces columnar batches
+ */
+case class CometNativeColumnarToRowExec(child: SparkPlan)
+    extends ColumnarToRowTransition
+    with CometPlan {
+
+  // supportsColumnar requires to be only called on driver side, see also SPARK-37779.
+  assert(Utils.isInRunningSparkTask || child.supportsColumnar)
+
+  override def output: Seq[Attribute] = child.output
+
+  override def outputPartitioning: Partitioning = child.outputPartitioning
+
+  override def outputOrdering: Seq[SortOrder] = child.outputOrdering
+
+  override lazy val metrics: Map[String, SQLMetric] = Map(
+    "numOutputRows" -> SQLMetrics.createMetric(sparkContext, "number of output rows"),
+    "numInputBatches" -> SQLMetrics.createMetric(sparkContext, "number of input batches"),
+    "convertTime" -> SQLMetrics.createNanoTimingMetric(sparkContext, "time in conversion"))
+
+  override def doExecute(): RDD[InternalRow] = {
+    val numOutputRows = longMetric("numOutputRows")
+    val numInputBatches = longMetric("numInputBatches")
+    val convertTime = longMetric("convertTime")
+
+    // Get the schema and batch size for native conversion
+    val localSchema = child.schema
+    val batchSize = CometConf.COMET_BATCH_SIZE.get()
+
+    child.executeColumnar().mapPartitionsInternal { batches =>
+      // Create native converter for this partition
+      val converter = new NativeColumnarToRowConverter(localSchema, batchSize)
+
+      // Register cleanup on task completion
+      TaskContext.get().addTaskCompletionListener[Unit] { _ =>
+        converter.close()
+      }
+
+      batches.flatMap { batch =>
+        numInputBatches += 1
+        val numRows = batch.numRows()
+        numOutputRows += numRows
+
+        val startTime = System.nanoTime()
+        val result = converter.convert(batch)

Review Comment:
   Does the `ColumnarBatch` need to be closed after it has been converted?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


