Re: [PR] perf: replace CometBatchIterator FFI input path with the Arrow C Stream Interface [datafusion-comet]

via GitHub Thu, 11 Jun 2026 07:48:09 -0700


mbutrovich commented on code in PR #4572:
URL: https://github.com/apache/datafusion-comet/pull/4572#discussion_r3396869909



##########
spark/src/test/scala/org/apache/comet/exec/CometExecSuite.scala:
##########
@@ -3940,6 +3940,45 @@ class CometExecSuite extends CometTestBase {
     }
   }
 
+  test("CometLocalTableScanExec falls back when schema contains TimeType") {
+    assume(
+      org.apache.comet.CometSparkSessionExtensions.isSpark41Plus,
+      "TimeType requires Spark 4.1+")
+    // spark.sql.timeType.enabled defaults to Utils.isTesting; enable 
explicitly so the
+    // row encoder accepts TIME (matches Spark's own TimeFunctionsSuiteBase 
setup).
+    withSQLConf(
+      "spark.sql.timeType.enabled" -> "true",
+      CometConf.COMET_EXEC_LOCAL_TABLE_SCAN_ENABLED.key -> "true") {
+      // VALUES folds to a LocalRelation, exercising the 
CometLocalTableScanExec convert
+      // path; the TimeType column should drive the schema-level fallback.
+      val df = spark.sql("SELECT * FROM VALUES (TIME '12:34:56'), (TIME 
'01:02:03') AS t(c)")
+      checkSparkAnswer(df)
+    }
+  }
+
+  test("CometLocalTableScanExec does not leak Arrow buffers (project 
consumer)") {

Review Comment:
   It actually gets exercised by fuzz suite. I found it originally in this PR, 
which this change was peeled off from: 
https://github.com/apache/datafusion-comet/pull/4393
   
   > Decimal128 alignment panic on JVM-Arrow imports
   > 
   > CometFuzzTestSuite "join (jvm shuffle, nativeC2R=true)" panics with Memory 
pointer from external source (e.g, FFI) is not aligned with the specified 
scalar type when a batch contains array<decimal(10,2)>. Since Rust 1.77 / LLVM 
18, align_of::<i128>() == 16 on every platform we run on, so arrow-rs's 
ScalarBuffer::<i128>::from(Buffer) panics on Decimal128 buffers landing on 
offsets that are 8-byte but not 16-byte aligned. The Arrow C Data Interface 
only recommends 8-byte alignment, and arrow-java's VectorUnloader and 
NettyAllocationManager only guarantee 8-byte. The producer is spec-conformant, 
the consumer-side check is unilateral. Stock 
arrow::ffi_stream::ArrowArrayStreamReader::next does not call 
ArrayData::align_buffers() before constructing typed arrays, so it panics 
before the caller ever sees a RecordBatch. The old bespoke import path called 
align_buffers() explicitly in from_spark; the canonical reader switch dropped 
that step.
   > 
   > Comet-side fix: AlignedArrowStreamReader, a 70-line replacement that 
drives FFI_ArrowArrayStream::get_next directly, runs from_ffi_and_data_type, 
calls align_buffers(), then builds the RecordBatch. Once 
https://github.com/apache/arrow-rs/pull/10030 lands (calls align_buffers() 
inside from_ffi itself, mirroring the IPC reader's default and matching the 
existing arrow-pyarrow workaround), AlignedArrowStreamReader can be deleted in 
favor of ArrowArrayStreamReader. Tracking issue: 
https://github.com/apache/arrow-rs/issues/10028.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Re: [PR] perf: replace CometBatchIterator FFI input path with the Arrow C Stream Interface [datafusion-comet]

Reply via email to