[PR] [GLUTEN-8497][VL] Fix columnar batch type mismatch in table cache [incubator-gluten]

via GitHub Fri, 04 Apr 2025 08:08:39 -0700


zhztheplayer opened a new pull request, #9230:
URL: https://github.com/apache/incubator-gluten/pull/9230


   Conditionally add a `ColumnarToRowRemovalGuard` node, which does nothing 
itself, on top of a plan of the following shape:
   
   ```
   +- ColumnarToRow
      +- FileScan parquet
   ```
   
   This is the plan that is about to be cached. The guard keeps [this Spark 
code](https://github.com/apache/spark/blob/9d3f937c555ccab7777c976b66da7c7229582f26/sql/core/src/main/scala/org/apache/spark/sql/execution/columnar/InMemoryRelation.scala#L335-L351)
 from removing the C2R, which would otherwise let `FileScan` directly emit 
vanilla columnar batches that `ColumnarCachedBatchSerializer` doesn't 
recognize. The plan will become:
   
   ```
   ColumnarToRowRemovalGuard
   +- ColumnarToRow
      +- FileScan parquet [l_orderkey_read#128L] Batched: true, DataFilters: [], Format: Parquet, Location: InMemoryFileIndex(1 paths)[file:/tmp/spark-e732391d-d3f4-45e7-ae2e-d521d7658b01], PartitionFilters: [], PushedFilters: [], ReadSchema: struct<l_orderkey_read:bigint>
   ```
   
   The guarded plan is therefore treated as a regular row-based plan by 
`ColumnarCachedBatchSerializer` and is then handled by the vanilla Spark batch 
serializer.
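
The effect of the guard can be illustrated with a small toy model. This is only a sketch and not the Spark or Gluten implementation: `PlanNode`, `strip_top_level_c2r`, `add_removal_guard` and `render` are hypothetical names, the stripping function is a simplified stand-in for the linked Spark code, and the condition used for adding the guard (the plan root is a `ColumnarToRow`) is an assumption made for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class PlanNode:
    """Toy single-child plan node (hypothetical, not a Spark class)."""
    name: str
    child: Optional["PlanNode"] = None


def strip_top_level_c2r(plan: PlanNode) -> PlanNode:
    """Simplified stand-in for the cache-side rewrite: drop the root node
    only if it is a ColumnarToRow, exposing the columnar child."""
    if plan.name == "ColumnarToRow" and plan.child is not None:
        return plan.child
    return plan


def add_removal_guard(plan: PlanNode) -> PlanNode:
    """Wrap a ColumnarToRow root in a no-op guard (assumed condition)."""
    if plan.name == "ColumnarToRow":
        return PlanNode("ColumnarToRowRemovalGuard", plan)
    return plan


def render(plan: PlanNode) -> str:
    """Print the chain of nodes in the same style as the plans above."""
    lines = []
    node, depth = plan, 0
    while node is not None:
        prefix = "" if depth == 0 else "   " * (depth - 1) + "+- "
        lines.append(prefix + node.name)
        node, depth = node.child, depth + 1
    return "\n".join(lines)


scan = PlanNode("FileScan parquet")
c2r = PlanNode("ColumnarToRow", scan)

# Without the guard, the rewrite exposes the columnar scan directly.
unguarded = strip_top_level_c2r(c2r)

# With the guard, the root is no longer a ColumnarToRow, so nothing is removed.
guarded = strip_top_level_c2r(add_removal_guard(c2r))

print(render(unguarded))
print(render(guarded))
```

In this model the unguarded plan collapses to the bare scan, while the guarded plan keeps its `ColumnarToRow`, so whatever consumes it still sees row-based output.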


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


