zhztheplayer opened a new pull request, #9230:
URL: https://github.com/apache/incubator-gluten/pull/9230
Conditionally add a `ColumnarToRowRemovalGuard` node that does nothing on top
of a plan of the following shape:
```
+- ColumnarToRow
   +- FileScan parquet
```
This applies when the plan is about to be cached. The guard keeps
[this Spark code](https://github.com/apache/spark/blob/9d3f937c555ccab7777c976b66da7c7229582f26/sql/core/src/main/scala/org/apache/spark/sql/execution/columnar/InMemoryRelation.scala#L335-L351)
from removing the C2R. Without it, `FileScan` would directly emit vanilla
columnar batches that `ColumnarCachedBatchSerializer` doesn't recognize. The
plan will become:
```
ColumnarToRowRemovalGuard
+- ColumnarToRow
   +- FileScan parquet [l_orderkey_read#128L] Batched: true, DataFilters: [], Format: Parquet, Location: InMemoryFileIndex(1 paths)[file:/tmp/spark-e732391d-d3f4-45e7-ae2e-d521d7658b01], PartitionFilters: [], PushedFilters: [], ReadSchema: struct<l_orderkey_read:bigint>
```
As a result, `ColumnarCachedBatchSerializer` treats it as a regular row-based
plan, and it is then handled by the vanilla Spark batch serializer.
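
To illustrate the idea, here is a toy model in Python. It is only a sketch: `Plan`, `strip_top_c2r` and `add_guard` are hypothetical names invented for this example, not Spark or Gluten APIs, and the stripping rule is reduced to the single case of a `ColumnarToRow` sitting at the root of the cached plan.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Plan:
    """Toy stand-in for a physical plan node (hypothetical, not Spark's SparkPlan)."""
    name: str
    child: Optional["Plan"] = None
    supports_columnar: bool = False


def strip_top_c2r(plan: Plan) -> Plan:
    """Simplified cache-side rule: drop a ColumnarToRow found at the plan root."""
    if plan.name == "ColumnarToRow" and plan.child is not None:
        return plan.child
    return plan


def add_guard(plan: Plan) -> Plan:
    """Wrap the plan in a pass-through node so the root is no longer a ColumnarToRow."""
    return Plan("ColumnarToRowRemovalGuard", child=plan)


scan = Plan("FileScan parquet", supports_columnar=True)
c2r = Plan("ColumnarToRow", child=scan)

# Without the guard the C2R is stripped and the columnar scan is exposed.
unguarded = strip_top_c2r(c2r)
# With the guard the root does not match, so the row-based plan is kept as is.
guarded = strip_top_c2r(add_guard(c2r))

print(unguarded.name, unguarded.supports_columnar)
print(guarded.name, guarded.supports_columnar)
```

In this model the unguarded result is the columnar scan itself, while the guarded result still has the guard at its root with the `ColumnarToRow` underneath, so whatever consumes it sees a row-based plan.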
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]