weixiuli opened a new pull request, #9026:
URL: https://github.com/apache/incubator-gluten/pull/9026
## What changes were proposed in this pull request?
When the consumers of a `ColumnarPartialProject` do not support columnar execution, we should
remove it to avoid the additional C2R and R2C transitions.
(Fixes: #9025)
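The idea behind the rule can be sketched with a toy plan-tree model. Note this is a hypothetical illustration, not Gluten's actual API: node names, the `columnar` flag, and the `remove_partial_project` helper are all made up to show when the partial project falls back to a plain row-based `Project`.

```python
# Toy model (NOT Gluten's real classes): if a ColumnarPartialProject feeds a
# row-based consumer, replace it with a plain row-based Project so no
# C2R/R2C transition pair is needed around it.
from dataclasses import dataclass, field

@dataclass
class Node:
    name: str
    columnar: bool                      # does this operator produce columnar batches?
    children: list = field(default_factory=list)

def remove_partial_project(parent, node):
    """Hypothetical rewrite rule: fall back to a row-based Project when the
    parent of a ColumnarPartialProject cannot consume columnar output."""
    if (node.name == "ColumnarPartialProject"
            and parent is not None and not parent.columnar):
        node = Node("Project", columnar=False, children=node.children)
    node.children = [remove_partial_project(node, c) for c in node.children]
    return node

# WriteFiles (row-based) <- ColumnarPartialProject <- Scan (columnar)
plan = Node("WriteFiles", False,
            [Node("ColumnarPartialProject", True, [Node("Scan", True)])])
new_plan = remove_partial_project(None, plan)
print(new_plan.children[0].name)  # -> Project
```

With the partial project gone, the whole projection stays row-based under the row-based writer, which mirrors the "After this PR" plan below.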
## How was this patch tested?
We set `spark.gluten.sql.native.writer.enabled=true` (only so that the consumer of the
UDF project does not support columnar execution) and ran the following SQL:
```
insert overwrite t1
select (plus_one(l_extendedprice) * l_discount
+ hash(l_orderkey) + hash(l_comment)) as revenue
from lineitem
```
Before this PR:
```
+- Execute InsertIntoHadoopFsRelationCommand
file:/root/git-gluten/spark-warehouse/org.apache.gluten.expression.UDFPartialProjectSuiteRasOn/t1,
false, Parquet,
[path=file:/root/git-gluten/spark-warehouse/org.apache.gluten.expression.UDFPartialProjectSuiteRasOn/t1],
Overwrite, `spark_catalog`.`default`.`t1`,
org.apache.spark.sql.execution.datasources.InMemoryFileIndex(file:/root/git-gluten/spark-warehouse/org.apache.gluten.expression.UDFPartialProjectSuiteRasOn/t1),
[revenue]
+- WriteFiles
+- VeloxColumnarToRow
+- ^(2) ProjectExecTransformer
[cast((((_SparkPartialProject0#64 * l_discount#6) + cast(hash(l_orderkey#0L,
42) as decimal(10,0))) + cast(hash(l_comment#15, 42) as decimal(10,0))) as
double) AS revenue#53]
+- ^(2) InputIteratorTransformer[l_orderkey#0L,
l_extendedprice#5, l_discount#6, l_comment#15, _SparkPartialProject0#64]
+- ColumnarPartialProject Project [cast((((if
(isnull(cast(l_extendedprice#5 as bigint))) cast(null as decimal(20,0)) else
cast(plus_one(knownnotnull(cast(l_extendedprice#5 as bigint))) as
decimal(20,0)) * l_discount#6) + cast(hash(l_orderkey#0L, 42) as
decimal(10,0))) + cast(hash(l_comment#15, 42) as decimal(10,0))) as double) AS
revenue#53] PartialProject List(if (isnull(cast(l_extendedprice#5 as bigint)))
cast(null as decimal(20,0)) else
cast(plus_one(knownnotnull(cast(l_extendedprice#5 as bigint))) as
decimal(20,0)) AS _SparkPartialProject0#64)
+- ^(1) BatchScanTransformer parquet
file:/root/git-gluten/backends-velox/target/scala-2.12/test-classes/tpch-data-parquet/lineitem[l_orderkey#0L,
l_extendedprice#5, l_discount#6, l_comment#15] ParquetScan DataFilters: [],
Format: parquet, Location: InMemoryFileIndex(1
paths)[file:/root/git-gluten/backends-velox/target/scala-2.12/t...,
PartitionFilters: [], PushedAggregation: [], PushedFilters: [], PushedGroupBy:
[], ReadSchema:
struct<l_orderkey:bigint,l_extendedprice:decimal(12,2),l_discount:decimal(12,2),l_comment:string>
RuntimeFilters: [] NativeFilters: []
```
After this PR:
```
+- Execute InsertIntoHadoopFsRelationCommand
file:/root/git-gluten/spark-warehouse/org.apache.gluten.expression.UDFPartialProjectSuiteRasOn/t1,
false, Parquet,
[path=file:/root/git-gluten/spark-warehouse/org.apache.gluten.expression.UDFPartialProjectSuiteRasOn/t1],
Overwrite, `spark_catalog`.`default`.`t1`,
org.apache.spark.sql.execution.datasources.InMemoryFileIndex(file:/root/git-gluten/spark-warehouse/org.apache.gluten.expression.UDFPartialProjectSuiteRasOn/t1),
[revenue]
+- WriteFiles
+- *(1) Project [cast((((if (isnull(cast(l_extendedprice#5 as
bigint))) cast(null as decimal(20,0)) else
cast(plus_one(knownnotnull(cast(l_extendedprice#5 as bigint))) as
decimal(20,0)) * l_discount#6) + cast(hash(l_orderkey#0L, 42) as
decimal(10,0))) + cast(hash(l_comment#15, 42) as decimal(10,0))) as double) AS
revenue#83]
+- VeloxColumnarToRow
+- ^(3) BatchScanTransformer parquet
file:/root/git-gluten/backends-velox/target/scala-2.12/test-classes/tpch-data-parquet/lineitem[l_orderkey#0L,
l_extendedprice#5, l_discount#6, l_comment#15] ParquetScan DataFilters: [],
Format: parquet, Location: InMemoryFileIndex(1
paths)[file:/root/git-gluten/backends-velox/target/scala-2.12/t...,
PartitionFilters: [], PushedAggregation: [], PushedFilters: [], PushedGroupBy:
[], ReadSchema:
struct<l_orderkey:bigint,l_extendedprice:decimal(12,2),l_discount:decimal(12,2),l_comment:string>
RuntimeFilters: [] NativeFilters: []
```
Added unit tests; existing unit tests also cover this change.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]