weixiuli opened a new pull request, #9026:
URL: https://github.com/apache/incubator-gluten/pull/9026
## What changes were proposed in this pull request?
When the consumers of a `ColumnarPartialProject` do not support columnar execution, we should
remove it to avoid the additional C2R and R2C transitions.
(Fixes: #9025)
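The idea behind the rule can be sketched with a toy plan-tree model. Note this is a hypothetical illustration, not Gluten's actual API: node names, the `columnar` flag, and the `remove_partial_project` helper are all made up to show when the partial project falls back to a plain row-based `Project`.

```python
# Toy model (NOT Gluten's real classes): if a ColumnarPartialProject feeds a
# row-based consumer, replace it with a plain row-based Project so no
# C2R/R2C transition pair is needed around it.
from dataclasses import dataclass, field

@dataclass
class Node:
    name: str
    columnar: bool                      # does this operator produce columnar batches?
    children: list = field(default_factory=list)

def remove_partial_project(parent, node):
    """Hypothetical rewrite rule: fall back to a row-based Project when the
    parent of a ColumnarPartialProject cannot consume columnar output."""
    if (node.name == "ColumnarPartialProject"
            and parent is not None and not parent.columnar):
        node = Node("Project", columnar=False, children=node.children)
    node.children = [remove_partial_project(node, c) for c in node.children]
    return node

# WriteFiles (row-based) <- ColumnarPartialProject <- Scan (columnar)
plan = Node("WriteFiles", False,
            [Node("ColumnarPartialProject", True, [Node("Scan", True)])])
new_plan = remove_partial_project(None, plan)
print(new_plan.children[0].name)  # -> Project
```

With the partial project gone, the whole projection stays row-based under the row-based writer, which mirrors the "After this PR" plan below.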
## How was this patch tested?
We set `spark.gluten.sql.native.writer.enabled=true` (only so that the consumer of the
UDF project does not support columnar execution) and ran the following SQL:
```
insert overwrite t1
select (plus_one(l_extendedprice) * l_discount
+ hash(l_orderkey) + hash(l_comment)) as revenue
from lineitem
```
Before this PR:
```
+- Execute InsertIntoHadoopFsRelationCommand
file:/root/git-gluten/spark-warehouse/org.apache.gluten.expression.UDFPartialProjectSuiteRasOn/t1,
false, Parquet,
[path=file:/root/git-gluten/spark-warehouse/org.apache.gluten.expression.UDFPartialProjectSuiteRasOn/t1],
Overwrite, `spark_catalog`.`default`.`t1`,
org.apache.spark.sql.execution.datasources.InMemoryFileIndex(file:/root/git-gluten/spark-warehouse/org.apache.gluten.expression.UDFPartialProjectSuiteRasOn/t1),
[revenue]
+- WriteFiles
+- VeloxColumnarToRow
+- ^(2) ProjectExecTransformer
[cast((((_SparkPartialProject0#64 * l_discount#6) + cast(hash(l_orderkey#0L,
42) as decimal(10,0))) + cast(hash(l_comment#15, 42) as decimal(10,0))) as
double) AS revenue#53]
+- ^(2) InputIteratorTransformer[l_orderkey#0L,
l_extendedprice#5, l_discount#6, l_comment#15, _SparkPartialProject0#64]
+- ColumnarPartialProject Project [cast((((if
(isnull(cast(l_extendedprice#5 as bigint))) cast(null as decimal(20,0)) else
cast(plus_one(knownnotnull(cast(l_extendedprice#5 as bigint))) as
decimal(20,0)) * l_discount#6) + cast(hash(l_orderkey#0L, 42) as
decimal(10,0))) + cast(hash(l_comment#15, 42) as decimal(10,0))) as double) AS
revenue#53] PartialProject List(if (isnull(cast(l_extendedprice#5 as bigint)))
cast(null as decimal(20,0)) else
cast(plus_one(knownnotnull(cast(l_extendedprice#5 as bigint))) as
decimal(20,0)) AS _SparkPartialProject0#64)
+- ^(1) BatchScanTransformer parquet
file:/root/git-gluten/backends-velox/target/scala-2.12/test-classes/tpch-data-parquet/lineitem[l_orderkey#0L,
l_extendedprice#5, l_discount#6, l_comment#15] ParquetScan DataFilters: [],
Format: parquet, Location: InMemoryFileIndex(1
paths)[file:/root/git-gluten/backends-velox/target/scala-2.12/t...,
PartitionFilters: [], PushedAggregation: [], PushedFilters: [], PushedGroupBy:
[], ReadSchema:
struct<l_orderkey:bigint,l_extendedprice:decimal(12,2),l_discount:decimal(12,2),l_comment:string>
RuntimeFilters: [] NativeFilters: []
```
After this PR:
```
+- Execute InsertIntoHadoopFsRelationCommand
file:/root/git-gluten/spark-warehouse/org.apache.gluten.expression.UDFPartialProjectSuiteRasOn/t1,
false, Parquet,
[path=file:/root/git-gluten/spark-warehouse/org.apache.gluten.expression.UDFPartialProjectSuiteRasOn/t1],
Overwrite, `spark_catalog`.`default`.`t1`,
org.apache.spark.sql.execution.datasources.InMemoryFileIndex(file:/root/git-gluten/spark-warehouse/org.apache.gluten.expression.UDFPartialProjectSuiteRasOn/t1),
[revenue]
+- WriteFiles
+- *(1) Project [cast((((if (isnull(cast(l_extendedprice#5 as
bigint))) cast(null as decimal(20,0)) else
cast(plus_one(knownnotnull(cast(l_extendedprice#5 as bigint))) as
decimal(20,0)) * l_discount#6) + cast(hash(l_orderkey#0L, 42) as
decimal(10,0))) + cast(hash(l_comment#15, 42) as decimal(10,0))) as double) AS
revenue#83]
+- VeloxColumnarToRow
+- ^(3) BatchScanTransformer parquet
file:/root/git-gluten/backends-velox/target/scala-2.12/test-classes/tpch-data-parquet/lineitem[l_orderkey#0L,
l_extendedprice#5, l_discount#6, l_comment#15] ParquetScan DataFilters: [],
Format: parquet, Location: InMemoryFileIndex(1
paths)[file:/root/git-gluten/backends-velox/target/scala-2.12/t...,
PartitionFilters: [], PushedAggregation: [], PushedFilters: [], PushedGroupBy:
[], ReadSchema:
struct<l_orderkey:bigint,l_extendedprice:decimal(12,2),l_discount:decimal(12,2),l_comment:string>
RuntimeFilters: [] NativeFilters: []
```
Added unit tests; existing unit tests also cover this change.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]