zml1206 opened a new pull request, #7686:
URL: https://github.com/apache/incubator-gluten/pull/7686

   
   ## What changes were proposed in this pull request?
   
   (Fixes: \#7685)
   In production, we encountered a case where the cost of the row-to-columnar
   (r2c) conversion for a large table far outweighed the performance gain from
   native execution.
   For example:
   ```
   sql("set spark.gluten.sql.columnar.filescan=false")
   spark.range(100000000).toDF("id")
     .selectExpr("concat('id_', round(id/1000000)) as k", "id % 10 as v")
     .write.mode("overwrite").parquet("tmp/t1")
   spark.read.parquet("tmp/t1").createOrReplaceTempView("t1")
   sql("select  k,sum(v) as v from t1 group by k").collect()
   ```
   
   In a local test, this query takes 18 seconds with Gluten enabled and only
   6 seconds with Gluten disabled. Therefore, I would like such queries to fall
   back via RAS.
   
   The optimization points are as follows:
   1. Add a bytesSize factor to the cost: cost = bytesSizeFactor * opCost.
   2. The r2c cost can be configured separately; the default is 100. If
      sizeBytes is less than the threshold, the cost of RowToColumnarLike is
      ignored.
   3. The vanilla op cost is configurable, with a default of 20; the Gluten op
      cost is 1.
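
The three points above can be sketched roughly as follows. This is an illustration only, not the code in this patch: `CostSketch`, `Conf`, `Op`, and the `r2cThresholdBytes` value are hypothetical names and numbers, and `bytesSizeFactor` is taken as an input because the description does not say how it is derived. Only the defaults 100, 20, and 1 come from the list above.

```scala
// Illustrative sketch: names and the threshold value are assumptions,
// not Gluten's actual API or configuration keys.
object CostSketch {
  sealed trait Op
  case object VanillaOp extends Op         // row-based Spark operator
  case object GlutenOp extends Op          // native columnar operator
  case object RowToColumnarLike extends Op // r2c transition

  final case class Conf(
      r2cCost: Long = 100L,                // default stated in the PR description
      vanillaCost: Long = 20L,             // default stated in the PR description
      glutenCost: Long = 1L,               // value stated in the PR description
      r2cThresholdBytes: Long = 1L << 20)  // hypothetical threshold (1 MiB)

  // Base cost per operator; r2c is free when the input is below the threshold.
  def opCost(op: Op, sizeBytes: Long, conf: Conf): Long = op match {
    case VanillaOp => conf.vanillaCost
    case GlutenOp  => conf.glutenCost
    case RowToColumnarLike =>
      if (sizeBytes < conf.r2cThresholdBytes) 0L else conf.r2cCost
  }

  // cost = bytesSizeFactor * opCost
  def cost(op: Op, sizeBytes: Long, bytesSizeFactor: Long, conf: Conf = Conf()): Long =
    bytesSizeFactor * opCost(op, sizeBytes, conf)
}
```

Under these assumptions, a large r2c transition contributes `bytesSizeFactor * 100` to a plan's cost, so a plan that must convert a large row-based scan can end up costlier than staying on vanilla operators, which is the fallback behaviour this PR aims for.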
   
   
   ## How was this patch tested?
   
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

