Re: [PR] [DOC] Adding doc for Partial Projection [incubator-gluten]

via GitHub Mon, 07 Jul 2025 03:09:38 -0700


wForget commented on code in PR #10135:
URL: https://github.com/apache/incubator-gluten/pull/10135#discussion_r2189581716



##########
docs/developers/PartialProject.md:
##########
@@ -0,0 +1,20 @@
+# Partial Projection Support
+
+In Gluten, there is still a gap in supporting all Spark expressions natively (e.g., some json functions or Java UDFs). Sometimes, Gluten will choose the JVM code path to run the code, which can introduce performance regressions. Partial projections were added to improve performance in these cases.
+
+
+## Detailed Implementations
+
+### Adding Partial Projection for UDF
+
+For example, with the expression hash(udf(col0)), col1, col2, col3, col4, partial projection allows us to convert only col0 to row or column to Arrow as input, and convert udf(col0) as an alias partialProject1_. Then, ProjectExecTransformer will handle hash(partialProject1_), col1, col2, col3, col4, partialProject1_. This feature saves the cost of converting the columnar format to row format and vice-versa.
+
+
+## Adding Partial Projection for Unsupported Expressions

Review Comment:
   Do we need to mention `spark.gluten.expression.blacklist`? We can use this 
configuration to fall back to vanilla Spark when some expressions behave 
inconsistently.
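
   If we do mention it, a short config example might help. A minimal sketch, 
assuming the option takes a comma-separated list of expression names; the two 
names below are only illustrative and are not a recommendation:

   ```
   # spark-defaults.conf (illustrative; the listed expressions are hypothetical choices)
   spark.gluten.expression.blacklist  get_json_object,from_json
   ```

   The doc could then note that expressions listed this way are not offloaded, 
which is useful when a native implementation is known to differ from vanilla 
Spark.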



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


