wForget commented on code in PR #10135: URL: https://github.com/apache/incubator-gluten/pull/10135#discussion_r2189581716
##########
docs/developers/PartialProject.md:
##########
@@ -0,0 +1,20 @@
+# Partial Projection Support
+
+In Gluten, there is still a gap in supporting all Spark expressions natively (e.g., some json functions or Java UDFs). Sometimes, Gluten will choose the JVM code path to run the code, which can introduce performance regressions. Partial projections were added to improve performance in these cases.
+
+
+## Detailed Implementations
+
+### Adding Partial Projection for UDF
+
+For example, with the expression hash(udf(col0)), col1, col2, col3, col4, partial projection allows us to convert only col0 to row or column to Arrow as input, and convert udf(col0) as an alias partialProject1_. Then, ProjectExecTransformer will handle hash(partialProject1_), col1, col2, col3, col4, partialProject1_. This feature saves the cost of converting the columnar format to row format and vice-versa.
+
+
+## Adding Partial Projection for Unsupported Expressions

Review Comment:
   Do we need to mention `spark.gluten.expression.blacklist`? We can use this configuration to fall back to vanilla Spark when some expressions have inconsistent behaviors.
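
   For illustration, the configuration mentioned in the comment might be set as in the sketch below. This assumes the value is a comma-separated list of expression names; the expression name shown is only a placeholder, not one taken from the PR.

   ```
   # Hypothetical example: keep the listed expression(s) out of native offload
   # so they run on the vanilla Spark path. The value format is an assumption.
   spark.gluten.expression.blacklist=get_json_object
   ```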
