This is an automated email from the ASF dual-hosted git repository.
yuanzhou pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/incubator-gluten.git
The following commit(s) were added to refs/heads/main by this push:
new c0d5d39fb3 [GLUTEN-9004][DOC] Document Partial Projection feature
(#10135)
c0d5d39fb3 is described below
commit c0d5d39fb36c30c219917e25dae71675691b6603
Author: Yuan <[email protected]>
AuthorDate: Sat Aug 23 19:05:33 2025 +0100
[GLUTEN-9004][DOC] Document Partial Projection feature (#10135)
This patch adds the doc for partial projection
---------
Signed-off-by: Yuan <[email protected]>
---
docs/developers/PartialProject.md | 30 ++++++++++++++++++++++++++++++
1 file changed, 30 insertions(+)
diff --git a/docs/developers/PartialProject.md
b/docs/developers/PartialProject.md
new file mode 100644
index 0000000000..3743815905
--- /dev/null
+++ b/docs/developers/PartialProject.md
@@ -0,0 +1,30 @@
+---
+layout: page
+title: PartialProject
+nav_order: 15
+parent: Developer Overview
+---
+
+# Partial Projection Support
+
+Gluten does not yet support every Spark expression natively (e.g., some JSON
functions or Java UDFs). When a projection contains such an expression, Gluten
falls back to the JVM code path to run it, which can introduce performance
regressions.
+
+Partial projection was added to avoid these regressions. It lets Gluten
minimize the data copied between the JVM and C++.
+
+
+## Implementation Details
+
+### Adding Partial Projection for UDF
+
+For example, with the expression `hash(udf(col0)), col1, col2, col3, col4`,
partial projection converts only `col0` (to row format, or to Arrow columnar
format) as input to the UDF and exposes the result of `udf(col0)` under the
alias `partialProject1_`. ProjectExecTransformer then handles
`hash(partialProject1_), col1, col2, col3, col4, partialProject1_`. This saves
the cost of converting the other columns from columnar format to row format and
back.
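
The split described above can be illustrated with a small, self-contained Python model. This is only a sketch of the idea and not Gluten's implementation: the `Col` and `Call` classes, the `split_projection` and `input_columns` helpers, and the alias numbering are all hypothetical names introduced for illustration. Unlike the example above, the sketch does not append the alias as an extra output column.

```python
from dataclasses import dataclass
from typing import List, Set, Tuple, Union


@dataclass(frozen=True)
class Col:
    """Reference to an input column, or to an alias produced on the JVM side."""
    name: str


@dataclass(frozen=True)
class Call:
    """Function call such as hash(...) or udf(...)."""
    func: str
    args: Tuple["Expr", ...]


Expr = Union[Col, Call]


def split_projection(exprs: List[Expr], jvm_only: Set[str]):
    """Split a projection into JVM-side aliases and a native-side projection.

    Hypothetical helper: every call whose function is in `jvm_only` is pulled
    out as an aliased sub-expression, and the native side refers to the alias.
    """
    jvm_side: List[Tuple[str, Expr]] = []

    def rewrite(e: Expr) -> Expr:
        if isinstance(e, Col):
            return e
        if e.func in jvm_only:
            # Reuse the alias if the same sub-expression was already extracted.
            for alias, seen in jvm_side:
                if seen == e:
                    return Col(alias)
            # Alias naming is an assumption modeled on `partialProject1_`.
            alias = f"partialProject{len(jvm_side) + 1}_"
            jvm_side.append((alias, e))
            return Col(alias)
        return Call(e.func, tuple(rewrite(a) for a in e.args))

    native_side = [rewrite(e) for e in exprs]
    return jvm_side, native_side


def input_columns(e: Expr) -> Set[str]:
    """Columns an expression reads, i.e. the only data the JVM side needs."""
    if isinstance(e, Col):
        return {e.name}
    cols: Set[str] = set()
    for a in e.args:
        cols |= input_columns(a)
    return cols


if __name__ == "__main__":
    projection = [Call("hash", (Call("udf", (Col("col0"),)),))]
    projection += [Col(f"col{i}") for i in range(1, 5)]
    jvm_side, native_side = split_projection(projection, {"udf"})
    print(jvm_side)
    print(native_side)
```

In this model only `col0` is an input of the JVM-side expression, so `col1` through `col4` never need to leave the columnar format.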
+
+
+### Adding Partial Projection for Unsupported Expressions
+
+Partial projection also helps with expressions that are not natively
supported, for example `substr(from_json(col_a))`. Since `from_json` is not
fully supported, Gluten may use the JVM code path. Instead of running the whole
expression on the JVM, partial projection attempts to project only
`from_json()` there and performs a native projection of `substr()`.
+
+The same applies to expressions that are blacklisted through
`spark.gluten.expression.blacklist`.
+
+## Limitations
+
+This feature is in the preliminary stages of development and will be improved
in future updates.
\ No newline at end of file