This is an automated email from the ASF dual-hosted git repository.
bhavanisudha pushed a commit to branch asf-site
in repository https://gitbox.apache.org/repos/asf/hudi.git
The following commit(s) were added to refs/heads/asf-site by this push:
new 56e15fb [HUDI-2806] - Docs for Transformer Utilities (#4097)
56e15fb is described below
commit 56e15fba3f6b9719bf9cd4e64694468c73c275f4
Author: Kyle Weller
AuthorDate: Fri Nov 26 14:29:00 2021 -0700
[HUDI-2806] - Docs for Transformer Utilities (#4097)
* outline for Transformers doc
* adding more examples
* added conf for DMS transformer and the example for Chained Transformer
---
website/docs/transforms.md | 66 ++++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 66 insertions(+)
diff --git a/website/docs/transforms.md b/website/docs/transforms.md
new file mode 100644
index 0000000..4b594f7
--- /dev/null
+++ b/website/docs/transforms.md
@@ -0,0 +1,66 @@
+---
+title: Transformers
+toc: true
+---
+
+Apache Hudi provides a HoodieTransformer Utility that allows you to perform transformations on the source data before writing it to a Hudi table.
+There are several [out-of-the-box](https://github.com/apache/hudi/tree/master/hudi-utilities/src/main/java/org/apache/hudi/utilities/transform) transformers available, and you can build your own custom transformer class as well.
+Transformers are configured through HoodieDeltaStreamer options such as `--transformer-class` and `--hoodie-conf`, as shown in the examples below.
+
+### SQL Query Transformer
+You can pass a SQL query to be executed against the source data during the write. The query should reference the source as a table named `<SRC>`.
+
+```scala
+--transformer-class org.apache.hudi.utilities.transform.SqlQueryBasedTransformer
+--hoodie-conf "hoodie.deltastreamer.transformer.sql=SELECT a.col1, a.col3, a.col4 FROM <SRC> a"
+```
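
Conceptually, the transformer makes the incoming batch queryable under a temporary view name and substitutes that name for each `<SRC>` placeholder before running the query. The plain-Scala sketch below shows only that substitution step, assuming this behavior; `tempViewName`, `resolveSrcPlaceholder`, and the `src_tmp_` prefix are hypothetical illustrations, not Hudi's internal names.

```scala
import java.util.UUID

// Placeholder that the query uses to refer to the incoming source batch.
val SrcPlaceholder = "<SRC>"

// Hypothetical helper: build a temporary view name that is a valid unquoted
// SQL identifier (UUID dashes are replaced with underscores).
def tempViewName(prefix: String = "src_tmp_"): String =
  prefix + UUID.randomUUID().toString.replace("-", "_")

// Replace every <SRC> occurrence with the registered view name.
// String.replace substitutes all literal occurrences, not just the first.
def resolveSrcPlaceholder(sql: String, viewName: String): String =
  sql.replace(SrcPlaceholder, viewName)

val view = tempViewName()
val resolved = resolveSrcPlaceholder("SELECT a.col1, a.col3, a.col4 FROM <SRC> a", view)
```

Using a fresh, randomized view name per batch avoids collisions with any other tables or views registered in the same Spark session.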
+
+### SQL File Transformer
+You can specify a file containing a SQL script to be executed during the write. The path to the SQL file is configured with this hoodie property:
+`hoodie.deltastreamer.transformer.sql.file`
+
+The queries in the script should reference the source as a table named `<SRC>`.
+
+The result of the final SQL statement is used as the write payload.
+
+Example Spark SQL script:
+```sql
+CACHE TABLE tmp_personal_trips AS
+SELECT * FROM <SRC> WHERE trip_type='personal_trips';
+
+SELECT * FROM tmp_personal_trips;
+```
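
The script-handling rule (substitute `<SRC>` in each statement, run the statements in order, keep only the last result as the payload) can be sketched in plain Scala as below. This is an illustration under stated assumptions, not Hudi's implementation: statements are split naively on `;` (semicolons inside string literals or comments are not handled), and `splitStatements`, `prepareScript`, and `payloadStatement` are hypothetical names.

```scala
// Split a SQL script into statements on ';' and drop empty fragments.
// Naive on purpose: it does not handle ';' inside literals or comments.
def splitStatements(script: String): Seq[String] =
  script.split(";").toSeq.map(_.trim).filter(_.nonEmpty)

// Substitute the temporary view name for <SRC> in every statement.
def prepareScript(script: String, viewName: String): Seq[String] =
  splitStatements(script).map(_.replace("<SRC>", viewName))

// Only the last statement's result becomes the write payload; earlier
// statements (such as CACHE TABLE) run for their side effects.
def payloadStatement(statements: Seq[String]): Option[String] =
  statements.lastOption

val script =
  """CACHE TABLE tmp_personal_trips AS
    |SELECT * FROM <SRC> WHERE trip_type='personal_trips';
    |
    |SELECT * FROM tmp_personal_trips;
    |""".stripMargin

val stmts = prepareScript(script, "src_view")
```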
+
+### Flattening Transformer
+This transformer flattens nested objects. It flattens the nested fields in the incoming records by prefixing each inner field with its outer field name and `_`, recursively for deeper levels of nesting.
+Flattening of arrays is currently not supported.
+
+For example, if `name` is a nested field of `StructType` in the original source, the flattened columns may look like the following:
+```scala
+age as intColumn, address as stringColumn, name.first as name_first, name.last as name_last, name.middle as name_middle
+```
+
+Set the config as:
+```scala
+--transformer-class org.apache.hudi.utilities.transform.FlatteningTransformer
+```
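
The renaming rule can be sketched in plain Scala by modeling a record as a map whose values may themselves be maps, standing in for `StructType` fields. This is only an illustration of the naming scheme described above, not Hudi's implementation (which works on the Spark schema); `flattenRecord` is a hypothetical name.

```scala
// A record is modeled as a field -> value map; a nested map stands in for a
// StructType field. Lists (arrays) are returned unchanged, matching the
// "arrays are not flattened" limitation described above.
def flattenRecord(record: Map[String, Any], prefix: String = ""): Map[String, Any] =
  record.flatMap { case (key, value) =>
    // Prefix the inner field with its outer field name and "_".
    val name = if (prefix.isEmpty) key else s"${prefix}_$key"
    value match {
      case nested: Map[_, _] =>
        // Recurse so deeper structs receive the full chain of prefixes.
        flattenRecord(nested.asInstanceOf[Map[String, Any]], name)
      case leaf =>
        Map(name -> leaf)
    }
  }

val flatExample = flattenRecord(
  Map("age" -> 30, "address" -> "1 Main St",
      "name" -> Map("first" -> "Ada", "middle" -> "King", "last" -> "Lovelace")))
// flatExample has the keys age, address, name_first, name_middle, name_last
```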
+
+### Chained Transformer
+If you wish to use multiple transformers together, pass a comma-separated list of transformer classes to `--transformer-class`. They are wrapped in a ChainedTransformer and executed sequentially, in the order listed.
+
+The example below first flattens the incoming records and then applies a SQL projection based on the specified query:
+```scala
+--transformer-class org.apache.hudi.utilities.transform.FlatteningTransformer,org.apache.hudi.utilities.transform.SqlQueryBasedTransformer
+--hoodie-conf "hoodie.deltastreamer.transformer.sql=SELECT a.col1, a.col3, a.col4 FROM <SRC> a"
+```
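
Chaining amounts to function composition: each transformer receives the previous one's output. The sketch below models this with plain Scala functions; `BatchTransformer`, `chain`, `project`, and `requireColumn` are hypothetical names, and the two toy steps merely stand in for real transformers such as the flattening and SQL ones.

```scala
// Hypothetical stand-ins: a record is a field -> value map and a transformer
// turns one batch of records into another.
type Record = Map[String, Any]
type BatchTransformer = Seq[Record] => Seq[Record]

// Apply the transformers left to right, feeding each one the previous output.
def chain(transformers: Seq[BatchTransformer]): BatchTransformer =
  batch => transformers.foldLeft(batch)((acc, step) => step(acc))

// Toy step: keep only the listed columns (stands in for a SQL projection).
def project(cols: Set[String]): BatchTransformer =
  batch => batch.map(record => record.filter { case (k, _) => cols.contains(k) })

// Toy step: drop records that do not have the given column.
def requireColumn(col: String): BatchTransformer =
  batch => batch.filter(_.contains(col))

val input: Seq[Record] = Seq(Map("a" -> 1, "b" -> 2), Map("a" -> 3))
val filterThenProject = chain(Seq(requireColumn("b"), project(Set("a"))))
val projectThenFilter = chain(Seq(project(Set("a")), requireColumn("b")))
```

Here `filterThenProject(input)` keeps one record, while `projectThenFilter(input)` keeps none, because the projection has already removed column `b`. The same reasoning is why the order of classes passed to `--transformer-class` matters.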
+
+### AWS DMS Transformer
+This transformer is specific to AWS DMS data. It adds an `Op` field with the value `I` if the field is not already present.
+
+Set the config as:
+```scala
+--transformer-class org.apache.hudi.utilities.transform.AWSDmsTransformer
+```
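
The rule above can be sketched by modeling a batch as its column list plus rows. `Batch` and `addOpIfMissing` are hypothetical names; the real transformer operates on a Spark `Dataset`, and the default value `I` is taken from the description above rather than verified against the source.

```scala
// Hypothetical stand-in for a DataFrame: column names plus rows.
final case class Batch(columns: Seq[String], rows: Seq[Map[String, Any]])

// Column AWS DMS uses to mark the change type of each CDC record.
val OpField = "Op"

// If the batch already has an Op column (typical of CDC output), leave it
// untouched; otherwise add Op to the schema and to every row.
def addOpIfMissing(batch: Batch, default: String = "I"): Batch =
  if (batch.columns.contains(OpField)) batch
  else Batch(batch.columns :+ OpField, batch.rows.map(_ + (OpField -> default)))
```

The check is made once per batch on the column list rather than per row, mirroring the schema-level description of the transformer.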
+
+### Custom Transformer Implementation
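+You can write your own custom transformer by implementing the `org.apache.hudi.utilities.transform.Transformer` interface found in [this package](https://github.com/apache/hudi/tree/master/hudi-utilities/src/main/java/org/apache/hudi/utilities/transform), then passing the fully qualified name of your class to `--transformer-class`.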
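
The actual `Transformer` interface receives a Spark `Dataset<Row>` together with the `JavaSparkContext`, `SparkSession`, and `TypedProperties`, so a real implementation needs Spark and Hudi on the classpath. To stay self-contained, the sketch below mirrors only the shape of that contract (batch in, properties in, batch out) with a hypothetical `RecordTransformer` trait. The `DropColumnsTransformer` class and its property key `example.transformer.drop.columns` are made up for illustration.

```scala
// Hypothetical stand-in for the Transformer contract: take a batch plus
// configuration properties and return the transformed batch.
trait RecordTransformer {
  def apply(rows: Seq[Map[String, Any]], props: Map[String, String]): Seq[Map[String, Any]]
}

// Example custom transformer: drops the columns named in a comma-separated
// property. The property key is invented for this sketch.
class DropColumnsTransformer extends RecordTransformer {
  private val DropKey = "example.transformer.drop.columns"

  override def apply(rows: Seq[Map[String, Any]], props: Map[String, String]): Seq[Map[String, Any]] = {
    // Parse "a, b" into Set("a", "b"); a missing or empty property drops nothing.
    val toDrop = props.getOrElse(DropKey, "").split(",").map(_.trim).filter(_.nonEmpty).toSet
    rows.map(row => row -- toDrop)
  }
}
```

Reading options from the passed-in properties, as `TypedProperties` does in the real interface, keeps the transformer reusable: the same class can be configured per pipeline with `--hoodie-conf` instead of being recompiled.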