This is an automated email from the ASF dual-hosted git repository.
xushiyan pushed a commit to branch asf-site
in repository https://gitbox.apache.org/repos/asf/hudi.git
The following commit(s) were added to refs/heads/asf-site by this push:
new 7ce6ec98e85 [MINOR][DOCS] Re-org pages: sql procedures and transformers (#10892)
7ce6ec98e85 is described below
commit 7ce6ec98e851d5304c2049119f1a975ec6108d5c
Author: Shiyan Xu <[email protected]>
AuthorDate: Wed Mar 20 11:09:18 2024 -0500
[MINOR][DOCS] Re-org pages: sql procedures and transformers (#10892)
---
...ild-open-lakehouse-using-apache-hudi-and-dbt.md | 2 +-
website/docs/hoodie_streaming_ingestion.md | 66 +++++++++++++++++++-
website/docs/procedures.md | 12 ++--
website/docs/transforms.md | 72 ----------------------
website/sidebars.js | 3 +-
5 files changed, 72 insertions(+), 83 deletions(-)
diff --git a/website/blog/2022-07-11-build-open-lakehouse-using-apache-hudi-and-dbt.md b/website/blog/2022-07-11-build-open-lakehouse-using-apache-hudi-and-dbt.md
index 7b8306d70aa..39895619004 100644
--- a/website/blog/2022-07-11-build-open-lakehouse-using-apache-hudi-and-dbt.md
+++ b/website/blog/2022-07-11-build-open-lakehouse-using-apache-hudi-and-dbt.md
@@ -64,7 +64,7 @@ You can follow the instructions on this [page](https://github.com/apache/hudi/bl
This is the first step in building your data lake and there are many choices here to load the data into our open lakehouse. I’m going to go with one of the Hudi’s native tools called Delta Streamer since all the ingestion features are pre-built and battle-tested in production at scale.
-Hudi’s [DeltaStreamer](https://hudi.apache.org/docs/hoodie_streaming_ingestion) does the EL in ELT (Extract, Load, Transform) processes – it’s extremely good at extracting, loading, and optionally [transforming data](https://hudi.apache.org/docs/transforms) that’s already loaded into your lakehouse.
+Hudi’s [DeltaStreamer](https://hudi.apache.org/docs/hoodie_streaming_ingestion) does the EL in ELT (Extract, Load, Transform) processes – it’s extremely good at extracting, loading, and optionally [transforming data](https://hudi.apache.org/docs/hoodie_streaming_ingestion#transformers) that’s already loaded into your lakehouse.
## Step 2: How to configure hudi with the dbt project?
diff --git a/website/docs/hoodie_streaming_ingestion.md b/website/docs/hoodie_streaming_ingestion.md
index 24228a5c064..6d3dc3a34e5 100644
--- a/website/docs/hoodie_streaming_ingestion.md
+++ b/website/docs/hoodie_streaming_ingestion.md
@@ -320,8 +320,70 @@ For Kafka, this is the max # of events to read.
### Transformers
`HoodieStreamer` supports custom transformation on records before writing to storage. This is done by supplying
-implementation of `org.apache.hudi.utilities.transform.Transformer` via `--transformer-class` option. Check out
-the [Transformers page](/docs/transforms) for details.
+implementation of `org.apache.hudi.utilities.transform.Transformer` via `--transformer-class` option.
+
+#### SQL Query Transformer
+You can pass a SQL Query to be executed during write.
+
+```scala
+--transformer-class org.apache.hudi.utilities.transform.SqlQueryBasedTransformer
+--hoodie-conf hoodie.streamer.transformer.sql=SELECT a.col1, a.col3, a.col4 FROM <SRC> a
+```
+
+#### SQL File Transformer
+You can specify a file containing a SQL script to be executed during write. The SQL file is configured with this hoodie property:
+hoodie.streamer.transformer.sql.file
+
+The query should reference the source as a table named "\<SRC\>"
+
+The result of the final SQL statement is used as the write payload.
+
+Example Spark SQL Query:
+```sql
+CACHE TABLE tmp_personal_trips AS
+SELECT * FROM <SRC> WHERE trip_type='personal_trips';
+
+SELECT * FROM tmp_personal_trips;
+```
+
+#### Flattening Transformer
+This transformer can flatten nested objects. It flattens the nested fields in the incoming records by prefixing
+inner fields with the outer field name and `_`, applied recursively. Flattening of arrays is currently not supported.
+
+An example schema may look like the below, where `name` is a nested field of StructType in the original source:
+```scala
+age as intColumn,address as stringColumn,name.first as name_first,name.last as name_last, name.middle as name_middle
+```
+
+Set the config as:
+```scala
+--transformer-class org.apache.hudi.utilities.transform.FlatteningTransformer
+```
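To make the naming convention concrete, here is a small illustrative sketch in plain Java. This is not Hudi's actual implementation; the class and method names are made up, and it only mirrors the "outer field + `_` + inner field" naming scheme described above.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative only: mirrors the "outer_inner" naming scheme of the
// Flattening Transformer described above (class name is hypothetical).
class FlattenNaming {

    // Recursively flatten nested maps, prefixing inner keys with the outer key and "_".
    @SuppressWarnings("unchecked")
    static Map<String, Object> flatten(String prefix, Map<String, Object> record) {
        Map<String, Object> out = new LinkedHashMap<>();
        for (Map.Entry<String, Object> e : record.entrySet()) {
            String name = prefix.isEmpty() ? e.getKey() : prefix + "_" + e.getKey();
            if (e.getValue() instanceof Map) {
                out.putAll(flatten(name, (Map<String, Object>) e.getValue()));
            } else {
                out.put(name, e.getValue());
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, Object> name = new LinkedHashMap<>();
        name.put("first", "Ada");
        name.put("last", "Lovelace");
        Map<String, Object> record = new LinkedHashMap<>();
        record.put("age", 36);
        record.put("name", name);
        // age stays as-is; name.first / name.last become name_first / name_last
        System.out.println(FlattenNaming.flatten("", record).keySet());
        // prints [age, name_first, name_last]
    }
}
```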
+
+#### Chained Transformer
+If you wish to use multiple transformers together, you can chain them so that they are executed sequentially.
+
+The example below first flattens the incoming records and then applies a SQL projection based on the specified query:
+```scala
+--transformer-class org.apache.hudi.utilities.transform.FlatteningTransformer,org.apache.hudi.utilities.transform.SqlQueryBasedTransformer
+--hoodie-conf hoodie.streamer.transformer.sql=SELECT a.col1, a.col3, a.col4 FROM <SRC> a
+```
+
+#### AWS DMS Transformer
+This transformer is specific to AWS DMS data. It adds the `Op` field with value `I` if that field is not present.
+
+Set the config as:
+```scala
+--transformer-class org.apache.hudi.utilities.transform.AWSDmsTransformer
+```
+
+#### Custom Transformer Implementation
+You can write your own custom transformer by implementing [the `Transformer` interface](https://github.com/apache/hudi/tree/master/hudi-utilities/src/main/java/org/apache/hudi/utilities/transform)
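As a rough sketch only: the outline below is based on the `Transformer` interface in the linked package, but it will not compile without `hudi-utilities` and Spark on the classpath, the exact `apply` signature should be verified against your Hudi version, and the package, class name, and `ts` column are all hypothetical.

```java
package com.example.transform; // hypothetical package

import org.apache.hudi.common.config.TypedProperties;
import org.apache.hudi.utilities.transform.Transformer;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

// Hypothetical example: drop rows whose "ts" field is null before writing.
public class DropNullTsTransformer implements Transformer {
  @Override
  public Dataset<Row> apply(JavaSparkContext jsc, SparkSession sparkSession,
                            Dataset<Row> rowDataset, TypedProperties properties) {
    // Any Dataset<Row> -> Dataset<Row> logic can go here.
    return rowDataset.filter("ts IS NOT NULL");
  }
}
```

You would then pass it via `--transformer-class com.example.transform.DropNullTsTransformer`.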
+
+#### Related Resources
+
+* [Learn about Apache Hudi Transformers with Hands on Lab](https://www.youtube.com/watch?v=AprlZ8hGdJo)
+* [Apache Hudi with DBT Hands on Lab: Transform Raw Hudi tables with DBT and Glue Interactive Session](https://youtu.be/DH3LEaPG6ss)
### Schema Providers
diff --git a/website/docs/procedures.md b/website/docs/procedures.md
index 49e8de775a5..fda7c08748a 100644
--- a/website/docs/procedures.md
+++ b/website/docs/procedures.md
@@ -1,16 +1,16 @@
---
title: SQL Procedures
-summary: "In this page, we introduce how to use procedures with Hudi."
+summary: "In this page, we introduce how to use SQL procedures with Hudi."
toc: true
last_modified_at:
---
import Tabs from '@theme/Tabs';
import TabItem from '@theme/TabItem';
-Stored procedures available when use Hudi SparkSQL extensions in all spark's version.
+Stored procedures are available when using the Hudi SparkSQL extensions in all supported Spark versions.
## Usage
-CALL supports passing arguments by name (recommended) or by position. Mixing position and named arguments is also supported.
+`CALL` supports passing arguments by name (recommended) or by position. Mixing positional and named arguments is also supported.
#### Named arguments
All procedure arguments are named. When passing arguments by name, arguments can be in any order and any optional argument can be omitted.
@@ -24,7 +24,7 @@ CALL system.procedure_name(arg_1, arg_2, ... arg_n);
```
*note:* The system here has no practical meaning, the complete procedure name is system.procedure_name.
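For instance, the same call can be written either way (assuming a table named `hudi_table`; the `table` and `limit` parameter names for `show_commits` should be checked against your Hudi version):

```sql
-- by name (order does not matter, optional args can be omitted)
CALL show_commits(table => 'hudi_table', limit => 10);

-- by position
CALL show_commits('hudi_table', 10);
```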
-### Help Procedure
+### help
Show parameters and outputTypes of a procedure.
@@ -40,7 +40,7 @@ Show parameters and outputTypes of a procedure.
|--------------|--------|
| result | String |
-**Example**x
+**Example**
```
call help(cmd => 'show_commits');
@@ -1235,7 +1235,7 @@ call show_fsview_latest(table => 'test_hudi_table', partition => 'dt=2021-05-0
|---------------|----------------------------------------|-------------------|--------------------------------------------------------------------------|----------------|-----------------|-----------------------|-------------------------------------------------------------------------|
| dt=2021-05-03 | d0073a12-085d-4f49-83e9-402947e7e90a-0 | 20220109225319449 | 7fb52523-c7f6-41aa-84a6-629041477aeb-0_0-92-99_20220109225319449.parquet | 5319449 | 1 | 213193 | .7fb52523-c7f6-41aa-84a6-629041477aeb-0_20230205133217210.log.1_0-60-63 |
-## Optimization table
+## Table services
### run_clustering
diff --git a/website/docs/transforms.md b/website/docs/transforms.md
deleted file mode 100644
index ff4de1a64b7..00000000000
--- a/website/docs/transforms.md
+++ /dev/null
@@ -1,72 +0,0 @@
----
-title: Transformers
-toc: true
----
-
-Apache Hudi provides a HoodieTransformer Utility that allows you to perform transformations the source data before writing it to a Hudi table.
-There are several [out-of-the-box](https://github.com/apache/hudi/tree/master/hudi-utilities/src/main/java/org/apache/hudi/utilities/transform)
-transformers available and you can build your own custom transformer class as well.
-
-### SQL Query Transformer
-You can pass a SQL Query to be executed during write.
-
-```scala
---transformer-class org.apache.hudi.utilities.transform.SqlQueryBasedTransformer
---hoodie-conf hoodie.streamer.transformer.sql=SELECT a.col1, a.col3, a.col4 FROM <SRC> a
-```
-
-### SQL File Transformer
-You can specify a File with a SQL script to be executed during write. The SQL file is configured with this hoodie property:
-hoodie.streamer.transformer.sql.file
-
-The query should reference the source as a table named "\<SRC\>"
-
-The final sql statement result is used as the write payload.
-
-Example Spark SQL Query:
-```sql
-CACHE TABLE tmp_personal_trips AS
-SELECT * FROM <SRC> WHERE trip_type='personal_trips';
-
-SELECT * FROM tmp_personal_trips;
-```
-
-### Flattening Transformer
-This transformer can flatten nested objects. It flattens the nested fields in the incoming records by prefixing
-inner-fields with outer-field and _ in a nested fashion. Currently flattening of arrays is not supported.
-
-An example schema may look something like the below where name is a nested field of StructType in the original source
-```scala
-age as intColumn,address as stringColumn,name.first as name_first,name.last as name_last, name.middle as name_middle
-```
-
-Set the config as:
-```scala
---transformer-class org.apache.hudi.utilities.transform.FlatteningTransformer
-```
-
-### Chained Transformer
-If you wish to use multiple transformers together, you can use the Chained transformers to pass multiple to be executed sequentially.
-
-Example below first flattens the incoming records and then does sql projection based on the query specified:
-```scala
---transformer-class org.apache.hudi.utilities.transform.FlatteningTransformer,org.apache.hudi.utilities.transform.SqlQueryBasedTransformer
---hoodie-conf hoodie.streamer.transformer.sql=SELECT a.col1, a.col3, a.col4 FROM <SRC> a
-```
-
-### AWS DMS Transformer
-This transformer is specific for AWS DMS data. It adds `Op` field with value `I` if the field is not present.
-
-Set the config as:
-```scala
---transformer-class org.apache.hudi.utilities.transform.AWSDmsTransformer
-```
-
-### Custom Transformer Implementation
-You can write your own custom transformer by extending [this class](https://github.com/apache/hudi/tree/master/hudi-utilities/src/main/java/org/apache/hudi/utilities/transform)
-
-## Related Resources
-<h3>Videos</h3>
-
-* [Learn about Apache Hudi Transformers with Hands on Lab](https://www.youtube.com/watch?v=AprlZ8hGdJo)
-* [Apache Hudi with DBT Hands on Lab.Transform Raw Hudi tables with DBT and Glue Interactive Session](https://youtu.be/DH3LEaPG6ss)
\ No newline at end of file
diff --git a/website/sidebars.js b/website/sidebars.js
index 6ef2b9c77a9..379f1a856a5 100644
--- a/website/sidebars.js
+++ b/website/sidebars.js
@@ -68,13 +68,11 @@ module.exports = {
type: 'category',
label: 'Table Services',
items: [
- 'procedures',
'migration_guide',
'compaction',
'clustering',
'metadata_indexing',
'hoodie_cleaner',
- 'transforms',
'rollbacks',
'markers',
'file_sizing',
@@ -107,6 +105,7 @@ module.exports = {
items: [
'performance',
'deployment',
+ 'procedures',
'cli',
'metrics',
'encryption',