This is an automated email from the ASF dual-hosted git repository.
xushiyan pushed a commit to branch asf-site
in repository https://gitbox.apache.org/repos/asf/hudi.git
The following commit(s) were added to refs/heads/asf-site by this push:
new 7ce6ec98e85 [MINOR][DOCS] Re-org pages: sql procedures and transformers (#10892)
7ce6ec98e85 is described below
commit 7ce6ec98e851d5304c2049119f1a975ec6108d5c
Author: Shiyan Xu <[email protected]>
AuthorDate: Wed Mar 20 11:09:18 2024 -0500
[MINOR][DOCS] Re-org pages: sql procedures and transformers (#10892)
---
...ild-open-lakehouse-using-apache-hudi-and-dbt.md | 2 +-
website/docs/hoodie_streaming_ingestion.md | 66 +++++++++++++++++++-
website/docs/procedures.md | 12 ++--
website/docs/transforms.md | 72 ----------------------
website/sidebars.js | 3 +-
5 files changed, 72 insertions(+), 83 deletions(-)
diff --git a/website/blog/2022-07-11-build-open-lakehouse-using-apache-hudi-and-dbt.md b/website/blog/2022-07-11-build-open-lakehouse-using-apache-hudi-and-dbt.md
index 7b8306d70aa..39895619004 100644
--- a/website/blog/2022-07-11-build-open-lakehouse-using-apache-hudi-and-dbt.md
+++ b/website/blog/2022-07-11-build-open-lakehouse-using-apache-hudi-and-dbt.md
@@ -64,7 +64,7 @@ You can follow the instructions on this [page](https://github.com/apache/hudi/bl
This is the first step in building your data lake and there are many choices here to load the data into our open lakehouse. I’m going to go with one of the Hudi’s native tools called Delta Streamer since all the ingestion features are pre-built and battle-tested in production at scale.
-Hudi’s [DeltaStreamer](https://hudi.apache.org/docs/hoodie_streaming_ingestion) does the EL in ELT (Extract, Load, Transform) processes – it’s extremely good at extracting, loading, and optionally [transforming data](https://hudi.apache.org/docs/transforms) that’s already loaded into your lakehouse.
+Hudi’s [DeltaStreamer](https://hudi.apache.org/docs/hoodie_streaming_ingestion) does the EL in ELT (Extract, Load, Transform) processes – it’s extremely good at extracting, loading, and optionally [transforming data](https://hudi.apache.org/docs/hoodie_streaming_ingestion#transformers) that’s already loaded into your lakehouse.
## Step 2: How to configure hudi with the dbt project?
diff --git a/website/docs/hoodie_streaming_ingestion.md b/website/docs/hoodie_streaming_ingestion.md
index 24228a5c064..6d3dc3a34e5 100644
--- a/website/docs/hoodie_streaming_ingestion.md
+++ b/website/docs/hoodie_streaming_ingestion.md
@@ -320,8 +320,70 @@ For Kafka, this is the max # of events to read.
### Transformers
`HoodieStreamer` supports custom transformation on records before writing to storage. This is done by supplying
-implementation of `org.apache.hudi.utilities.transform.Transformer` via `--transformer-class` option. Check out
-the [Transformers page](/docs/transforms) for details.
+implementation of `org.apache.hudi.utilities.transform.Transformer` via `--transformer-class` option.
+
+#### SQL Query Transformer
+You can pass a SQL Query to be executed during write.
+
+```scala
+--transformer-class org.apache.hudi.utilities.transform.SqlQueryBasedTransformer
+--hoodie-conf hoodie.streamer.transformer.sql=SELECT a.col1, a.col3, a.col4 FROM <SRC> a
+```
+
+#### SQL File Transformer
+You can specify a file containing a SQL script to be executed during write. The SQL file is configured with this hoodie property:
+hoodie.streamer.transformer.sql.file
+
+The query should reference the source as a table named "\<SRC\>"
+
+The result of the final SQL statement is used as the write payload.
+
+Example Spark SQL Query:
+```sql
+CACHE TABLE tmp_personal_trips AS
+SELECT * FROM <SRC> WHERE trip_type='personal_trips';
+
+SELECT * FROM tmp_personal_trips;
+```
+
+#### Flattening Transformer
+This transformer can flatten nested objects. It flattens the nested fields in the incoming records by prefixing
+inner fields with the outer field name and `_`, applied recursively. Flattening of arrays is currently not supported.
+
+An example schema may look like the below, where `name` is a nested field of StructType in the original source:
+```scala
+age as intColumn,address as stringColumn,name.first as name_first,name.last as name_last, name.middle as name_middle
+```
+
+Set the config as:
+```scala
+--transformer-class org.apache.hudi.utilities.transform.FlatteningTransformer
+```
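To make the naming convention concrete, here is a small illustrative sketch in plain Java. This is not Hudi's actual implementation; the class and method names are made up, and it only mirrors the "outer field + `_` + inner field" naming scheme described above.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative only: mirrors the "outer_inner" naming scheme of the
// Flattening Transformer described above (class name is hypothetical).
class FlattenNaming {

    // Recursively flatten nested maps, prefixing inner keys with the outer key and "_".
    @SuppressWarnings("unchecked")
    static Map<String, Object> flatten(String prefix, Map<String, Object> record) {
        Map<String, Object> out = new LinkedHashMap<>();
        for (Map.Entry<String, Object> e : record.entrySet()) {
            String name = prefix.isEmpty() ? e.getKey() : prefix + "_" + e.getKey();
            if (e.getValue() instanceof Map) {
                out.putAll(flatten(name, (Map<String, Object>) e.getValue()));
            } else {
                out.put(name, e.getValue());
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, Object> name = new LinkedHashMap<>();
        name.put("first", "Ada");
        name.put("last", "Lovelace");
        Map<String, Object> record = new LinkedHashMap<>();
        record.put("age", 36);
        record.put("name", name);
        // age stays as-is; name.first / name.last become name_first / name_last
        System.out.println(FlattenNaming.flatten("", record).keySet());
        // prints [age, name_first, name_last]
    }
}
```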
+
+#### Chained Transformer
+If you wish to use multiple transformers together, you can chain them so that they are executed sequentially.
+
+The example below first flattens the incoming records and then applies a SQL projection based on the specified query:
+```scala
+--transformer-class org.apache.hudi.utilities.transform.FlatteningTransformer,org.apache.hudi.utilities.transform.SqlQueryBasedTransformer
+--hoodie-conf hoodie.streamer.transformer.sql=SELECT a.col1, a.col3, a.col4 FROM <SRC> a
+```
+
+#### AWS DMS Transformer
+This transformer is specific to AWS DMS data. It adds the `Op` field with value `I` if that field is not present.
+
+Set the config as:
+```scala
+--transformer-class org.apache.hudi.utilities.transform.AWSDmsTransformer
+```
+
+#### Custom Transformer Implementation
+You can write your own custom transformer by implementing [the `Transformer` interface](https://github.com/apache/hudi/tree/master/hudi-utilities/src/main/java/org/apache/hudi/utilities/transform)
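As a rough sketch only: the outline below is based on the `Transformer` interface in the linked package, but it will not compile without `hudi-utilities` and Spark on the classpath, the exact `apply` signature should be verified against your Hudi version, and the package, class name, and `ts` column are all hypothetical.

```java
package com.example.transform; // hypothetical package

import org.apache.hudi.common.config.TypedProperties;
import org.apache.hudi.utilities.transform.Transformer;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

// Hypothetical example: drop rows whose "ts" field is null before writing.
public class DropNullTsTransformer implements Transformer {
  @Override
  public Dataset<Row> apply(JavaSparkContext jsc, SparkSession sparkSession,
                            Dataset<Row> rowDataset, TypedProperties properties) {
    // Any Dataset<Row> -> Dataset<Row> logic can go here.
    return rowDataset.filter("ts IS NOT NULL");
  }
}
```

You would then pass it via `--transformer-class com.example.transform.DropNullTsTransformer`.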
+
+#### Related Resources
+
+* [Learn about Apache Hudi Transformers with Hands on Lab](https://www.youtube.com/watch?v=AprlZ8hGdJo)
+* [Apache Hudi with DBT Hands on Lab: Transform Raw Hudi tables with DBT and Glue Interactive Session](https://youtu.be/DH3LEaPG6ss)
### Schema Providers
diff --git a/website/docs/procedures.md b/website/docs/procedures.md
index 49e8de775a5..fda7c08748a 100644
--- a/website/docs/procedures.md
+++ b/website/docs/procedures.md
@@ -1,16 +1,16 @@
---
title: SQL Procedures
-summary: "In this page, we introduce how to use procedures with Hudi."
+summary: "In this page, we introduce how to use SQL procedures with Hudi."
toc: true
last_modified_at:
---
import Tabs from '@theme/Tabs';
import TabItem from '@theme/TabItem';
-Stored procedures available when use Hudi SparkSQL extensions in all spark's version.
+Stored procedures are available when using the Hudi SparkSQL extensions in all supported Spark versions.
## Usage
-CALL supports passing arguments by name (recommended) or by position. Mixing position and named arguments is also supported.
+`CALL` supports passing arguments by name (recommended) or by position. Mixing positional and named arguments is also supported.
#### Named arguments
All procedure arguments are named. When passing arguments by name, arguments can be in any order and any optional argument can be omitted.
@@ -24,7 +24,7 @@ CALL system.procedure_name(arg_1, arg_2, ... arg_n);
```
*note:* The system here has no practical meaning, the complete procedure name is system.procedure_name.
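For instance, the same call can be written either way (assuming a table named `hudi_table`; the `table` and `limit` parameter names for `show_commits` should be checked against your Hudi version):

```sql
-- by name (order does not matter, optional args can be omitted)
CALL show_commits(table => 'hudi_table', limit => 10);

-- by position
CALL show_commits('hudi_table', 10);
```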
-### Help Procedure
+### help
Show parameters and outputTypes of a procedure.
@@ -40,7 +40,7 @@ Show parameters and outputTypes of a procedure.
|--------------|--------|
| result | String |
-**Example**x
+**Example**
```
call help(cmd => 'show_commits');
@@ -1235,7 +1235,7 @@ call show_fsview_latest(table => 'test_hudi_table', partition => 'dt=2021-05-0
|---------------|----------------------------------------|-------------------|--------------------------------------------------------------------------|----------------|-----------------|-----------------------|-------------------------------------------------------------------------|
| dt=2021-05-03 | d0073a12-085d-4f49-83e9-402947e7e90a-0 | 20220109225319449 | 7fb52523-c7f6-41aa-84a6-629041477aeb-0_0-92-99_20220109225319449.parquet | 5319449 | 1 | 213193 | .7fb52523-c7f6-41aa-84a6-629041477aeb-0_20230205133217210.log.1_0-60-63 |
-## Optimization table
+## Table services
### run_clustering
diff --git a/website/docs/transforms.md b/website/docs/transforms.md
deleted file mode 100644
index ff4de1a64b7..00000000000
--- a/website/docs/transforms.md
+++ /dev/null
@@ -1,72 +0,0 @@
----
-title: Transformers
-toc: true
----
-
-Apache Hudi provides a HoodieTransformer Utility that allows you to perform transformations the source data before writing it to a Hudi table.
-There are several [out-of-the-box](https://github.com/apache/hudi/tree/master/hudi-utilities/src/main/java/org/apache/hudi/utilities/transform)
-transformers available and you can build your own custom transformer class as well.
-
-### SQL Query Transformer
-You can pass a SQL Query to be executed during write.
-
-```scala
---transformer-class org.apache.hudi.utilities.transform.SqlQueryBasedTransformer
---hoodie-conf hoodie.streamer.transformer.sql=SELECT a.col1, a.col3, a.col4 FROM <SRC> a
-```
-
-### SQL File Transformer
-You can specify a File with a SQL script to be executed during write. The SQL file is configured with this hoodie property:
-hoodie.streamer.transformer.sql.file
-
-The query should reference the source as a table named "\<SRC\>"
-
-The final sql statement result is used as the write payload.
-
-Example Spark SQL Query:
-```sql
-CACHE TABLE tmp_personal_trips AS
-SELECT * FROM <SRC> WHERE trip_type='personal_trips';
-
-SELECT * FROM tmp_personal_trips;
-```
-
-### Flattening Transformer
-This transformer can flatten nested objects. It flattens the nested fields in the incoming records by prefixing
-inner-fields with outer-field and _ in a nested fashion. Currently flattening of arrays is not supported.
-
-An example schema may look something like the below where name is a nested field of StructType in the original source
-```scala
-age as intColumn,address as stringColumn,name.first as name_first,name.last as name_last, name.middle as name_middle
-```
-
-Set the config as:
-```scala
---transformer-class org.apache.hudi.utilities.transform.FlatteningTransformer
-```
-
-### Chained Transformer
-If you wish to use multiple transformers together, you can use the Chained transformers to pass multiple to be executed sequentially.
-
-Example below first flattens the incoming records and then does sql projection based on the query specified:
-```scala
---transformer-class org.apache.hudi.utilities.transform.FlatteningTransformer,org.apache.hudi.utilities.transform.SqlQueryBasedTransformer
---hoodie-conf hoodie.streamer.transformer.sql=SELECT a.col1, a.col3, a.col4 FROM <SRC> a
-```
-
-### AWS DMS Transformer
-This transformer is specific for AWS DMS data. It adds `Op` field with value `I` if the field is not present.
-
-Set the config as:
-```scala
---transformer-class org.apache.hudi.utilities.transform.AWSDmsTransformer
-```
-
-### Custom Transformer Implementation
-You can write your own custom transformer by extending [this class](https://github.com/apache/hudi/tree/master/hudi-utilities/src/main/java/org/apache/hudi/utilities/transform)
-
-## Related Resources
-<h3>Videos</h3>
-
-* [Learn about Apache Hudi Transformers with Hands on Lab](https://www.youtube.com/watch?v=AprlZ8hGdJo)
-* [Apache Hudi with DBT Hands on Lab.Transform Raw Hudi tables with DBT and Glue Interactive Session](https://youtu.be/DH3LEaPG6ss)
\ No newline at end of file
diff --git a/website/sidebars.js b/website/sidebars.js
index 6ef2b9c77a9..379f1a856a5 100644
--- a/website/sidebars.js
+++ b/website/sidebars.js
@@ -68,13 +68,11 @@ module.exports = {
type: 'category',
label: 'Table Services',
items: [
- 'procedures',
'migration_guide',
'compaction',
'clustering',
'metadata_indexing',
'hoodie_cleaner',
- 'transforms',
'rollbacks',
'markers',
'file_sizing',
@@ -107,6 +105,7 @@ module.exports = {
items: [
'performance',
'deployment',
+ 'procedures',
'cli',
'metrics',
'encryption',