This is an automated email from the ASF dual-hosted git repository.

bhavanisudha pushed a commit to branch asf-site
in repository https://gitbox.apache.org/repos/asf/hudi.git
The following commit(s) were added to refs/heads/asf-site by this push:
     new 53686c4827 New blog 12 19 (#7516)
53686c4827 is described below

commit 53686c4827a804459641bb9d5d0606693c80a9ee
Author: nfarah86 <nfara...@gmail.com>
AuthorDate: Wed Dec 21 00:13:31 2022 -0800

    New blog 12 19 (#7516)

    * added new blog with assets

    * updated blog

    * updated author

    Co-authored-by: nadine <nfarah@nadines-MacBook-Pro.local>
---
 ...irst-Hudi-Lakehouse-with-AWS-Glue-and-AWS-S3.md |  49 +++++++++++++++++++++
 website/static/assets/images/blog/DataCouncil.jpg  | Bin 0 -> 367073 bytes
 ...ild-your-first-hudi-lakehouse-12-19-diagram.jpg | Bin 0 -> 565676 bytes
 3 files changed, 49 insertions(+)

diff --git a/website/blog/2022-12-19-Build-Your-First-Hudi-Lakehouse-with-AWS-Glue-and-AWS-S3.md b/website/blog/2022-12-19-Build-Your-First-Hudi-Lakehouse-with-AWS-Glue-and-AWS-S3.md
new file mode 100644
index 0000000000..03f15df5e5
--- /dev/null
+++ b/website/blog/2022-12-19-Build-Your-First-Hudi-Lakehouse-with-AWS-Glue-and-AWS-S3.md
@@ -0,0 +1,49 @@
+---
+title: "Build Your First Hudi Lakehouse with AWS S3 and AWS Glue"
+excerpt: "Follow this tutorial on building your first hudi lakehouse with AWS S3 & AWS Glue"
+author: Nadine Farah
+category: blog
+image: /assets/images/blog/DataCouncil.jpg
+tags:
+- how-to
+- use-case
+- apache hudi
+- aws
+---
+
+![/assets/images/blog/DataCouncil.jpg](/assets/images/blog/DataCouncil.jpg)
+
+
+# Build Your First Hudi Lakehouse with AWS S3 and AWS Glue
+
+Soumil Shah is a Hudi community champion building [YouTube content](https://www.youtube.com/@SoumilShah/playlists) so developers can easily get started incorporating a lakehouse into their data infrastructure. In this [video](https://www.youtube.com/watch?v=5zF4jc_3rFs&list=PLL2hlSFBmWwwbMpcyMjYuRn8cN99gFSY6), Soumil shows you how to get started with AWS Glue, AWS S3, Hudi and Athena.
+
+In this tutorial, you’ll learn how to:
+- Create and configure AWS Glue
+- Create a Hudi Table
+- Create a Spark DataFrame
+- Add data to the Hudi Table
+- Query data via Athena
+
+![/assets/images/blog/build-your-first-hudi-lakehouse-12-19-diagram.jpg](/assets/images/blog/build-your-first-hudi-lakehouse-12-19-diagram.jpg)
+
+
+**Step 1**: In this architecture, users purchase items from online retailers, generating order transactions that are stored in DynamoDB.
+
+**Step 2**: The raw data layer stores the order transaction data that is fed into the data lake. To accomplish this, enable Kinesis Data Streams for DynamoDB to stream real-time transactions from DynamoDB into a Kinesis data stream, process the streaming data with Lambda, and insert the results into a second Kinesis stream, where a Glue streaming job processes the data and writes it to the Apache Hudi transactional data lake.
+
+**Step 3**: Users can build dashboards and derive insights using QuickSight.
+
+## Getting Started
+
+To get started on building this data app, follow the YouTube video on
+[Build Datalakes on S3 and Glue with Apache HUDI](https://www.youtube.com/watch?v=5zF4jc_3rFs&list=PLL2hlSFBmWwwbMpcyMjYuRn8cN99gFSY6&).
+
+Follow the [step-by-step instructions](https://drive.google.com/file/d/1W-E_SupsoI8VZWGtq5d7doxdWdNDPEoj/view).
+
+
+Use the [source code](https://github.com/soumilshah1995/dynamodb-hudi-stream-project).
+
+## Questions
+
+If you run into blockers doing this tutorial, please reach out to the Apache Hudi community and tag **soumilshah1995** to help debug.
\ No newline at end of file
diff --git a/website/static/assets/images/blog/DataCouncil.jpg b/website/static/assets/images/blog/DataCouncil.jpg
new file mode 100644
index 0000000000..145aae40e7
Binary files /dev/null and b/website/static/assets/images/blog/DataCouncil.jpg differ
diff --git a/website/static/assets/images/blog/build-your-first-hudi-lakehouse-12-19-diagram.jpg b/website/static/assets/images/blog/build-your-first-hudi-lakehouse-12-19-diagram.jpg
new file mode 100644
index 0000000000..36415bffa9
Binary files /dev/null and b/website/static/assets/images/blog/build-your-first-hudi-lakehouse-12-19-diagram.jpg differ
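
Step 2 of the blog post added by this commit describes a Lambda function that processes DynamoDB change records arriving through Kinesis. The sketch below illustrates what that stage might look like; it is not taken from the linked project. It assumes the standard Kinesis-triggered Lambda event shape (base64-encoded `data` under `Records[].kinesis`) and a DynamoDB change record whose `NewImage` holds only scalar string (`S`) or number (`N`) attributes. The helper names `decode_kinesis_records` and `flatten_order` are hypothetical, and forwarding to the second Kinesis stream is left as a comment.

```python
import base64
import json


def decode_kinesis_records(event):
    """Return the JSON payloads carried by a Kinesis-triggered Lambda event."""
    payloads = []
    for record in event.get("Records", []):
        # Kinesis delivers each record's data base64-encoded.
        raw = base64.b64decode(record["kinesis"]["data"])
        payloads.append(json.loads(raw))
    return payloads


def flatten_order(change):
    """Flatten a DynamoDB change record's NewImage into a plain dict.

    Assumption: every attribute is a scalar wrapped as {"S": ...} or
    {"N": ...}; maps, lists, and sets are not handled in this sketch.
    """
    image = change.get("dynamodb", {}).get("NewImage", {})
    flat = {}
    for name, typed in image.items():
        type_tag, value = next(iter(typed.items()))
        # DynamoDB serializes numbers as strings; convert them here.
        flat[name] = float(value) if type_tag == "N" else value
    return flat


def lambda_handler(event, context):
    orders = [flatten_order(c) for c in decode_kinesis_records(event)]
    # A real handler would forward `orders` to the next Kinesis stream here.
    return {"processed": len(orders), "orders": orders}
```

Flattening the typed attributes at this stage means the downstream Glue streaming job can read plain JSON rows rather than DynamoDB's typed encoding.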