This is an automated email from the ASF dual-hosted git repository.

amoghj pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/iceberg.git
The following commit(s) were added to refs/heads/main by this push:
     new 07afd84080 Docs: Add BladePipe to list of vendors and blog posts (#13510)
07afd84080 is described below

commit 07afd84080b323c37cb11cbc657fb719891ed91c
Author: ChocZoe <1106030...@qq.com>
AuthorDate: Tue Jul 15 04:58:56 2025 +0800

    Docs: Add BladePipe to list of vendors and blog posts (#13510)
---
 docs/docs/bladepipe.md | 119 +++++++++++++++++++++++++++++++++++++++++++++++++
 docs/mkdocs.yml        |   1 +
 site/docs/blogs.md     |   5 +++
 site/docs/vendors.md   |   4 ++
 4 files changed, 129 insertions(+)

diff --git a/docs/docs/bladepipe.md b/docs/docs/bladepipe.md
new file mode 100644
index 0000000000..73831ad3ca
--- /dev/null
+++ b/docs/docs/bladepipe.md
@@ -0,0 +1,119 @@
+---
+title: "BladePipe"
+---
+<!--
+ - Licensed to the Apache Software Foundation (ASF) under one or more
+ - contributor license agreements. See the NOTICE file distributed with
+ - this work for additional information regarding copyright ownership.
+ - The ASF licenses this file to You under the Apache License, Version 2.0
+ - (the "License"); you may not use this file except in compliance with
+ - the License. You may obtain a copy of the License at
+ -
+ -   http://www.apache.org/licenses/LICENSE-2.0
+ -
+ - Unless required by applicable law or agreed to in writing, software
+ - distributed under the License is distributed on an "AS IS" BASIS,
+ - WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ - See the License for the specific language governing permissions and
+ - limitations under the License.
+ -->
+
+# BladePipe
+
+[BladePipe](https://www.bladepipe.com/) is a real-time end-to-end data integration tool, offering 40+ out-of-the-box connectors for analytics and AI. It lets you move data faster and more easily than ever, with ultra-low latency of less than 3 seconds. It provides a one-stop data movement solution, including schema evolution, data migration and sync, verification and correction, and monitoring and alerting.
+
+## Supported Sources
+BladePipe currently supports data integration to Iceberg from the following sources:
+
+- MySQL/MariaDB/AuroraMySQL
+- Oracle
+- PostgreSQL
+- SQL Server
+- Kafka
+
+More sources are on the way.
+
+## Supported Catalogs and Storage
+BladePipe currently supports 3 catalogs and 2 object storage services:
+
+- AWS Glue + AWS S3
+- Nessie + MinIO / AWS S3
+- REST Catalog + MinIO / AWS S3
+
+
+## Getting Started
+This guide shows how to load data from MySQL (self-hosted) into Iceberg (AWS Glue + S3).
+
+### 1. Download and Run BladePipe
+Follow the instructions in [Install Worker (Docker)](https://doc.bladepipe.com/productOP/byoc/installation/install_worker_docker) or [Install Worker (Binary)](https://doc.bladepipe.com/productOP/byoc/installation/install_worker_binary) to download and install a BladePipe Worker.
+
+**Note**: Alternatively, you can deploy and run [BladePipe Enterprise](https://doc.bladepipe.com/productOP/onPremise/installation/install_all_in_one_binary).
+
+### 2. Add DataSources
+
+1. Log in to the [BladePipe Cloud](https://cloud.bladepipe.com).
+2. Click **DataSource** > **Add DataSource**.
+3. Add a MySQL instance and an Iceberg instance. For Iceberg, fill in the following fields (replace `<...>` with your values):
+    - **Address**: Fill in the AWS Glue endpoint.
+
+      ```text
+      glue.<aws_glue_region_code>.amazonaws.com
+      ```
+
+    - **Version**: Leave as default.
+    - **Description**: Fill in a meaningful description to help identify the instance.
+    - **Extra Info**:
+        - **httpsEnabled**: Turn it on to set the value to true.
+        - **catalogName**: Enter a meaningful name, such as `glue_<biz_name>_catalog`.
+        - **catalogType**: Fill in `GLUE`.
+        - **catalogWarehouse**: The location where metadata and data files are stored, such as `s3://<biz_name>_iceberg`.
+        - **catalogProps**:
+
+          ```json
+          {
+              "io-impl": "org.apache.iceberg.aws.s3.S3FileIO",
+              "s3.endpoint": "https://s3.<aws_s3_region_code>.amazonaws.com",
+              "s3.access-key-id": "<aws_s3_iam_user_access_key>",
+              "s3.secret-access-key": "<aws_s3_iam_user_secret_key>",
+              "s3.path-style-access": "true",
+              "client.region": "<aws_s3_region>",
+              "client.credentials-provider.glue.access-key-id": "<aws_glue_iam_user_access_key>",
+              "client.credentials-provider.glue.secret-access-key": "<aws_glue_iam_user_secret_key>",
+              "client.credentials-provider": "com.amazonaws.glue.catalog.credentials.GlueAwsCredentialsProvider"
+          }
+          ```
+
+    
+
+    See [Add an Iceberg DataSource](https://doc.bladepipe.com/dataMigrationAndSync/datasource_func/Iceberg/props_for_iceberg_ds) for more details.
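+
+To sanity-check the catalog settings outside BladePipe, you can open the same Glue catalog with PyIceberg. This is a minimal sketch, not part of the BladePipe setup: it assumes PyIceberg is installed (`pip install "pyiceberg[glue,pyarrow]"`) and that the `<...>` placeholders are replaced with the same values used above.
+
+```python
+from pyiceberg.catalog import load_catalog
+
+# Illustrative only: the catalog name and properties below mirror the
+# catalogProps configured in BladePipe; adjust them to your environment.
+catalog = load_catalog(
+    "glue",
+    **{
+        "type": "glue",
+        "warehouse": "s3://<biz_name>_iceberg",
+        "s3.access-key-id": "<aws_s3_iam_user_access_key>",
+        "s3.secret-access-key": "<aws_s3_iam_user_secret_key>",
+        "client.region": "<aws_s3_region>",
+    },
+)
+
+# If the credentials and region are valid, this lists the Glue databases.
+print(catalog.list_namespaces())
+```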
+
+### 3. Create a DataJob
+
+1. Go to **DataJob** > [**Create DataJob**](https://doc.bladepipe.com/operation/job_manage/create_job/create_full_incre_task).
+2. Select the source and target DataSources, and click **Test Connection** for both. Here is the recommended Iceberg table configuration, which enables merge-on-read writes and automatic metadata cleanup:
+
+    ```json
+    {
+        "format-version": "2",
+        "parquet.compression": "snappy",
+        "iceberg.write.format": "parquet",
+        "write.metadata.delete-after-commit.enabled": "true",
+        "write.metadata.previous-versions-max": "3",
+        "write.update.mode": "merge-on-read",
+        "write.delete.mode": "merge-on-read",
+        "write.merge.mode": "merge-on-read",
+        "write.distribution-mode": "hash",
+        "write.object-storage.enabled": "true",
+        "write.spark.accept-any-schema": "true"
+    }
+    ```
+
+    
+
+3. Select **Incremental** for DataJob Type, together with the **Full Data** option.
+
+    
+
+4. Select the tables to be replicated.
+
+    
+
+5. Select the columns to be replicated.
+
+    
+
+6. Confirm the DataJob creation, and start running the DataJob.
+
+    
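+
+Once the DataJob is running, you can verify that data is landing by reading the Iceberg table directly. A minimal sketch, reusing the `catalog` object from the PyIceberg example above and assuming a hypothetical source table `orders` replicated into the namespace `<mysql_db_name>`:
+
+```python
+# Hypothetical namespace and table names; adjust to your replicated schema.
+table = catalog.load_table("<mysql_db_name>.orders")
+
+# Pull a few rows into an Arrow table to confirm the sync is flowing.
+print(table.scan(limit=5).to_arrow())
+```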
diff --git a/docs/mkdocs.yml b/docs/mkdocs.yml
index 0d8d5e78a9..e20b0e6628 100644
--- a/docs/mkdocs.yml
+++ b/docs/mkdocs.yml
@@ -53,6 +53,7 @@ nav:
     - hive.md
     - Trino: https://trino.io/docs/current/connector/iceberg.html
     - Daft: daft.md
+    - BladePipe: bladepipe.md
     - Firebolt: https://docs.firebolt.io/reference-sql/functions-reference/table-valued/read_iceberg
     - Estuary: https://docs.estuary.dev/reference/Connectors/materialization-connectors/apache-iceberg/
     - Tinybird: https://www.tinybird.co/docs/forward/get-data-in/table-functions/iceberg
diff --git a/site/docs/blogs.md b/site/docs/blogs.md
index b445136122..8d619f7445 100644
--- a/site/docs/blogs.md
+++ b/site/docs/blogs.md
@@ -22,6 +22,11 @@ title: "Blogs"
 
 Here is a list of company blogs that talk about Iceberg. The blogs are ordered from most recent to oldest.
 
+<!-- markdown-link-check-disable-next-line -->
+### [How to Load Data from MySQL to Iceberg in Real Time](https://doc.bladepipe.com/blog/tech_share/mysql_iceberg_sync)
+**Date:** July 10, 2025, **Company**: BladePipe
+**Author**: [BladePipe](https://www.bladepipe.com)
+
 <!-- markdown-link-check-disable-next-line -->
 ### [Making Sense of Apache Iceberg Statistics](https://www.ryft.io/blog/making-sense-of-apache-iceberg-statistics)
 **Date:** July 9, 2025, **Company**: Ryft
diff --git a/site/docs/vendors.md b/site/docs/vendors.md
index 31997f9a47..64ec6bd360 100644
--- a/site/docs/vendors.md
+++ b/site/docs/vendors.md
@@ -26,6 +26,10 @@ This page contains some of the vendors who are shipping and supporting Apache Ic
 
 AWS provides a [comprehensive suite of services](https://aws.amazon.com/what-is/apache-iceberg/#seo-faq-pairs#what-aws-services-support-iceberg) to support Apache Iceberg as part of its modern data architecture. [Amazon Athena](https://aws.amazon.com/athena/) offers a serverless, interactive query engine with native support for Iceberg, enabling fast and cost-efficient querying of large-scale datasets. [Amazon EMR](https://aws.amazon.com/emr/) integrates Iceberg with Apache Spark, Apache [...]
 
+### [BladePipe](https://bladepipe.com)
+
+BladePipe is a real-time end-to-end data integration tool offering 40+ out-of-the-box connectors. It provides a one-stop data movement solution, including schema evolution, data migration and sync, verification and correction, and monitoring and alerting. With sub-second latency, BladePipe captures change data from MySQL, Oracle, PostgreSQL, and other sources and streams it to Iceberg and more, all without writing a single line of code. It offers [on-premise and BYOC deployment options](http [...]
+
 ### [Bodo](https://bodo.ai)
 
 Bodo is a high performance SQL & Python compute engine that brings HPC and supercomputing techniques to data analytics.