beryllw commented on code in PR #1770:
URL: https://github.com/apache/fluss/pull/1770#discussion_r2387157663


##########
website/docs/streaming-lakehouse/integrate-data-lakes/iceberg.md:
##########
@@ -27,49 +27,185 @@ datalake.iceberg.type: hadoop
 datalake.iceberg.warehouse: /tmp/iceberg
 ```
 
+#### 🔧 Configuration Processing
+
 Fluss processes Iceberg configurations by stripping the `datalake.iceberg.` prefix and using the stripped configurations (without the `datalake.iceberg.` prefix) to initialize the Iceberg catalog.
-This approach enables passing custom configurations for iceberg catalog 
initiation. Checkout the [Iceberg Catalog 
Properties](https://iceberg.apache.org/docs/1.9.1/configuration/#catalog-properties)
 for more details on the available configurations of catalog.
 
-Fluss supports all Iceberg-compatible catalog types. For catalogs such as 
`hive`, `hadoop`, `rest`, `glue`, `nessie`, and `jdbc`, you can specify them 
using the configuration `datalake.iceberg.type` with the corresponding value 
(e.g., `hive`, `hadoop`, etc.).
-For other types of catalogs, you can use `datalake.iceberg.catalog-impl: 
<your_iceberg_catalog_impl_class_name>` to specify the catalog implementation.
-For example, configure with `datalake.iceberg.catalog-impl: 
org.apache.iceberg.snowflake.SnowflakeCatalog` to use Snowflake catalog.
+This approach enables passing custom configurations for Iceberg catalog 
initialization. Check out the [Iceberg Catalog 
Properties](https://iceberg.apache.org/docs/1.9.1/configuration/#catalog-properties)
 for more details on available catalog configurations.
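+
+For example, with the `hadoop` catalog configuration shown at the top of this page, the prefix stripping works as sketched below (illustrative only; any `datalake.iceberg.*` key is forwarded to the Iceberg catalog in the same way):
+
+```yaml
+# Fluss server configuration
+datalake.iceberg.type: hadoop
+datalake.iceberg.warehouse: /tmp/iceberg
+
+# After the `datalake.iceberg.` prefix is stripped, the Iceberg catalog
+# is initialized with the properties:
+#   type: hadoop
+#   warehouse: /tmp/iceberg
+```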
+
+#### 📋 Supported Catalog Types
+
+Fluss supports all Iceberg-compatible catalog types:
+
+**Built-in Catalog Types:**
+- `hive` - Hive Metastore catalog
+- `hadoop` - Hadoop catalog
+- `rest` - REST catalog
+- `glue` - AWS Glue catalog
+- `nessie` - Nessie catalog
+- `jdbc` - JDBC catalog
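+
+For example, a `hive` catalog is typically configured with the metastore `uri` catalog property in addition to the type (a minimal sketch; the thrift address and warehouse path are placeholders, see the Iceberg catalog properties page for the full list):
+
+```yaml
+datalake.iceberg.type: hive
+# placeholder Hive Metastore address; replace with your own
+datalake.iceberg.uri: thrift://localhost:9083
+# placeholder warehouse location; replace with your own
+datalake.iceberg.warehouse: hdfs://namenode:8020/warehouse/iceberg
+```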
+
+**Custom Catalog Implementation:**
+For other catalog types, you can use:
+```yaml
+datalake.iceberg.catalog-impl: <your_iceberg_catalog_impl_class_name>
+```
+
+**Example - Snowflake Catalog:**
+```yaml
+datalake.iceberg.catalog-impl: org.apache.iceberg.snowflake.SnowflakeCatalog
+```
+
+#### 🔧 Prerequisites
+
+##### 1. Hadoop Dependencies Configuration
+
+Some catalogs (such as the `hadoop` and `hive` catalogs) require Hadoop-related classes. Make sure these classes are available on your classpath.
+
+**Option 1: Use Existing Hadoop Environment (Recommended)**
+```bash
+export HADOOP_CLASSPATH=`hadoop classpath`
+```
+Export the Hadoop classpath before starting Fluss. This lets Fluss automatically pick up the Hadoop dependencies already installed on the machine.
+
+**Option 2: Download Pre-bundled Hadoop JAR**
+- Download: 
[hadoop-apache-3.3.5-2.jar](https://repo1.maven.org/maven2/io/trino/hadoop/hadoop-apache/3.3.5-2/hadoop-apache-3.3.5-2.jar)
+- Place the JAR file into the `FLUSS_HOME/plugins/iceberg/` directory
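+
+A minimal sketch of the two steps above (assuming `FLUSS_HOME` points at your Fluss installation):
+
+```bash
+# Download the pre-bundled Hadoop JAR
+wget https://repo1.maven.org/maven2/io/trino/hadoop/hadoop-apache/3.3.5-2/hadoop-apache-3.3.5-2.jar
+
+# Place it into the Iceberg plugin directory
+cp hadoop-apache-3.3.5-2.jar $FLUSS_HOME/plugins/iceberg/
+```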
+
+**Option 3: Download Complete Hadoop Package**
+- Download: 
[hadoop-3.3.5.tar.gz](https://archive.apache.org/dist/hadoop/common/hadoop-3.3.5/hadoop-3.3.5.tar.gz)
+- Extract and configure HADOOP_CLASSPATH:
+```bash
+# Download and extract Hadoop
+wget https://archive.apache.org/dist/hadoop/common/hadoop-3.3.5/hadoop-3.3.5.tar.gz
+tar -xzf hadoop-3.3.5.tar.gz
+
+# Set HADOOP_HOME to the extracted directory
+export HADOOP_HOME=$(pwd)/hadoop-3.3.5
+
+# Set HADOOP_CLASSPATH using the downloaded Hadoop
+export HADOOP_CLASSPATH=`$HADOOP_HOME/bin/hadoop classpath`
+```
+
+##### 2. Custom Catalog Implementations
+
+Fluss only bundles the catalog implementations included in the `iceberg-core` module. For any other catalog implementation (e.g., the Hive catalog), you must place the corresponding JAR file into `FLUSS_HOME/plugins/iceberg/`.
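+
+For example, to use the Hive catalog you would add the matching Iceberg Hive module (a sketch, assuming the `iceberg-hive-metastore` artifact plus its Hive client dependencies are what your setup needs; verify the exact JARs for your environment):
+
+```bash
+# Download the Iceberg Hive catalog implementation (version must match the bundled Iceberg)
+wget https://repo1.maven.org/maven2/org/apache/iceberg/iceberg-hive-metastore/1.9.2/iceberg-hive-metastore-1.9.2.jar
+
+# Place it into the Iceberg plugin directory
+cp iceberg-hive-metastore-1.9.2.jar $FLUSS_HOME/plugins/iceberg/
+```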
+
+##### 3. Version Compatibility
+
+Fluss bundles Iceberg `1.9.2`. Please ensure that any JARs you add are compatible with Iceberg `1.9.2`.
+
+#### ⚠️ Important Notes
+
+- Ensure all JAR files are compatible with Iceberg 1.9.2
+- If using an existing Hadoop environment, it's recommended to use the 
`HADOOP_CLASSPATH` environment variable
+- Configuration changes take effect after restarting the Fluss service
+
+### 🚀 Start Tiering Service to Iceberg
+
+To tier Fluss's data to Iceberg, you must start the datalake tiering service. 
For guidance, you can refer to [Start The Datalake Tiering 
Service](maintenance/tiered-storage/lakehouse-storage.md#start-the-datalake-tiering-service).
 Although the example uses Paimon, the process is also applicable to Iceberg.
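+
+As a rough sketch, the Iceberg counterpart of the Paimon command in that guide passes the same `datalake.*` options described above to the tiering job (the jar name, Flink setup, and exact option names here are assumptions; follow the linked guide for the authoritative command):
+
+```bash
+# Submit the tiering service as a Flink job (illustrative; adjust paths and versions)
+./bin/flink run /path/to/fluss-flink-tiering-<version>.jar \
+    --fluss.bootstrap.servers localhost:9123 \
+    --datalake.format iceberg \
+    --datalake.iceberg.type hadoop \
+    --datalake.iceberg.warehouse /tmp/iceberg
+```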
+
+#### 🔧 Prerequisites: Hadoop Dependencies
+
+**⚠️ Important**: Iceberg has a strong dependency on Hadoop. You must ensure 
Hadoop-related classes are available in the classpath before starting the 
tiering service.
+
+##### Option 1: Use Existing Hadoop Environment (Recommended)

Review Comment:
   Looks similar to the Iceberg catalog Hadoop dependencies configuration. 
Maybe we could reference this document?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
