beryllw commented on code in PR #1770:
URL: https://github.com/apache/fluss/pull/1770#discussion_r2387138258
##########
website/docs/streaming-lakehouse/integrate-data-lakes/iceberg.md:
##########
@@ -27,49 +27,185 @@ datalake.iceberg.type: hadoop
 datalake.iceberg.warehouse: /tmp/iceberg
 ```
+#### 🔧 Configuration Processing
+
 Fluss processes Iceberg configurations by stripping the `datalake.iceberg.` prefix and uses the stripped configurations (without the prefix `datalake.iceberg.`) to initialize the Iceberg catalog.
-This approach enables passing custom configurations for iceberg catalog initiation. Checkout the [Iceberg Catalog Properties](https://iceberg.apache.org/docs/1.9.1/configuration/#catalog-properties) for more details on the available configurations of catalog.
-Fluss supports all Iceberg-compatible catalog types. For catalogs such as `hive`, `hadoop`, `rest`, `glue`, `nessie`, and `jdbc`, you can specify them using the configuration `datalake.iceberg.type` with the corresponding value (e.g., `hive`, `hadoop`, etc.).
-For other types of catalogs, you can use `datalake.iceberg.catalog-impl: <your_iceberg_catalog_impl_class_name>` to specify the catalog implementation.
-For example, configure with `datalake.iceberg.catalog-impl: org.apache.iceberg.snowflake.SnowflakeCatalog` to use Snowflake catalog.
+This approach enables passing custom configurations for Iceberg catalog initialization. Check out the [Iceberg Catalog Properties](https://iceberg.apache.org/docs/1.9.1/configuration/#catalog-properties) for more details on available catalog configurations.
+
+#### 📋 Supported Catalog Types
+
+Fluss supports all Iceberg-compatible catalog types:
+
+**Built-in Catalog Types:**
+- `hive` - Hive Metastore catalog
+- `hadoop` - Hadoop catalog
+- `rest` - REST catalog
+- `glue` - AWS Glue catalog
+- `nessie` - Nessie catalog
+- `jdbc` - JDBC catalog
+
+**Custom Catalog Implementation:**
+For other catalog types, you can use:
+```yaml
+datalake.iceberg.catalog-impl: <your_iceberg_catalog_impl_class_name>
+```
+
+**Example - Snowflake Catalog:**
+```yaml
+datalake.iceberg.catalog-impl: org.apache.iceberg.snowflake.SnowflakeCatalog
+```
+
+#### 🔧 Prerequisites
+
+##### 1. Hadoop Dependencies Configuration
+
+Some catalogs (such as `hadoop`, `hive` catalog) require Hadoop-related classes. Please ensure Hadoop-related classes are in your classpath.
+
+**Option 1: Use Existing Hadoop Environment (Recommended)**
+```bash
+export HADOOP_CLASSPATH=`hadoop classpath`
+```
+Export Hadoop classpath before starting Fluss. This allows Fluss to automatically load Hadoop dependencies from the machine.
+
+**Option 2: Download Pre-bundled Hadoop JAR**
+- Download: [hadoop-apache-3.3.5-2.jar](https://repo1.maven.org/maven2/io/trino/hadoop/hadoop-apache/3.3.5-2/hadoop-apache-3.3.5-2.jar)
+- Place the JAR file into `FLUSS_HOME/plugins/iceberg/` directory
+
+**Option 3: Download Complete Hadoop Package**
+- Download: [hadoop-3.3.5.tar.gz](https://archive.apache.org/dist/hadoop/common/hadoop-3.3.5/hadoop-3.3.5.tar.gz)
+- Extract and configure HADOOP_CLASSPATH:
+```bash
+# Download and extract Hadoop
+wget https://archive.apache.org/dist/hadoop/common/hadoop-3.3.5/hadoop-3.3.5.tar.gz
+tar -xzf hadoop-3.3.5.tar.gz
+
+# Set HADOOP_HOME to the extracted directory
+export HADOOP_HOME=$(pwd)/hadoop-3.3.5
+
+# Set HADOOP_CLASSPATH using the downloaded Hadoop
+export HADOOP_CLASSPATH=`$HADOOP_HOME/bin/hadoop classpath`
+```
+
+##### 2. Custom Catalog Implementations
+
+Fluss only bundles catalog implementations included in the `iceberg-core` module.
For any other catalog implementations not bundled within the `iceberg-core` module (e.g., Hive Catalog), you must place the corresponding JAR file into `FLUSS_HOME/plugins/iceberg/`.
+
+##### 3. Version Compatibility
+
+The Iceberg version that Fluss bundles is based on `1.9.2`. Please ensure the JARs you add are compatible with `Iceberg-1.9.2`.

Review Comment:
Maybe `1.9.1`?
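
For readers of the "Configuration Processing" paragraph in the hunk above, the prefix stripping it describes can be sketched roughly as follows. This is only an illustration of the documented behavior, not Fluss's actual implementation: the class name `PrefixStripSketch`, the helper `stripPrefix`, and the key `unrelated.key` are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

public class PrefixStripSketch {
    // Prefix named in the documentation under review.
    static final String PREFIX = "datalake.iceberg.";

    // Hypothetical helper: keep only keys that start with PREFIX and
    // return them with the prefix removed.
    public static Map<String, String> stripPrefix(Map<String, String> conf) {
        Map<String, String> stripped = new HashMap<>();
        for (Map.Entry<String, String> entry : conf.entrySet()) {
            String key = entry.getKey();
            if (key.startsWith(PREFIX)) {
                stripped.put(key.substring(PREFIX.length()), entry.getValue());
            }
        }
        return stripped;
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        conf.put("datalake.iceberg.type", "hadoop");
        conf.put("datalake.iceberg.warehouse", "/tmp/iceberg");
        conf.put("unrelated.key", "ignored"); // hypothetical key without the prefix

        // The resulting map holds "type" and "warehouse"; the unrelated key is dropped.
        System.out.println(stripPrefix(conf));
    }
}
```

Under this sketch, `datalake.iceberg.catalog-impl` would reach the catalog as `catalog-impl`, which is how the docs describe custom implementations being passed through.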
