openinx commented on pull request #1558: URL: https://github.com/apache/iceberg/pull/1558#issuecomment-707097927
> Can you help me understand Flink's behavior a bit more?

@rdblue, you may want to read this Flink [document](https://ci.apache.org/projects/flink/flink-docs-release-1.11/ops/deployment/#application-mode). Flink has an `application mode`, which means users can just upload the `iceberg-flink-runtime.jar` to HDFS and then submit the job with a command like the following:

```
bin/flink run-application -p 5 \
    -d \
    -t yarn-application \
    -yD yarn.provided.lib.dirs="hdfs://hadoopcluster/data/flink/libs/" \
    hdfs://hadoopcluster/data/flink/user-lib/flink-kafka-iceberg.jar
```

This bootstraps a separate Flink cluster on YARN for the current submission, and the started Flink JobManager executes the `main()` entry point of the user-provided jar. In that case, the classpath and environment inside the YARN container should not be modifiable by the user-provided jar.

Providing a Flink config key to indicate the hive-site URI sounds good to me. Then we could submit the Flink job in application mode like the following:

```
bin/flink run-application -p 5 \
    -d \
    -t yarn-application \
    -D iceberg.hive-site.path=hdfs://config/hive-site.xml \
    -yD yarn.provided.lib.dirs="hdfs://hadoopcluster/data/flink/libs/" \
    hdfs://hadoopcluster/data/flink/user-lib/flink-kafka-iceberg.jar
```

The above is about a Flink DataStream job.
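
As a rough sketch of how the job side could consume such a key: the snippet below assumes the key name `iceberg.hive-site.path` from the command above (only a proposal in this thread, not an existing option) and uses a plain `Map` as a stand-in for the Flink configuration. `HiveSitePathResolver` and `hiveSiteUri` are hypothetical names used for illustration only.

```java
import java.net.URI;
import java.util.Map;
import java.util.Optional;

// Hypothetical helper, not part of Iceberg or Flink.
final class HiveSitePathResolver {

    // Assumed key name, copied from the proposed command line in this comment.
    static final String HIVE_SITE_PATH_KEY = "iceberg.hive-site.path";

    private HiveSitePathResolver() {
    }

    /**
     * Returns the hive-site location if the key is set to a non-blank value.
     * An empty result means the caller should fall back to whatever
     * hive-site.xml (if any) is already on the classpath.
     */
    static Optional<URI> hiveSiteUri(Map<String, String> conf) {
        String value = conf.get(HIVE_SITE_PATH_KEY);
        if (value == null || value.trim().isEmpty()) {
            return Optional.empty();
        }
        // URI.create throws IllegalArgumentException on a malformed value,
        // which surfaces a bad setting at job start instead of later.
        return Optional.of(URI.create(value.trim()));
    }
}
```

The returned URI would then be handed to whatever loads the Hive configuration; that part is left out here because it depends on how the catalog loader ends up being wired.
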

For a Flink SQL job, we could do something similar to the `HiveCatalog` in Flink:

```sql
CREATE CATALOG hive_catalog WITH (
  'type'='iceberg',
  'catalog-type'='hive',
  'uri'='thrift://localhost:9083',
  'clients'='5',
  'property-version'='1',
  'hive-conf-dir'='hdfs://config/hive-site.xml'
);
```

In most cases, we could just set the `warehouse` instead of specifying a `hive-conf-dir`, as Ryan suggested:

```sql
CREATE CATALOG hive_catalog WITH (
  'type'='iceberg',
  'catalog-type'='hive',
  'uri'='thrift://localhost:9083',
  'clients'='5',
  'property-version'='1',
  'warehouse'='hdfs://data-dir'
);
```
