openinx commented on pull request #1558:
URL: https://github.com/apache/iceberg/pull/1558#issuecomment-707097927


   > Can you help me understand Flink's behavior a bit more?
   
   @rdblue, you may want to read this Flink 
[document](https://ci.apache.org/projects/flink/flink-docs-release-1.11/ops/deployment/#application-mode).
   Flink has an `application mode`, which means users can just upload the 
`iceberg-flink-runtime.jar` to HDFS and then submit the job with a command like 
the following: 
   
   ```
   bin/flink run-application -p 5 \
   -d \
   -t yarn-application \
   -yD yarn.provided.lib.dirs="hdfs://hadoopcluster/data/flink/libs/" \
   hdfs://hadoopcluster/data/flink/user-lib/flink-kafka-iceberg.jar
   ```
   
   This bootstraps a separate Flink cluster on YARN for the current 
submission, and the started Flink JobManager executes the `main()` entry 
point of the user-provided jar. In that case, the classpath and environment 
inside the YARN container should not be modifiable by the user-provided jar. 
   
   Providing a Flink config key that points to the hive-site URI sounds good 
to me. Then we could submit the Flink job in application mode like the following: 
   
   ```
   bin/flink run-application -p 5 \
   -d \
   -t yarn-application \
   -D iceberg.hive-site.path=hdfs://config/hive-site.xml \
   -yD yarn.provided.lib.dirs="hdfs://hadoopcluster/data/flink/libs/" \
   hdfs://hadoopcluster/data/flink/user-lib/flink-kafka-iceberg.jar
   ```
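
As a rough sketch of what the job side could look like: once `main()` has opened the file named by the proposed `iceberg.hive-site.path` key (that key is only the proposal from this thread, not an existing option), the contents are plain Hadoop-style XML. The class and method names below are hypothetical, and the parser uses only the JDK; in a real job the stream would come from the Hadoop `FileSystem` and would more likely be handed to Hadoop's `Configuration.addResource(InputStream)`.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper, not part of Iceberg or Flink.
public class HiveSiteSketch {

  // Reads <configuration><property><name/><value/></property>...</configuration>
  // and returns the name/value pairs in document order.
  public static Map<String, String> parse(InputStream in) throws Exception {
    Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(in);
    NodeList props = doc.getElementsByTagName("property");
    Map<String, String> conf = new LinkedHashMap<>();
    for (int i = 0; i < props.getLength(); i++) {
      Element prop = (Element) props.item(i);
      String name = childText(prop, "name");
      String value = childText(prop, "value");
      if (name != null && value != null) {
        conf.put(name, value);
      }
    }
    return conf;
  }

  // Returns the trimmed text of the first child element with the given tag, or null.
  private static String childText(Element parent, String tag) {
    NodeList nodes = parent.getElementsByTagName(tag);
    return nodes.getLength() == 0 ? null : nodes.item(0).getTextContent().trim();
  }

  public static void main(String[] args) throws Exception {
    // Stand-in for the stream opened from the configured hive-site path.
    String xml =
        "<configuration>"
            + "<property><name>hive.metastore.uris</name>"
            + "<value>thrift://localhost:9083</value></property>"
            + "</configuration>";
    Map<String, String> conf =
        parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
    System.out.println(conf.get("hive.metastore.uris"));
  }
}
```

The point is only that the jar does not need the file on its classpath: a path passed through the Flink config is enough to load the settings at runtime.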
   
   The above is about a Flink DataStream job. For a Flink SQL job, we 
could do something similar to Flink's `HiveCatalog`: 
   
   ```sql
   CREATE CATALOG hive_catalog WITH (
     'type'='iceberg',
     'catalog-type'='hive',
     'uri'='thrift://localhost:9083',
     'clients'='5',
     'property-version'='1',
     'hive-conf-dir'='hdfs://config/hive-site.xml'
   );
   ```
   
   In most cases, we could just set the `warehouse` instead of specifying a 
hive-conf-dir, as Ryan suggested: 
   
   ```sql
   CREATE CATALOG hive_catalog WITH (
     'type'='iceberg',
     'catalog-type'='hive',
     'uri'='thrift://localhost:9083',
     'clients'='5',
     'property-version'='1',
     'warehouse'='hdfs://data-dir'
   );
   ```
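
For a job that builds this statement programmatically, a minimal sketch follows. The class and method names are hypothetical; it only renders the DDL string from a property map, which could then be passed to something like Flink's `TableEnvironment.executeSql`. Keys and values are trimmed so stray whitespace does not end up inside the quoted strings, and quote escaping is deliberately left out.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical helper, not part of Iceberg or Flink.
public class CatalogDdlSketch {

  // Renders CREATE CATALOG <name> WITH ('k'='v', ...) from the given properties.
  // Assumes keys and values contain no single quotes (no escaping is done).
  public static String createCatalog(String name, Map<String, String> props) {
    String body = props.entrySet().stream()
        .map(e -> "  '" + e.getKey().trim() + "'='" + e.getValue().trim() + "'")
        .collect(Collectors.joining(",\n"));
    return "CREATE CATALOG " + name + " WITH (\n" + body + "\n)";
  }

  public static void main(String[] args) {
    Map<String, String> props = new LinkedHashMap<>();
    props.put("type", "iceberg");
    props.put("catalog-type", "hive");
    props.put("uri", "thrift://localhost:9083");
    props.put("clients", "5");
    props.put("property-version", "1");
    // Swap this entry for 'hive-conf-dir' to use the other variant.
    props.put("warehouse", "hdfs://data-dir");
    System.out.println(createCatalog("hive_catalog", props));
  }
}
```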
   
   
    




