Fokko opened a new issue, #7332:
URL: https://github.com/apache/iceberg/issues/7332
### Feature Request / Improvement
Playing around with `pyflink` and noticed that the Hadoop dependency is
required when using the REST catalog:
```python
➜ ~ python3.9
Python 3.9.16 (main, Dec 7 2022, 10:06:04)
[Clang 14.0.0 (clang-1400.0.29.202)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import os
>>>
>>> from pyflink.datastream import StreamExecutionEnvironment
>>>
>>> env = StreamExecutionEnvironment.get_execution_environment()
>>> iceberg_flink_runtime_jar =
"/Users/fokkodriesprong/Desktop/iceberg/flink/v1.17/flink-runtime/build/libs/iceberg-flink-runtime-1.17-1.3.0-SNAPSHOT.jar"
>>>
>>> env.add_jars("file://{}".format(iceberg_flink_runtime_jar))
>>>
>>> from pyflink.table import StreamTableEnvironment
>>>
>>> table_env = StreamTableEnvironment.create(env)
>>>
>>> table_env.execute_sql("""
... CREATE CATALOG tabular WITH (
... 'type'='iceberg',
... 'catalog-type'='rest',
... 'uri'='https://api.tabular.io/ws',
... 'credential'='t-tcEe4Ihp4eM:pyTlx_4ayKV7N54gXuBmMotVFLU'
... )
... """)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File
"/opt/homebrew/lib/python3.9/site-packages/pyflink/table/table_environment.py",
line 837, in execute_sql
return TableResult(self._j_tenv.executeSql(stmt))
File "/opt/homebrew/lib/python3.9/site-packages/py4j/java_gateway.py",
line 1322, in __call__
return_value = get_return_value(
File
"/opt/homebrew/lib/python3.9/site-packages/pyflink/util/exceptions.py", line
146, in deco
return f(*a, **kw)
File "/opt/homebrew/lib/python3.9/site-packages/py4j/protocol.py", line
326, in get_return_value
raise Py4JJavaError(
py4j.protocol.Py4JJavaError: An error occurred while calling o23.executeSql.
: java.lang.NoClassDefFoundError: org/apache/hadoop/conf/Configuration
at
org.apache.iceberg.flink.FlinkCatalogFactory.clusterHadoopConf(FlinkCatalogFactory.java:211)
at
org.apache.iceberg.flink.FlinkCatalogFactory.createCatalog(FlinkCatalogFactory.java:139)
at
org.apache.flink.table.factories.FactoryUtil.createCatalog(FactoryUtil.java:414)
at
org.apache.flink.table.api.internal.TableEnvironmentImpl.createCatalog(TableEnvironmentImpl.java:1466)
at
org.apache.flink.table.api.internal.TableEnvironmentImpl.executeInternal(TableEnvironmentImpl.java:1212)
at
org.apache.flink.table.api.internal.TableEnvironmentImpl.executeSql(TableEnvironmentImpl.java:765)
at
java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at
java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at
java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.base/java.lang.reflect.Method.invoke(Method.java:566)
at
org.apache.flink.api.python.shaded.py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
at
org.apache.flink.api.python.shaded.py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:374)
at
org.apache.flink.api.python.shaded.py4j.Gateway.invoke(Gateway.java:282)
at
org.apache.flink.api.python.shaded.py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
at
org.apache.flink.api.python.shaded.py4j.commands.CallCommand.execute(CallCommand.java:79)
at
org.apache.flink.api.python.shaded.py4j.GatewayConnection.run(GatewayConnection.java:238)
at java.base/java.lang.Thread.run(Thread.java:829)
Caused by: java.lang.ClassNotFoundException:
org.apache.hadoop.conf.Configuration
at
java.base/jdk.internal.loader.BuiltinClassLoader.loadClass(BuiltinClassLoader.java:581)
at
java.base/jdk.internal.loader.ClassLoaders$AppClassLoader.loadClass(ClassLoaders.java:178)
at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:522)
... 17 more
```
When using a Hadoop or Hive catalog, this makes perfect sense but would be
nice to make it optional when using the REST catalog.
### Query engine
Flink
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]