mgmarino commented on issue #3044: URL: https://github.com/apache/iceberg/issues/3044#issuecomment-1539560789
Thanks for the response, @matt-slalom. I did get something working, mainly by following the comment [here](https://github.com/apache/iceberg/issues/3117#issuecomment-1377037706), though I converted it to Maven. One important point is that we are using the Table/SQL API (also from PyFlink), so we are not explicitly instantiating the `FlinkCatalog`. As a result, we depend on [this code](https://github.com/apache/iceberg/blob/bbe7a64d1c6f5b8b680c74caa9880f9e83a8d754/flink/v1.15/flink/src/main/java/org/apache/iceberg/flink/FlinkDynamicTableFactory.java#L164).

A few things:

- The relocation/shading was essential here. Since I'm not a Java programmer, it took me some time to get it right.
- I still had to include some Hadoop libraries. However, I was able to drop `'org.apache.flink:flink-hadoop-fs'` by writing my own `HadoopUtils::getHadoopConfiguration` that returns `new Configuration(false);`. It should be possible to "hack" in a `Configuration` class and avoid pulling in the Hadoop libraries entirely, but I haven't dug into that yet.
- Flink manipulates class loading ("child first" vs. "parent first") [in general](https://nightlies.apache.org/flink/flink-docs-master/docs/ops/debugging/debugging_classloading/#inverted-class-loading-and-classloader-resolution-order). It explicitly *does not* do this for Hadoop classes, though: `org.apache.hadoop.` is one of the default parent-first patterns (see [here](https://nightlies.apache.org/flink/flink-docs-master/docs/deployment/config/#classloader-parent-first-patterns-default)), so those classes are always resolved by the parent (Flink) class loader rather than from the job jar. This may have caused the problems that some other posters mentioned above.

Anyway, I would still classify this as a workaround. Since I had to piece it together, though, I think it would make sense to document it until the Hadoop dependencies are fully removed. I will try to come back and update this comment once I have wrapped things up.
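One plausible reason the relocation matters (an inference, not something confirmed above) is how Flink applies its parent-first patterns. Flink's child-first class loader compares class names against those patterns by simple prefix matching. Hadoop classes relocated under a different package prefix therefore no longer match `org.apache.hadoop.`, and can be loaded from the job jar. The Java sketch below illustrates that interaction. The pattern list is an abbreviated subset of Flink's default. The methods `loadsParentFirst` and `relocate` and the `com.example.shaded` prefix are hypothetical and for illustration only; they are not Flink or Maven Shade Plugin APIs.

```java
import java.util.List;

public class ParentFirstCheck {

    // Abbreviated, illustrative subset of Flink's default
    // classloader.parent-first-patterns.default (assumption: check the
    // Flink configuration docs for the full list in your version).
    static final List<String> PARENT_FIRST_PATTERNS = List.of(
            "java.", "scala.", "org.apache.flink.", "org.apache.hadoop.", "org.slf4j");

    // Hypothetical helper: returns true if the class name starts with any
    // parent-first pattern, i.e. it would be delegated to the parent (Flink)
    // class loader instead of being loaded from the job jar.
    static boolean loadsParentFirst(String className, List<String> patterns) {
        for (String pattern : patterns) {
            if (className.startsWith(pattern)) {
                return true;
            }
        }
        return false;
    }

    // Hypothetical helper mimicking a shade-plugin relocation: rewrites the
    // package prefix of a class name, leaving non-matching names unchanged.
    static String relocate(String className, String pattern, String shadedPattern) {
        return className.startsWith(pattern)
                ? shadedPattern + className.substring(pattern.length())
                : className;
    }

    public static void main(String[] args) {
        String original = "org.apache.hadoop.conf.Configuration";
        // "com.example.shaded." is a made-up relocation prefix.
        String relocated = relocate(original, "org.apache.hadoop.", "com.example.shaded.org.apache.hadoop.");

        System.out.println(original + " -> parent-first? "
                + loadsParentFirst(original, PARENT_FIRST_PATTERNS));
        System.out.println(relocated + " -> parent-first? "
                + loadsParentFirst(relocated, PARENT_FIRST_PATTERNS));
    }
}
```

Running `main` prints `true` for the original `org.apache.hadoop.conf.Configuration` name and `false` for the relocated one. The actual relocation is configured in the Maven Shade Plugin, not in application code.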
