mgmarino commented on issue #3044:
URL: https://github.com/apache/iceberg/issues/3044#issuecomment-1539560789

   Thanks for the response, @matt-slalom.  I did get something working, mainly 
following the comment 
[here](https://github.com/apache/iceberg/issues/3117#issuecomment-1377037706), 
though I converted it to Maven.  One important point is that we are using the 
Table/SQL API (also from PyFlink), so we are not explicitly instantiating the 
`FlinkCatalog`.  As such, we depend on [this 
code](https://github.com/apache/iceberg/blob/bbe7a64d1c6f5b8b680c74caa9880f9e83a8d754/flink/v1.15/flink/src/main/java/org/apache/iceberg/flink/FlinkDynamicTableFactory.java#L164).
   
   A few things:
   - The relocation/shading was important here; since I am not a Java 
programmer, it took me some time to get right.
   - I still had to include (some) Hadoop libraries, but I was able to drop 
`'org.apache.flink:flink-hadoop-fs'` by writing my own 
`HadoopUtils::getHadoopConfiguration` that returns `new 
Configuration(false);`.  I think it should be possible to "hack" in a 
`Configuration` class to avoid pulling in the Hadoop libraries, but I haven't 
dug into that yet.
   - One important thing I ran across is that Flink does manipulate class 
loading ("child first" vs "parent first") [in 
general](https://nightlies.apache.org/flink/flink-docs-master/docs/ops/debugging/debugging_classloading/#inverted-class-loading-and-classloader-resolution-order),
 but explicitly *does not* do this for Hadoop libraries (see 
[here](https://nightlies.apache.org/flink/flink-docs-master/docs/deployment/config/#classloader-parent-first-patterns-default)).
  This could have been the source of the problems that some other posters 
mentioned above.
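
   To illustrate the class-loading point, here is a minimal Java sketch of 
prefix-based parent-first matching.  It is not Flink's actual code: the 
class `ParentFirstCheck`, the shortened prefix list, and the relocated 
package name `shaded.org.apache.hadoop` are all assumptions made for this 
example.  The real defaults are listed in the Flink configuration docs 
linked above.

```java
import java.util.List;

// Illustrative sketch only: mimics prefix-based "parent first" resolution.
// This is NOT Flink's implementation; names here are hypothetical.
final class ParentFirstCheck {

    // Assumed subset of classloader.parent-first-patterns.default.
    // Check the Flink configuration docs for the full list in your version.
    static final List<String> PARENT_FIRST_PREFIXES =
        List.of("java.", "scala.", "org.apache.flink.", "org.apache.hadoop.");

    // Returns true if the class name starts with any parent-first prefix,
    // meaning it would be resolved by the parent classloader first.
    static boolean isParentFirst(String className) {
        for (String prefix : PARENT_FIRST_PREFIXES) {
            if (className.startsWith(prefix)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        String[] names = {
            "org.apache.hadoop.conf.Configuration",        // unshaded Hadoop class
            "shaded.org.apache.hadoop.conf.Configuration", // hypothetical relocated name
            "org.apache.iceberg.flink.FlinkCatalog"        // user-jar class
        };
        for (String name : names) {
            String order = isParentFirst(name) ? "parent-first" : "child-first";
            System.out.println(name + " -> " + order);
        }
    }
}
```

   Under these assumptions, an unshaded Hadoop class bundled in the user jar 
would still be looked up in the parent classloader first, while a relocated 
copy no longer matches the `org.apache.hadoop.` prefix and is loaded 
child-first.  That may be why the relocation mattered.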
   
   Anyway, I would still classify this as a workaround, but since I had to 
piece it together myself, I think it would make sense to document it until 
the Hadoop dependencies are fully removed.  I will try to come back and 
update this comment once I have wrapped things up.
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

