+1 to getting rid of flink-shaded-hadoop. But we need to document how people can now get a Flink dist that works with Hadoop. Currently, when you download the single shaded jar, you immediately get support for submitting to YARN via bin/flink run.
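
For reference, the minimal workflow the documentation would probably need to
cover could look roughly like this (a sketch only, assuming the existing
HADOOP_CLASSPATH mechanism stays the supported way of providing Hadoop; the
example job is just an illustration):

  # make the cluster's Hadoop jars and configuration visible to Flink
  export HADOOP_CLASSPATH=$(hadoop classpath)

  # submit a job to YARN with a plain Flink dist (no bundled Hadoop)
  ./bin/flink run -m yarn-cluster ./examples/streaming/WordCount.jar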

Aljoscha


On 22.04.20 09:08, Till Rohrmann wrote:
Hi Robert,

I think it would be a helpful simplification of Flink's build setup if we
can get rid of flink-shaded-hadoop. Moreover, relying only on the vanilla
Hadoop dependencies for the modules that interact with Hadoop/YARN sounds
like a good idea to me.

Adding support for Hadoop 3 would also be nice. I'm not sure, though, how
Hadoop's APIs have changed between 2 and 3. It might be necessary to
introduce some bridges in order to make it work.

Cheers,
Till

On Tue, Apr 21, 2020 at 4:37 PM Robert Metzger <rmetz...@apache.org> wrote:

Hi all,

for the upcoming 1.11 release, I started looking into adding support for
Hadoop 3 [1] to Flink. I have already explored adding a shaded Hadoop 3 to
“flink-shaded” a bit, as well as some mechanisms for switching between
Hadoop 2 and 3 dependencies in the Flink build.
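
Just to sketch what such a switch could look like from a user's point of view
(the property name and version below are placeholders, not something that
exists in the build today):

  # hypothetical: build Flink against a chosen Hadoop version
  mvn clean install -DskipTests -Dhadoop.version=3.1.3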

However, Chesnay made me aware that we could also go a different route: We
let Flink depend on vanilla Hadoop dependencies and stop providing shaded
fat jars for Hadoop through “flink-shaded”.

Why?
- Maintaining properly shaded Hadoop fat jars is a lot of work (we have
insufficient test coverage for all kinds of Hadoop features)
- For Hadoop 2, there are already known issues with our shaded jars that we
haven't managed to fix

To use Flink with Hadoop, users will then have to rely on vanilla or
vendor-provided Hadoop dependencies themselves.

What do you think?

Best,
Robert

[1] https://issues.apache.org/jira/browse/FLINK-11086


