Thanks, Robert, for starting this important discussion.

Hadoop 3 has been released for a long time now, and many companies have
already put it in production. Whether you are using flink-shaded-hadoop2
or not, Flink can already run on YARN 3 (not sure about HDFS), since the
YARN API is backward compatible. The downside is that we cannot benefit
from the new features, because we use hadoop-2.4 as the compile
dependency; we therefore have to resort to reflection for new features
(node labels, tags, etc.).
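
The reflection workaround mentioned above could look roughly like the
following. This is a hypothetical, simplified sketch, not Flink's actual
code: `SubmissionContext` is a stand-in for YARN's real
`ApplicationSubmissionContext`, whose `setNodeLabelExpression` method was
only added in later Hadoop versions, so code compiled against hadoop-2.4
has to look it up at runtime.

```java
import java.lang.reflect.Method;

public class ReflectiveCall {

    // Stand-in for a YARN class; in reality this would be
    // ApplicationSubmissionContext from a newer Hadoop version.
    static class SubmissionContext {
        String nodeLabel;
        public void setNodeLabelExpression(String label) {
            this.nodeLabel = label;
        }
    }

    // Invoke setNodeLabelExpression reflectively so the calling code still
    // compiles and runs against a Hadoop version that lacks the method.
    static void trySetNodeLabel(Object context, String label) {
        try {
            Method m = context.getClass()
                    .getMethod("setNodeLabelExpression", String.class);
            m.invoke(context, label);
        } catch (NoSuchMethodException e) {
            // Older Hadoop on the classpath: the feature is simply unavailable.
        } catch (ReflectiveOperationException e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        SubmissionContext ctx = new SubmissionContext();
        trySetNodeLabel(ctx, "gpu");
        System.out.println(ctx.nodeLabel); // prints "gpu" when the method exists
    }
}
```

The try/catch around `getMethod` is the key point: the same Flink binary
can then run against both old and new Hadoop versions, degrading
gracefully instead of failing with a NoSuchMethodError.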

All in all, I am in favour of dropping flink-shaded-hadoop. I just have a
few questions:
1. Will we still support the "-include-hadoop" profile? If yes, what will
end up in the lib dir?
2. I am not sure whether dropping flink-shaded-hadoop will cause class
conflict problems. If we use "export HADOOP_CLASSPATH=`hadoop classpath`"
to set up the Hadoop environment, many jars will be appended to the Flink
client classpath.
3. The compile-time Hadoop version is still 2.4.1, right?


Best,
Yang


Sivaprasanna <sivaprasanna...@gmail.com> wrote on Wed, Apr 22, 2020 at 4:18 PM:

> I agree with Aljoscha. Otherwise I can see a lot of tickets getting created
> saying the application is not running on YARN.
>
> Cheers,
> Sivaprasanna
>
> On Wed, Apr 22, 2020 at 1:00 PM Aljoscha Krettek <aljos...@apache.org>
> wrote:
>
> > +1 to getting rid of flink-shaded-hadoop. But we need to document how
> > people can now get a Flink dist that works with Hadoop. Currently, when
> > you download the single shaded jar you immediately get support for
> > submitting to YARN via bin/flink run.
> >
> > Aljoscha
> >
> >
> > On 22.04.20 09:08, Till Rohrmann wrote:
> > > Hi Robert,
> > >
> > > I think it would be a helpful simplification of Flink's build setup if
> we
> > > can get rid of flink-shaded-hadoop. Moreover relying only on the
> vanilla
> > > Hadoop dependencies for the modules which interact with Hadoop/Yarn
> > sounds
> > > like a good idea to me.
> > >
> > > Adding support for Hadoop 3 would also be nice. I'm not sure, though,
> how
> > > Hadoop's API's have changed between 2 and 3. It might be necessary to
> > > introduce some bridges in order to make it work.
> > >
> > > Cheers,
> > > Till
> > >
> > > On Tue, Apr 21, 2020 at 4:37 PM Robert Metzger <rmetz...@apache.org>
> > wrote:
> > >
> > >> Hi all,
> > >>
> > >> for the upcoming 1.11 release, I started looking into adding support
> for
> > >> Hadoop 3[1] for Flink. I have explored a little bit already into
> adding
> > a
> > >> shaded hadoop 3 into “flink-shaded”, and some mechanisms for switching
> > >> between Hadoop 2 and 3 dependencies in the Flink build.
> > >>
> > >> However, Chesnay made me aware that we could also go a different
> route:
> > We
> > >> let Flink depend on vanilla Hadoop dependencies and stop providing
> > shaded
> > >> fat jars for Hadoop through “flink-shaded”.
> > >>
> > >> Why?
> > >> - Maintaining properly shaded Hadoop fat jars is a lot of work (we
> have
> > >> insufficient test coverage for all kinds of Hadoop features)
> > >> - For Hadoop 2, there are already some known and unresolved issues
> with
> > our
> > >> shaded jars that we didn’t manage to fix
> > >>
> > >> Users will have to use Flink with Hadoop by relying on vanilla or
> > >> vendor-provided Hadoop dependencies.
> > >>
> > >> What do you think?
> > >>
> > >> Best,
> > >> Robert
> > >>
> > >> [1] https://issues.apache.org/jira/browse/FLINK-11086
> > >>
> > >
> >
> >
>
