+1 for supporting Hadoop 3. I'm not familiar with the shading efforts, thus no comment on dropping the flink-shaded-hadoop.
Correct me if I'm wrong: although the default Hadoop version for compiling Flink is currently 2.4.1, I think this does not mean Flink should support only Hadoop 2.4+. So no matter which Hadoop version we use for compiling by default, we need to use reflection for the Hadoop features/APIs that are not supported in all versions anyway. There are already many such reflections in `YarnClusterDescriptor` and `YarnResourceManager`, and there might be more in the future. I'm wondering whether we should have a unified mechanism (an interface / abstract class or so) that handles all these kinds of Hadoop API reflections in one place. Not necessarily in the scope of this discussion, though.

Thank you~

Xintong Song

On Wed, Apr 22, 2020 at 8:32 PM Chesnay Schepler <ches...@apache.org> wrote:

> 1) Likely not, as this again introduces a hard dependency on
> flink-shaded-hadoop.
> 2) Indeed; this will be something the users/cloud providers have to deal
> with now.
> 3) Yes.
>
> As a small note, we can still keep the hadoop-2 version of flink-shaded
> around for existing users.
> What I suggested was to just not release hadoop-3 versions.
>
> On 22/04/2020 14:19, Yang Wang wrote:
> > Thanks Robert for starting this significant discussion.
> >
> > Since hadoop3 was released a long time ago and many companies have
> > already put it in production, Flink can currently already run on YARN 3
> > (not sure about HDFS), no matter whether you are using
> > flink-shaded-hadoop2 or not, since the YARN API is always backward
> > compatible. The difference is that we cannot benefit from the new
> > features because we are using hadoop-2.4 as the compile dependency, so
> > we need to use reflection for new features (node labels, tags, etc.).
> >
> > All in all, I am in favour of dropping flink-shaded-hadoop. I just
> > have some questions.
> > 1. Do we still support the "-include-hadoop" profile? If yes, what will
> > we get in the lib dir?
> > 2.
> > I am not sure whether dropping flink-shaded-hadoop will cause some
> > class conflict problems. If we use "export HADOOP_CLASSPATH=`hadoop
> > classpath`" for the Hadoop env setup, then many jars will be appended
> > to the Flink client classpath.
> > 3. The compile Hadoop version is still 2.4.1, right?
> >
> > Best,
> > Yang
> >
> > Sivaprasanna <sivaprasanna...@gmail.com> wrote on Wed, Apr 22, 2020 at
> > 4:18 PM:
> >
> >> I agree with Aljoscha. Otherwise I can see a lot of tickets getting
> >> created saying the application is not running on YARN.
> >>
> >> Cheers,
> >> Sivaprasanna
> >>
> >> On Wed, Apr 22, 2020 at 1:00 PM Aljoscha Krettek <aljos...@apache.org>
> >> wrote:
> >>
> >>> +1 to getting rid of flink-shaded-hadoop. But we need to document how
> >>> people can now get a Flink dist that works with Hadoop. Currently,
> >>> when you download the single shaded jar you immediately get support
> >>> for submitting to YARN via bin/flink run.
> >>>
> >>> Aljoscha
> >>>
> >>> On 22.04.20 09:08, Till Rohrmann wrote:
> >>>> Hi Robert,
> >>>>
> >>>> I think it would be a helpful simplification of Flink's build setup
> >>>> if we can get rid of flink-shaded-hadoop. Moreover, relying only on
> >>>> the vanilla Hadoop dependencies for the modules which interact with
> >>>> Hadoop/Yarn sounds like a good idea to me.
> >>>>
> >>>> Adding support for Hadoop 3 would also be nice. I'm not sure,
> >>>> though, how Hadoop's APIs have changed between 2 and 3. It might be
> >>>> necessary to introduce some bridges in order to make it work.
> >>>>
> >>>> Cheers,
> >>>> Till
> >>>>
> >>>> On Tue, Apr 21, 2020 at 4:37 PM Robert Metzger <rmetz...@apache.org>
> >>>> wrote:
> >>>>> Hi all,
> >>>>>
> >>>>> for the upcoming 1.11 release, I started looking into adding
> >>>>> support for Hadoop 3 [1] for Flink.
> >>>>> I have already explored adding a shaded Hadoop 3 into
> >>>>> “flink-shaded”, as well as some mechanisms for switching between
> >>>>> Hadoop 2 and 3 dependencies in the Flink build.
> >>>>>
> >>>>> However, Chesnay made me aware that we could also go a different
> >>>>> route: we let Flink depend on vanilla Hadoop dependencies and stop
> >>>>> providing shaded fat jars for Hadoop through “flink-shaded”.
> >>>>>
> >>>>> Why?
> >>>>> - Maintaining properly shaded Hadoop fat jars is a lot of work (we
> >>>>> have insufficient test coverage for all kinds of Hadoop features).
> >>>>> - For Hadoop 2, there are already some known and unresolved issues
> >>>>> with our shaded jars that we didn’t manage to fix.
> >>>>>
> >>>>> Users will have to use Flink with Hadoop by relying on vanilla or
> >>>>> vendor-provided Hadoop dependencies.
> >>>>>
> >>>>> What do you think?
> >>>>>
> >>>>> Best,
> >>>>> Robert
> >>>>>
> >>>>> [1] https://issues.apache.org/jira/browse/FLINK-11086
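[Editor's note] The unified reflection mechanism suggested in the thread above — one place that probes for version-dependent Hadoop/YARN APIs and degrades gracefully when a method is missing — could be sketched roughly as follows. This is a hypothetical helper, not Flink's actual code; the class and method names are invented for illustration, and the demo probes a JDK class so the sketch runs without Hadoop on the classpath.

```java
import java.lang.reflect.Method;
import java.util.Optional;

// Hypothetical sketch of a single home for version-dependent Hadoop API calls,
// in the spirit of the scattered reflection in YarnClusterDescriptor.
public final class HadoopApiReflection {

    private HadoopApiReflection() {}

    // Look up a public method, returning empty if this Hadoop version lacks it.
    public static Optional<Method> findMethod(
            Class<?> clazz, String name, Class<?>... paramTypes) {
        try {
            return Optional.of(clazz.getMethod(name, paramTypes));
        } catch (NoSuchMethodException e) {
            return Optional.empty();
        }
    }

    // Invoke the method if present; return true if it was called, false if the
    // feature is unavailable in the deployed Hadoop version.
    public static boolean invokeIfPresent(
            Object target, String name, Class<?>[] paramTypes, Object... args) {
        Optional<Method> method = findMethod(target.getClass(), name, paramTypes);
        if (!method.isPresent()) {
            return false;
        }
        try {
            method.get().invoke(target, args);
            return true;
        } catch (ReflectiveOperationException e) {
            throw new RuntimeException("Failed to invoke " + name, e);
        }
    }

    public static void main(String[] args) {
        // Demonstrated on a JDK class so the sketch compiles without Hadoop.
        // With Hadoop on the classpath, the same call shape could cover e.g.
        // node-label setters that only exist in newer YARN API versions.
        StringBuilder context = new StringBuilder();
        boolean supported = invokeIfPresent(
                context, "append", new Class<?>[] {String.class}, "node-label");
        boolean unsupported = invokeIfPresent(
                context, "setNodeLabelExpression", new Class<?>[] {String.class}, "gpu");
        System.out.println(supported + " " + unsupported + " -> " + context);
        // prints "true false -> node-label"
    }
}
```

Callers check the boolean (or the `Optional`) once, instead of repeating try/catch blocks around `getMethod` at every call site.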