+1 for supporting Hadoop 3. I'm not familiar with the shading efforts, thus no comment on dropping the flink-shaded-hadoop.
Correct me if I'm wrong: although the default Hadoop version for compiling Flink is currently 2.4.1, I think this does not mean Flink should support only Hadoop 2.4+. So no matter which Hadoop version we use for compiling by default, we need to use reflection for the Hadoop features/APIs that are not supported in all versions anyway. There are already many such reflections in `YarnClusterDescriptor` and `YarnResourceManager`, and there might be more in the future. I'm wondering whether we should have a unified mechanism (an interface / abstract class or so) that handles all these kinds of Hadoop API reflections in one place. Not necessarily in the scope of this discussion, though.

Thank you~

Xintong Song

On Wed, Apr 22, 2020 at 8:32 PM Chesnay Schepler <ches...@apache.org> wrote:

> 1) Likely not, as this again introduces a hard dependency on
> flink-shaded-hadoop.
> 2) Indeed; this will be something the users/cloud providers have to deal
> with now.
> 3) Yes.
>
> As a small note, we can still keep the hadoop-2 version of flink-shaded
> around for existing users.
> What I suggested was to just not release hadoop-3 versions.
>
> On 22/04/2020 14:19, Yang Wang wrote:
> > Thanks Robert for starting this significant discussion.
> >
> > Since hadoop3 was released a long time ago and many companies have
> > already put it in production, Flink can currently already run on YARN 3
> > (not sure about HDFS), no matter whether you are using
> > flink-shaded-hadoop2 or not, since the YARN API is always backward
> > compatible. The difference is that we cannot benefit from the new
> > features because we are using hadoop-2.4 as the compile dependency, so
> > we need to use reflection for new features (node labels, tags, etc.).
> >
> > All in all, I am in favour of dropping flink-shaded-hadoop. I just
> > have some questions.
> > 1. Do we still support the "-include-hadoop" profile? If yes, what will
> > we get in the lib dir?
> > 2.
> > I am not sure whether dropping flink-shaded-hadoop will cause some
> > class conflict problems. If we use "export HADOOP_CLASSPATH=`hadoop
> > classpath`" for the Hadoop env setup, then many jars will be appended
> > to the Flink client classpath.
> > 3. The compile Hadoop version is still 2.4.1, right?
> >
> > Best,
> > Yang
> >
> > Sivaprasanna <sivaprasanna...@gmail.com> wrote on Wed, Apr 22, 2020 at
> > 4:18 PM:
> >
> >> I agree with Aljoscha. Otherwise I can see a lot of tickets getting
> >> created saying the application is not running on YARN.
> >>
> >> Cheers,
> >> Sivaprasanna
> >>
> >> On Wed, Apr 22, 2020 at 1:00 PM Aljoscha Krettek <aljos...@apache.org>
> >> wrote:
> >>
> >>> +1 to getting rid of flink-shaded-hadoop. But we need to document how
> >>> people can now get a Flink dist that works with Hadoop. Currently,
> >>> when you download the single shaded jar you immediately get support
> >>> for submitting to YARN via bin/flink run.
> >>>
> >>> Aljoscha
> >>>
> >>> On 22.04.20 09:08, Till Rohrmann wrote:
> >>>> Hi Robert,
> >>>>
> >>>> I think it would be a helpful simplification of Flink's build setup
> >>>> if we can get rid of flink-shaded-hadoop. Moreover, relying only on
> >>>> the vanilla Hadoop dependencies for the modules which interact with
> >>>> Hadoop/Yarn sounds like a good idea to me.
> >>>>
> >>>> Adding support for Hadoop 3 would also be nice. I'm not sure,
> >>>> though, how Hadoop's APIs have changed between 2 and 3. It might be
> >>>> necessary to introduce some bridges in order to make it work.
> >>>>
> >>>> Cheers,
> >>>> Till
> >>>>
> >>>> On Tue, Apr 21, 2020 at 4:37 PM Robert Metzger <rmetz...@apache.org>
> >>>> wrote:
> >>>>> Hi all,
> >>>>>
> >>>>> for the upcoming 1.11 release, I started looking into adding
> >>>>> support for Hadoop 3 [1] for Flink.
> >>>>> I have already explored adding a shaded Hadoop 3 into
> >>>>> “flink-shaded”, as well as some mechanisms for switching between
> >>>>> Hadoop 2 and 3 dependencies in the Flink build.
> >>>>>
> >>>>> However, Chesnay made me aware that we could also go a different
> >>>>> route: we let Flink depend on vanilla Hadoop dependencies and stop
> >>>>> providing shaded fat jars for Hadoop through “flink-shaded”.
> >>>>>
> >>>>> Why?
> >>>>> - Maintaining properly shaded Hadoop fat jars is a lot of work (we
> >>>>> have insufficient test coverage for all kinds of Hadoop features).
> >>>>> - For Hadoop 2, there are already some known and unresolved issues
> >>>>> with our shaded jars that we didn’t manage to fix.
> >>>>>
> >>>>> Users will have to use Flink with Hadoop by relying on vanilla or
> >>>>> vendor-provided Hadoop dependencies.
> >>>>>
> >>>>> What do you think?
> >>>>>
> >>>>> Best,
> >>>>> Robert
> >>>>>
> >>>>> [1] https://issues.apache.org/jira/browse/FLINK-11086
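[Editor's note] The unified reflection mechanism suggested in the thread above — one place that probes for version-dependent Hadoop/YARN APIs and degrades gracefully when a method is missing — could be sketched roughly as follows. This is a hypothetical helper, not Flink's actual code; the class and method names are invented for illustration, and the demo probes a JDK class so the sketch runs without Hadoop on the classpath.

```java
import java.lang.reflect.Method;
import java.util.Optional;

// Hypothetical sketch of a single home for version-dependent Hadoop API calls,
// in the spirit of the scattered reflection in YarnClusterDescriptor.
public final class HadoopApiReflection {

    private HadoopApiReflection() {}

    // Look up a public method, returning empty if this Hadoop version lacks it.
    public static Optional<Method> findMethod(
            Class<?> clazz, String name, Class<?>... paramTypes) {
        try {
            return Optional.of(clazz.getMethod(name, paramTypes));
        } catch (NoSuchMethodException e) {
            return Optional.empty();
        }
    }

    // Invoke the method if present; return true if it was called, false if the
    // feature is unavailable in the deployed Hadoop version.
    public static boolean invokeIfPresent(
            Object target, String name, Class<?>[] paramTypes, Object... args) {
        Optional<Method> method = findMethod(target.getClass(), name, paramTypes);
        if (!method.isPresent()) {
            return false;
        }
        try {
            method.get().invoke(target, args);
            return true;
        } catch (ReflectiveOperationException e) {
            throw new RuntimeException("Failed to invoke " + name, e);
        }
    }

    public static void main(String[] args) {
        // Demonstrated on a JDK class so the sketch compiles without Hadoop.
        // With Hadoop on the classpath, the same call shape could cover e.g.
        // node-label setters that only exist in newer YARN API versions.
        StringBuilder context = new StringBuilder();
        boolean supported = invokeIfPresent(
                context, "append", new Class<?>[] {String.class}, "node-label");
        boolean unsupported = invokeIfPresent(
                context, "setNodeLabelExpression", new Class<?>[] {String.class}, "gpu");
        System.out.println(supported + " " + unsupported + " -> " + context);
        // prints "true false -> node-label"
    }
}
```

Callers check the boolean (or the `Optional`) once, instead of repeating try/catch blocks around `getMethod` at every call site.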