You up for doing a winutils build for the 3.3.5 release? I'm going to do the
arm64 binaries.

On Mon, 14 Nov 2022 at 18:08, Gautham Banasandra <gaur...@apache.org> wrote:

> Also, I plan to do a Windows release once I set up the CI for Windows and
> get the major unit tests to pass. It would still contain winutils, though.
> However, we can do another release after deprecating winutils.
>
> Thanks,
> --Gautham
>
> On Mon, 14 Nov 2022 at 23:34, Gautham Banasandra <gaur...@apache.org>
> wrote:
>
> > Hi Iñigo,
> >
> > I would like to aim for winutils deprecation by the end of the first
> > quarter of 2023. It really depends on how fast I can wrap up setting up
> > CI for Windows. Given that this involves getting Yetus to work properly
> > on Windows, I feel it's a bit ambitious. But if things fall into place,
> > I think the end of the first quarter of 2023 would be a reachable
> > timeline.
> >
> > Thanks,
> > --Gautham
> >
> > On Sat, 12 Nov 2022 at 00:20, Iñigo Goiri <elgo...@gmail.com> wrote:
> >
> >> Gautham, thank you very much for the summary.
> >> Do you have a timeline for when we can get rid of winutils?
> >> My idea was to get this and the YARN federation hardening work into a
> >> 3.4 release.
> >>
> >>
> >>
> >> On Fri, Nov 11, 2022, 10:15 Gautham Banasandra <gaur...@apache.org>
> >> wrote:
> >>
> >>> Hi folks,
> >>>
> >>>
> >>> What have we done so far?
> >>> ------------------------------------
> >>> Inigo and I have been working for quite some time now on this topic,
> >>> but our efforts have mostly been oriented towards making Hadoop
> >>> cross-platform compatible. Our focus has been on streamlining the
> >>> process of building Hadoop on Windows so that one can easily
> >>> build and run Hadoop, just like on Linux. We reached this milestone
> >>> quite recently and I've documented the steps for doing so here -
> >>>
> >>>
> >>> https://github.com/apache/hadoop/blob/5bb11cecea136acccac2563b37021b554e517012/BUILDING.txt#L493-L622
> >>>
> >>>
> >>>
> >>> Is winutils still required?
> >>> -------------------------------
> >>> As Steve mentioned, we would still require winutils for running
> >>> Hadoop on Windows. The major change here is that winutils
> >>> need not come from a third-party repository anymore; rather, it
> >>> now gets built along with the Hadoop codebase itself.
> >>> However, I agree that we need to deprecate winutils and
> >>> replace it with something better so that Hadoop users can have
> >>> a smoother experience.
> >>>
> >>>
> >>> What's the best way to deprecate winutils?
> >>> --------------------------------------------------------
> >>> Over all the time that I've spent making Hadoop cross-platform
> >>> compatible, I've realized that the best way would be to have a
> >>> JNI interface that wraps around a native layer. This native layer
> >>> could be implemented mostly in C++. C++17 provides the
> >>> std::filesystem namespace, which can satisfy most of the native
> >>> filesystem API requirements. Since std::filesystem is part of the
> >>> C++ standard library, these APIs should be available in most or all
> >>> C++ compilers across the various OS platforms. For the parts that
> >>> std::filesystem can't satisfy, we'll have to write C code that
> >>> makes system calls directly. Please note that these C files will
> >>> need to be implemented specifically for each platform. I took this
> >>> approach when I wrote the x-platform library to make the HDFS
> >>> native client cross-platform compatible -
> >>>
> >>>
> >>> https://github.com/apache/hadoop/tree/trunk/hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfspp/lib/x-platform
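
A minimal sketch of what the Java side of such a JNI boundary could look like, with a pure-Java fallback when the native layer is missing. The class name `NativeLocalFs` and the library name `hadoopfsnative` are hypothetical; nothing here is existing Hadoop code. The native method is assumed to be backed by C++ (for example `std::filesystem::read_symlink`), and `java.nio.file.Files` is used whenever the library cannot be loaded.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

/**
 * Hypothetical Java side of a JNI boundary. Prefers a native implementation
 * when the (hypothetical) "hadoopfsnative" library is loadable, otherwise
 * falls back to pure-Java java.nio.file calls.
 */
final class NativeLocalFs {
  private static final boolean NATIVE_AVAILABLE = tryLoad();

  private NativeLocalFs() {}

  private static boolean tryLoad() {
    try {
      System.loadLibrary("hadoopfsnative"); // hypothetical library name
      return true;
    } catch (UnsatisfiedLinkError | SecurityException e) {
      return false; // no native layer on this platform: use the Java fallback
    }
  }

  // Would be implemented in C++ (e.g. via std::filesystem::read_symlink);
  // only ever called when the library actually loaded.
  private static native String readLinkNative(String path) throws IOException;

  static boolean isNativeAvailable() {
    return NATIVE_AVAILABLE;
  }

  /** Returns the target of a symbolic link. */
  static String readLink(Path link) throws IOException {
    if (NATIVE_AVAILABLE) {
      return readLinkNative(link.toString());
    }
    return Files.readSymbolicLink(link).toString();
  }
}
```

Keeping a Java fallback means a missing native library degrades to the JDK's behavior rather than failing at startup.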
> >>>
> >>>
> >>> What am I focussing on currently?
> >>> ------------------------------------------------
> >>> So far, I've focussed on getting the build to work seamlessly
> >>> on Windows. I'm now trying to protect it from breaking by
> >>> setting up CI on Jenkins that builds Hadoop on Windows
> >>> for precommit validation -
> >>> https://issues.apache.org/jira/browse/INFRA-23809
> >>> Yes, this does involve getting Yetus to run on Windows. I can
> >>> work on deprecating winutils after this.
> >>>
> >>> Thanks,
> >>> --Gautham
> >>>
> >>> On Fri, 11 Nov 2022 at 19:51, Steve Loughran <ste...@cloudera.com.invalid>
> >>> wrote:
> >>>
> >>>> It's time to reach for the axe.
> >>>>
> >>>> We haven't shipped a version of Apache Hadoop which builds and runs on
> >>>> Windows for a long, long time. I think the only people trying to use the
> >>>> library on Windows will have been people trying to use Spark on their
> >>>> laptops with "small" datasets of only a few tens of gigabytes at a time,
> >>>> the kind of work where 32GB of RAM and 16 cores is enough. Put
> >>>> differently: the storage and performance of a single laptop mean that it
> >>>> is perfectly suitable for doing reasonable amounts of work, and the main
> >>>> barrier to doing so is getting a copy of the winutils lib.
> >>>>
> >>>> I know Gautham and Inigo are trying to get Windows to work as a location
> >>>> for YARN again; not sure about HDFS. And there, yes, we have to say "they
> >>>> are likely to need an extra binary".
> >>>>
> >>>> But for someone wanting to count the number of rows in an Avro file? Or do
> >>>> a simple bit of filtering on some Parquet data? These are the kinds of
> >>>> things that anyone with a Linux/Mac laptop can do with ease, and it is not
> >>>> fair to put suffering on to others. And while we could just say "why don't
> >>>> you just install Linux on that laptop then?", as someone who has had a
> >>>> Linux laptop for many years I know there are strong arguments against it,
> >>>> even beyond "my employer demands Windows with their IT software", as "a
> >>>> laptop which comes out of sleep reliably" is kind of important too.
> >>>>
> >>>> So how can we let the people who have to live in this world (and we have
> >>>> someone who is clearly willing to help) live a better life? Funnily
> >>>> enough, the fact that we have not shipped a working version of winutils
> >>>> for a long time actually gives us an advantage: we can make incompatible
> >>>> changes and be confident that most people aren't going to notice.
> >>>>
> >>>> I think a good first step would be for Shell to work well if winutils
> >>>> isn't around: get rid of that static WINUTILS string and the path/file
> >>>> equivalents, the ones deprecated in 2015. We can rip them out knowing no
> >>>> external code is using them.
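
One way Shell could tolerate a missing winutils, sketched under assumptions: the lookup moves out of a static initializer into a lazily initialized holder and returns `Optional.empty()` instead of throwing. `WinutilsLocator` is a hypothetical class, not the current Hadoop `Shell` API. It is assumed to locate the Hadoop home through the `hadoop.home.dir` system property, falling back to the `HADOOP_HOME` environment variable.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Optional;

/**
 * Hypothetical replacement for a static WINUTILS field: the lookup happens
 * only when a caller actually asks for winutils, and a missing binary yields
 * Optional.empty() instead of failing during class initialization.
 */
final class WinutilsLocator {
  private WinutilsLocator() {}

  /** Pure resolution logic, kept separate so it can be tested on any OS. */
  static Optional<Path> resolve(String osName, String hadoopHome) {
    if (osName == null || !osName.startsWith("Windows")) {
      return Optional.empty(); // winutils is only relevant on Windows
    }
    if (hadoopHome == null || hadoopHome.isEmpty()) {
      return Optional.empty(); // no Hadoop home configured
    }
    Path exe = Paths.get(hadoopHome, "bin", "winutils.exe");
    return Files.isRegularFile(exe) ? Optional.of(exe) : Optional.empty();
  }

  /** Lazy holder: initialized on the first winutils() call, not when WinutilsLocator loads. */
  private static final class Holder {
    static final Optional<Path> WINUTILS = resolve(
        System.getProperty("os.name"),
        System.getProperty("hadoop.home.dir", System.getenv("HADOOP_HOME")));
  }

  static Optional<Path> winutils() {
    return Holder.WINUTILS;
  }
}
```

Because `Holder` is only initialized on the first call to `winutils()`, merely loading a class that references `WinutilsLocator` does no filesystem probing, which avoids the class-loading chain problem described later in this thread.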
> >>>>
> >>>> Then we should look very closely at FileUtil to see how much of it is
> >>>> needed and how we can isolate it better. If you look at the change log of
> >>>> that file, we do have to consider that every time it execs a shell
> >>>> command there's a security risk, and more than once we've had to fix it.
> >>>> Not executing any external shell commands is good everywhere.
> >>>>
> >>>>
> >>>>
> >>>>
> >>>> On Thu, 10 Nov 2022 at 19:00, Chris Nauroth <cnaur...@apache.org>
> >>>> wrote:
> >>>>
> >>>> > Symlink support on the local file system is still used. One example I
> >>>> > can think of is YARN container launch [1].
> >>>> >
> >>>> > I would welcome removal of winutils, as already described in various
> >>>> > JIRA issues. I think the biggest challenge we'll have is testing the
> >>>> > transition from winutils to the newer Java APIs. The contract tests
> >>>> > help, but historically there was also a tendency to break things in
> >>>> > downstream dependent projects.
> >>>> >
> >>>> > I'd suggest taking this on piecemeal, transitioning small pieces of
> >>>> > FileSystem off of winutils one at a time.
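
A sketch of one such small piece: creating a symlink (as in the container launch case) through `java.nio.file.Files.createSymbolicLink` rather than an external command. `JavaSymlinks` is a hypothetical helper, not Hadoop code. On Windows the call may fail unless the process holds the privilege to create symbolic links. That is the kind of behavior difference the contract tests would need to cover.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

/**
 * Hypothetical helper: create a symbolic link with java.nio.file instead of
 * exec'ing winutils or "ln -s".
 */
final class JavaSymlinks {
  private JavaSymlinks() {}

  /**
   * Creates {@code link} pointing at {@code target}, creating the link's
   * parent directory first if needed. Files.createSymbolicLink throws
   * UnsupportedOperationException if symlinks are unsupported, and typically
   * FileAlreadyExistsException if {@code link} already exists. On Windows it
   * may also throw an IOException when the process lacks the privilege.
   */
  static Path symlink(Path target, Path link) throws IOException {
    Path parent = link.toAbsolutePath().getParent();
    if (parent != null) {
      Files.createDirectories(parent); // make sure the link's directory exists
    }
    return Files.createSymbolicLink(link, target);
  }
}
```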
> >>>> >
> >>>> > [1]
> >>>> > https://github.com/apache/hadoop/blob/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/launcher/ContainerLaunch.java#L1508-L1509
> >>>> >
> >>>> > Chris Nauroth
> >>>> >
> >>>> >
> >>>> > On Thu, Nov 10, 2022 at 10:33 AM Wei-Chiu Chuang <weic...@apache.org>
> >>>> > wrote:
> >>>> >
> >>>> > > >
> >>>> > > >
> >>>> > > >
> >>>> > > >   * Bare Naked Local File System v0.1.0 doesn't (yet) support symlinks
> >>>> > > >     or the sticky bit.
> >>>> > > >
> >>>> > > OK to not support symlinks. The symlinks of HDFS are not being
> >>>> > > maintained and I am not aware of anything relying on them, so I
> >>>> > > assume people don't need them.
> >>>> > >
> >>>> > > The sticky bit would be useful, I guess.
> >>>> > >
> >>>> > > I suppose folks working at Microsoft would be more interested in this
> >>>> > > work? Last time I heard, Gautham and Inigo were revamping Hadoop's
> >>>> > > Windows support.
> >>>> > >
> >>>> > >
> >>>> > > >   * But the bigger issue is how to excise Winutils completely in the
> >>>> > > >     existing Hadoop code. Winutils assumptions are hard-coded at a low
> >>>> > > >     level across various classes, even code that has nothing to do with
> >>>> > > >     the file system. The startup configuration, for example, calls
> >>>> > > >     `StringUtils.equalsIgnoreCase("true", valueString)`, which loads the
> >>>> > > >     `StringUtils` class, which has a static reference to `Shell`, which
> >>>> > > >     has a static block that checks for `WINUTILS_EXE`.
> >>>> > > >   * For the most part there should no longer even be a need for anything
> >>>> > > >     but direct Java API access for the local file system. But muddling
> >>>> > > >     things further, the existing `RawLocalFileSystem` implementation has
> >>>> > > >     /four/ ways to access the local file system: Winutils, JNI calls,
> >>>> > > >     shell access, and a "new" approach using "stat". The "stat" approach
> >>>> > > >     has been switched off with a hard-coded
> >>>> > > >     `useDeprecatedFileStatus = true` because of HADOOP-9652
> >>>> > > >     <https://issues.apache.org/jira/browse/HADOOP-9652>.
> >>>> > > >   * Local file access is not contained within `RawLocalFileSystem` but
> >>>> > > >     is scattered across other classes; `FileUtil.readLink()`, for example
> >>>> > > >     (which `RawLocalFileSystem` calls because of the deprecation issue
> >>>> > > >     above), uses the shell approach without any option to change it.
> >>>> > > >     (This implementation-specific decision should have been contained
> >>>> > > >     within the `FileSystem` implementation itself.)
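
For comparison, a sketch of reading a symlink and link-level attributes through `java.nio.file` alone, with no shell or winutils exec. `JavaLocalFileAccess` is a hypothetical class. Returning an empty string for non-links is a choice made for this sketch, not a documented contract of `FileUtil.readLink()`.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.LinkOption;
import java.nio.file.Path;
import java.nio.file.attribute.BasicFileAttributes;

/**
 * Hypothetical pure-Java helpers for local file access of the kind now spread
 * across FileUtil and RawLocalFileSystem. No external commands are executed.
 */
final class JavaLocalFileAccess {
  private JavaLocalFileAccess() {}

  /**
   * Returns the target of a symbolic link, or "" if the path is not a link
   * or cannot be read (an assumption chosen for this sketch).
   */
  static String readLink(Path path) {
    try {
      return Files.readSymbolicLink(path).toString();
    } catch (IOException | UnsupportedOperationException e) {
      return ""; // NotLinkException is an IOException subclass
    }
  }

  /** Reads attributes of the link itself rather than its target (lstat-like). */
  static BasicFileAttributes lstat(Path path) throws IOException {
    return Files.readAttributes(path, BasicFileAttributes.class, LinkOption.NOFOLLOW_LINKS);
  }
}
```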
> >>>> > > >
> >>>> > > > In short, it's a mess that has accumulated over years and is getting
> >>>> > > > worse, charging high interest on what at first was a small,
> >>>> > > > self-contained technical debt.
> >>>> > > >
> >>>> > > > I would welcome the opportunity to clean up this mess. I'm probably as
> >>>> > > > qualified as anyone to make the changes. This is one of my areas of
> >>>> > > > expertise: I was designing a full abstract file system interface (with
> >>>> > > > pure-Java from-scratch implementations for the local file system,
> >>>> > > > Subversion, and WebDAV; even the WebDAV HTTP implementation was from
> >>>> > > > scratch) around the time Apache Nutch was getting off the ground. Most
> >>>> > > > recently I've worked on the Hadoop `FileSystem` API contracting for
> >>>> > > > LinkedIn, discovering (what I consider to be) a huge bug in
> >>>> > > > ViewFileSystem, HADOOP-18525
> >>>> > > > <https://issues.apache.org/jira/browse/HADOOP-18525>.
> >>>> > > >
> >>>> > > > The cleanup should be done in several stages (e.g. consolidating
> >>>> > > > Winutils access; replacing code with pure Java API calls; undeprecating
> >>>> > > > the new Stat code and relegating it to a different class, etc.).
> >>>> > > > Unfortunately, it's not financially feasible for me to sit here for
> >>>> > > > several months and revamp the Hadoop `FileSystem` subsystem for fun
> >>>> > > > (even though I wish I could). Perhaps there is a job opening at a
> >>>> > > > company related to Hadoop that would be interested in hiring me and
> >>>> > > > devoting a certain percentage of my time to fixing local `FileSystem`
> >>>> > > > access. If so, let me know where I should send my resume
> >>>> > > > <https://www.garretwilson.com/about/resume>.
> >>>> > > >
> >>>> > > > Otherwise, let me know if you have any ideas for a way forward. If
> >>>> > > > there proves to be interest in GlobalMentor Hadoop Bare Naked Local
> >>>> > > > FileSystem <https://github.com/globalmentor/hadoop-bare-naked-local-fs>
> >>>> > > > on GitHub, I'll try to maintain and improve it, but really what needs
> >>>> > > > to be revamped is the Hadoop codebase itself. I'll be happy when
> >>>> > > > Hadoop is fixed so that both Steve's code and my code are no longer
> >>>> > > > needed.
> >>>> > > >
> >>>> > > > Garret
> >>>> > > >
> >>>> > >
> >>>> >
> >>>>
> >>>
>
