john lilley created HADOOP-13223:
------------------------------------
Summary: winutils.exe is an abomination and should be killed with
an axe.
Key: HADOOP-13223
URL: https://issues.apache.org/jira/browse/HADOOP-13223
Project: Hadoop Common
Issue Type: Improvement
Components: bin
Affects Versions: 2.6.0
Environment: Microsoft Windows, all versions
Reporter: john lilley
winutils.exe was apparently created as a stopgap measure to allow Hadoop to
"work" on Windows platforms, because the NativeIO libraries aren't implemented
there. Rather than building a DLL that makes native OS calls, the creators of
winutils.exe must have decided that it would be more expedient to create an EXE
to carry out file system operations in a Linux-like fashion. Unfortunately,
like many stopgap measures in software, this one has persisted well beyond its
expected lifetime and usefulness. My team creates software that runs on
Windows and Linux, and winutils.exe is probably responsible for 20% of all
issues we encounter, both during development and in the field.
Problem #1 with winutils.exe is that it is simply missing from many popular
distros, and/or the client-side installer for a distro, when one is supplied,
fails to install it. Thus, as software developers, we are forced to pick one
version and distribute and install it with our software.
Which leads to problem #2: winutils.exe builds are not always compatible. In
particular, MapR MUST have its winutils.exe in the system path, but doing so
breaks the Hadoop distro for every other Hadoop vendor. This makes creating
and maintaining test environments that work with all of the Hadoop distros we
want to test unnecessarily tedious and error-prone.
Problem #3 is that the mechanism by which you inform the Hadoop client software
where to find winutils.exe is poorly documented and fragile. First, it can be
in the PATH. If it is in the PATH, that is where it is found. However, the
documentation, such as it is, makes no mention of this, and instead says that
you should set the HADOOP_HOME environment variable, which does NOT override
the winutils.exe found in your system PATH.
Which leads to problem #4: There is no logging that says where winutils.exe was
actually found and loaded. Because of this, problems caused by picking up the
wrong winutils.exe are extremely difficult to fix.
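
To illustrate the kind of diagnostic that problems #3 and #4 call for, here is
a minimal sketch that lists every winutils.exe candidate a client could pick
up. It is not Hadoop code. The class WinutilsLocator and its methods are
hypothetical names, and the two lookup locations (each PATH directory, then
HADOOP_HOME\bin) are assumptions based on the behavior described above.

```java
import java.io.File;
import java.nio.file.Files;
import java.nio.file.InvalidPathException;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical diagnostic helper; not part of Hadoop.
final class WinutilsLocator {

    // Returns every regular file called fileName found in the directories
    // listed in pathValue, in the order the directories appear.
    static List<Path> findOnPath(String fileName, String pathValue, String separator) {
        List<Path> hits = new ArrayList<>();
        if (pathValue == null || pathValue.isEmpty()) {
            return hits;
        }
        for (String dir : pathValue.split(Pattern.quote(separator))) {
            if (dir.isEmpty()) {
                continue;
            }
            try {
                Path candidate = Paths.get(dir, fileName);
                if (Files.isRegularFile(candidate)) {
                    hits.add(candidate);
                }
            } catch (InvalidPathException e) {
                // Skip malformed PATH entries (for example, stray quotes).
            }
        }
        return hits;
    }

    // Returns <homeDir>/bin/<fileName> if that file exists, otherwise null.
    // The bin subdirectory is an assumption about the expected layout.
    static Path findUnderHome(String fileName, String homeDir) {
        if (homeDir == null || homeDir.isEmpty()) {
            return null;
        }
        try {
            Path candidate = Paths.get(homeDir, "bin", fileName);
            return Files.isRegularFile(candidate) ? candidate : null;
        } catch (InvalidPathException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        String name = "winutils.exe";
        List<Path> onPath = findOnPath(name, System.getenv("PATH"), File.pathSeparator);
        Path underHome = findUnderHome(name, System.getenv("HADOOP_HOME"));
        System.out.println("Copies on PATH, in search order: " + onPath);
        System.out.println("Copy under HADOOP_HOME/bin: " + underHome);
    }
}
```

Running this on an affected machine shows at a glance whether more than one
copy is visible, which is exactly the information the client logs do not
provide today.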
Problem #5 is that most of the time, such as when accessing plain HDFS
and YARN, one does not *need* winutils.exe. But if it is missing, the log
messages complain about its absence. When we are trying to diagnose an obscure
issue in Hadoop (of which there are many), the presence of this red herring
leads to all sorts of time wasted until someone on the team points out that
winutils.exe is not the problem, at least not this time.
Problem #6 is that errors and stack traces from issues involving winutils.exe
are not helpful. The Java stack trace ends at the ProcessBuilder call. Only
through bitter experience is one able to connect the dots from "ProcessBuilder
is the last thing on the stack" to "something is wrong with winutils.exe".
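
As a sketch of what a more helpful failure could look like, the wrapper below
launches an external program through ProcessBuilder and, if the launch fails,
rethrows with the name of the executable in the message. This is an
illustration only, not how Hadoop invokes winutils.exe. The class ExternalTool
and its run method are hypothetical names.

```java
import java.io.IOException;
import java.util.Arrays;
import java.util.List;

// Hypothetical wrapper; not part of Hadoop.
final class ExternalTool {

    // Starts the command and waits for it to exit, returning its exit code.
    // A launch failure is rethrown with the executable named in the message
    // and the original exception preserved as the cause.
    static int run(List<String> command) throws IOException, InterruptedException {
        ProcessBuilder builder = new ProcessBuilder(command).inheritIO();
        Process process;
        try {
            process = builder.start();
        } catch (IOException e) {
            throw new IOException(
                "Could not launch external tool '" + command.get(0)
                    + "'; check that it is installed and that the intended copy is the one being found",
                e);
        }
        return process.waitFor();
    }

    public static void main(String[] args) throws Exception {
        if (args.length == 0) {
            System.err.println("usage: ExternalTool <command> [args...]");
            return;
        }
        System.out.println("exit code: " + run(Arrays.asList(args)));
    }
}
```

With a message like this, the stack trace still ends at ProcessBuilder, but
the reader no longer has to guess which executable was involved.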
Note that none of these involve running Hadoop on Windows. They are only
encountered when using Hadoop client libraries to access a cluster from Windows.