[
https://issues.apache.org/jira/browse/HADOOP-13223?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15319400#comment-15319400
]
john lilley commented on HADOOP-13223:
--------------------------------------
[~cmccabe], Looking at the various issues we've encountered, I agree that most
of them can be addressed by keeping winutils.exe and doing these things:
1: Taking steps to ensure that winutils.exe is always available on client
library downloads IN A CONSISTENT PLACE
2: #1 can be made automatic by bundling winutils.exe into the
RawLocalFileSystem jar (or perhaps NativeIO?) and caching it to a temporary
place before invoking it.
3: Removing HADOOP_HOME, hadoop.home.dir, and PATH as alternate ways of finding
winutils.exe. If #2 is done, this should always yield a full path to exactly
the winutils.exe that we want.
4: Hiding all access to winutils under a consistent API (in RawLocalFileSystem
or NativeIO) for performing file operations (chown, chmod, symlink, readlink,
etc). This means removing or privatizing almost everything in the Shell class,
but especially the following: Shell.getWinUtilsPath(), Shell.WINUTILS,
Shell.get*Command().
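
For illustration only, items 2 and 3 might be sketched roughly as below. This is a minimal sketch, not existing Hadoop code: the class name BundledWinutils and the resource path /native/winutils.exe are hypothetical, and it assumes the EXE has been packaged as a classpath resource in the same jar. The point is that callers only ever receive one absolute path, with no lookup through HADOOP_HOME, hadoop.home.dir, or PATH.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical helper; nothing with this name exists in Hadoop.
final class BundledWinutils {
    // Assumed resource location inside the jar (an illustration, not a real path).
    static final String RESOURCE = "/native/winutils.exe";

    private static Path cached;

    private BundledWinutils() {}

    /** Copies the stream to a fresh temporary directory and returns the absolute path. */
    public static Path extract(InputStream in) {
        try {
            Path dir = Files.createTempDirectory("hadoop-winutils-");
            Path exe = dir.resolve("winutils.exe");
            Files.copy(in, exe, StandardCopyOption.REPLACE_EXISTING);
            // deleteOnExit runs in reverse registration order: file first, then directory.
            dir.toFile().deleteOnExit();
            exe.toFile().deleteOnExit();
            return exe.toAbsolutePath();
        } catch (IOException e) {
            throw new UncheckedIOException("could not cache winutils.exe", e);
        }
    }

    /** Returns the single cached copy, extracting it on first use. */
    public static synchronized Path path() {
        if (cached == null) {
            try (InputStream in = BundledWinutils.class.getResourceAsStream(RESOURCE)) {
                if (in == null) {
                    throw new IOException("bundled resource not found: " + RESOURCE);
                }
                cached = extract(in);
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }
        return cached;
    }
}
```

A wrapper API along the lines of item 4 would then call BundledWinutils.path() internally and never expose the path to callers.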
> winutils.exe is a bug nexus and should be killed with an axe.
> -------------------------------------------------------------
>
> Key: HADOOP-13223
> URL: https://issues.apache.org/jira/browse/HADOOP-13223
> Project: Hadoop Common
> Issue Type: Improvement
> Components: bin
> Affects Versions: 2.6.0
> Environment: Microsoft Windows, all versions
> Reporter: john lilley
>
> winutils.exe was apparently created as a stopgap measure to allow Hadoop to
> "work" on Windows platforms, because the NativeIO libraries aren't
> implemented there (edit: even NativeIO probably doesn't cover the operations
> that winutils.exe is used for). Rather than building a DLL that makes native
> OS calls, the creators of winutils.exe must have decided that it would be
> more expedient to create an EXE to carry out file system operations in a
> linux-like fashion. Unfortunately, like many stopgap measures in software,
> this one has persisted well beyond its expected lifetime and usefulness. My
> team creates software that runs on Windows and Linux, and winutils.exe is
> probably responsible for 20% of all issues we encounter, both during
> development and in the field.
> Problem #1 with winutils.exe is that it is simply missing from many popular
> distros, and/or the client-side software installation for said distros, when
> one is supplied, fails to install it. Thus, as software developers, we
> are forced to pick one version and distribute and install it with our
> software.
> Which leads to problem #2: winutils.exe builds are not always compatible. In
> particular, MapR MUST have its winutils.exe in the system path, but doing so
> breaks the Hadoop distro for every other Hadoop vendor. This makes creating
> and maintaining test environments that work with all of the Hadoop distros we
> want to test unnecessarily tedious and error-prone.
> Problem #3 is that the mechanism by which you inform the Hadoop client
> software where to find winutils.exe is poorly documented and fragile. First,
> it can be in the PATH. If it is in the PATH, that is where it is found.
> However, the documentation, such as it is, makes no mention of this, and
> instead says that you should set the HADOOP_HOME environment variable, which
> does NOT override the winutils.exe found in your system PATH.
> Which leads to problem #4: There is no logging that says where winutils.exe
> was actually found and loaded. Because of this, diagnosing problems caused by
> finding the wrong winutils.exe is extremely difficult.
> Problem #5 is that most of the time, such as when accessing straight up HDFS
> and YARN, one does not *need* winutils.exe. But if it is missing, the log
> messages complain about its absence. When we are trying to diagnose an
> obscure issue in Hadoop (of which there are many), the presence of this red
> herring leads to all sorts of time wasted until someone on the team points
> out that winutils.exe is not the problem, at least not this time.
> Problem #6 is that errors and stack traces from issues involving winutils.exe
> are not helpful. The Java stack trace ends at the ProcessBuilder call. Only
> through bitter experience is one able to connect the dots from
> "ProcessBuilder is the last thing on the stack" to "something is wrong with
> winutils.exe".
> Note that none of these involve running Hadoop on Windows. They are only
> encountered when using Hadoop client libraries to access a cluster from
> Windows.
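
As a rough illustration of what problems #4 and #6 in the description above are asking for, a launcher could log the resolved path before each call and wrap the ProcessBuilder failure in a message that names winutils.exe. This is only a sketch under assumptions: WinutilsRunner and its methods are hypothetical names, not part of Hadoop's Shell class, and java.util.logging stands in for whatever logging framework the client actually uses.

```java
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.logging.Logger;

// Hypothetical wrapper; not an existing Hadoop class.
final class WinutilsRunner {
    private static final Logger LOG = Logger.getLogger(WinutilsRunner.class.getName());

    private WinutilsRunner() {}

    /** Builds the message that should appear instead of a bare ProcessBuilder trace. */
    public static String describeFailure(File winutils, List<String> cmd) {
        return "Failed to launch winutils.exe at " + winutils.getAbsolutePath()
                + " (exists=" + winutils.isFile() + ") with command " + cmd;
    }

    /** Starts winutils.exe, logging which copy is used and explaining any launch failure. */
    public static Process start(File winutils, String... args) {
        List<String> cmd = new ArrayList<>();
        cmd.add(winutils.getAbsolutePath());
        cmd.addAll(Arrays.asList(args));
        // Addresses problem #4: record exactly which executable was chosen.
        LOG.info("Using winutils.exe at " + winutils.getAbsolutePath());
        try {
            return new ProcessBuilder(cmd).start();
        } catch (IOException e) {
            // Addresses problem #6: the message now points at winutils.exe explicitly.
            throw new UncheckedIOException(describeFailure(winutils, cmd), e);
        }
    }
}
```

With something like this in place, a missing or wrong executable shows up in the exception message and the log rather than only as ProcessBuilder at the top of the stack.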