Hi all,

I've been working on getting Hadoop to build on Windows for quite some time
now. We're now at a stage where we can parallelize the effort and complete
this sooner. I've outlined the remaining work below. Please get in touch
with me if you'd like to help realize this goal.

*Why do we need Hadoop to run on Windows?*
Windows has a very large user base. Modern alternatives to Hadoop (like
Kubernetes) are cross-platform by design. We have to acknowledge that it
isn't easy to get Hadoop running on Windows. Hadoop probably hasn't seen
much adoption on Windows because of issues such as the broken build and the
work-arounds required at every step of the way. If we were to nail these
issues, I believe it would tremendously expand the usage of Hadoop.

I plan to complete this in 4 phases.

*Phase 1 : Building Hadoop on Windows*
1. [HADOOP-17193] Compile Hadoop on Windows natively - ASF JIRA (apache.org)
<https://issues.apache.org/jira/browse/HADOOP-17193>
The Hadoop build on Windows is currently broken because of the POSIX API
calls made in the HDFS native client (libhdfspp). MinGW and Cygwin
provide POSIX implementations on Windows. While it's possible to use these
toolchains, it won't be the same as compiling Hadoop with Visual C++.
The Visual C++ runtime is the native C++ runtime on Windows and provides
many more capabilities (like core dumps) than its alternatives. Thus,
it's essential to get Hadoop to compile with Visual Studio on Windows.
We'll be using Visual Studio 2019.

2. [HDFS-15843] [libhdfs++] Make write cross platform - ASF JIRA
(apache.org) <https://issues.apache.org/jira/browse/HDFS-15843>
Until recently, Hadoop was being built with C++11. I upgraded the compiler
to a version that supports C++17 so that we have access to
std::filesystem and a few other modern C++ APIs. However, there are some
cases where the C++17 APIs don't suffice. Thus, I wrote the XPlatform
library
<https://github.com/apache/hadoop/tree/trunk/hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfspp/lib/x-platform>,
which is a collection of system call APIs implemented in a cross-platform
friendly manner. The CMake build system will choose the appropriate
platform implementation while building so that we can do away with all the
#ifdefs based on platform in the code. In summary, if you ever come across
a need to use system calls, please put them into the XPlatform library and
use its APIs.
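
To make the idea concrete, here's a minimal sketch of the pattern, not the
actual XPlatform code. The namespace xplat_sketch and the function name
WriteToStdout are made up for illustration. In a real layout the two
definitions would live in separate source files and CMake would compile
only the one for the target platform. They're folded into a single file
with one #ifdef here only so that the sketch compiles standalone.

```cpp
#include <cstddef>
#include <string>

#ifdef _WIN32
#include <windows.h>
#else
#include <unistd.h>
#endif

namespace xplat_sketch {  // hypothetical namespace, for illustration only

// The interface callers see. It contains no platform-specific types.
bool WriteToStdout(const std::string& message);

#ifdef _WIN32
// Would live in a Windows-only source file picked by the build system.
bool WriteToStdout(const std::string& message) {
  HANDLE out = GetStdHandle(STD_OUTPUT_HANDLE);
  if (out == INVALID_HANDLE_VALUE || out == nullptr) {
    return false;
  }
  DWORD written = 0;
  const BOOL ok = WriteFile(out, message.data(),
                            static_cast<DWORD>(message.size()), &written,
                            nullptr);
  return ok != 0 && written == message.size();
}
#else
// Would live in a POSIX-only source file picked by the build system.
bool WriteToStdout(const std::string& message) {
  // A single write() call may be partial; this sketch reports that as failure.
  const ssize_t written =
      ::write(STDOUT_FILENO, message.data(), message.size());
  return written >= 0 &&
         static_cast<std::size_t>(written) == message.size();
}
#endif

}  // namespace xplat_sketch
```

Calling code would then just use xplat_sketch::WriteToStdout("...") and
never mention the platform.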

3. [HDFS-16474] Make HDFS tail tool cross platform - ASF JIRA (apache.org)
<https://issues.apache.org/jira/browse/HDFS-16474>
    [HDFS-16473] Make HDFS stat tool cross platform - ASF JIRA (apache.org)
<https://issues.apache.org/jira/browse/HDFS-16473>
    [HDFS-16472] Make HDFS setrep tool cross platform - ASF JIRA
(apache.org) <https://issues.apache.org/jira/browse/HDFS-16472>
    [HDFS-16471] Make HDFS ls tool cross platform - ASF JIRA (apache.org)
<https://issues.apache.org/jira/browse/HDFS-16471>
    [HDFS-16470] Make HDFS find tool cross platform - ASF JIRA (apache.org)
<https://issues.apache.org/jira/browse/HDFS-16470>
The HDFS native client tools use the getopt API to parse their command-line
arguments. getopt isn't available on Windows. One can follow this PR to
make the above tools cross platform compatible - HDFS-16285. Make HDFS
ownership tools cross platform by GauthamBanasandra · Pull Request #3588 ·
apache/hadoop (github.com) <https://github.com/apache/hadoop/pull/3588>.
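
The linked PR is the pattern to follow for the actual change. Purely to
illustrate what parsing arguments without getopt can look like, here's a
small sketch that uses only the standard library. It isn't taken from the
PR, and the TailOptions struct, the ParseTailArgs function and the -f / -h
flags are assumptions chosen for the example.

```cpp
#include <string>
#include <vector>

// Hypothetical option set for a tail-like tool.
struct TailOptions {
  bool follow = false;  // -f
  bool help = false;    // -h
  std::string path;     // single positional argument
};

// Parses the arguments that follow the program name.
// Returns false on an unknown flag, an extra positional argument,
// or a missing path (unless -h was given).
bool ParseTailArgs(const std::vector<std::string>& args, TailOptions& out) {
  out = TailOptions();
  bool have_path = false;
  for (const std::string& arg : args) {
    if (arg == "-f") {
      out.follow = true;
    } else if (arg == "-h") {
      out.help = true;
    } else if (!arg.empty() && arg[0] == '-') {
      return false;  // unknown flag
    } else if (!have_path) {
      out.path = arg;
      have_path = true;
    } else {
      return false;  // more than one positional argument
    }
  }
  return out.help || have_path;
}
```

Since this relies only on std::string and std::vector, the same source
builds with Visual C++ and with GCC/Clang.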

4. [HDFS-16463] Make dirent.h cross platform compatible - ASF JIRA
(apache.org) <https://issues.apache.org/jira/browse/HDFS-16463>
    [HDFS-16465] Make usage of strings.h cross platform compatible - ASF
JIRA (apache.org) <https://issues.apache.org/jira/browse/HDFS-16465>
For these JIRAs, the header files don't exist on Windows. Thus, we need
to inspect which APIs are used from these headers and provide
cross-platform implementations of them.
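
As an example of what such a replacement could look like, strings.h is
where POSIX declares strcasecmp. Assuming a case-insensitive equality check
is one of the things needed (I'm not claiming it's the only one), a portable
version can be written with the standard library alone. The name
EqualsIgnoreCase is made up for this sketch.

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// Hypothetical helper: ASCII case-insensitive equality without strings.h.
bool EqualsIgnoreCase(const std::string& a, const std::string& b) {
  if (a.size() != b.size()) {
    return false;
  }
  // Take the characters as unsigned char so that std::tolower is never
  // called with a negative value.
  return std::equal(a.begin(), a.end(), b.begin(),
                    [](unsigned char x, unsigned char y) {
                      return std::tolower(x) == std::tolower(y);
                    });
}
```

Note that this only answers equal / not equal. It doesn't reproduce the
ordering result that strcasecmp returns.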

5. [HDFS-16464] Create only libhdfspp static libraries for Windows - ASF
JIRA (apache.org) <https://issues.apache.org/jira/browse/HDFS-16464>
There are some issues with producing Hadoop DLLs on Windows. So, let's plan
to deliver only static libraries in this phase.

6. [HDFS-16466] Implement Linux permission flags on Windows - ASF JIRA
(apache.org) <https://issues.apache.org/jira/browse/HDFS-16466>
7. [HDFS-16467] Ensure Protobuf generated headers are included first - ASF
JIRA (apache.org) <https://issues.apache.org/jira/browse/HDFS-16467>
8. [HDFS-16468] Define ssize_t for Windows - ASF JIRA (apache.org)
<https://issues.apache.org/jira/browse/HDFS-16468>
9. [HDFS-16469] Locate protoc-gen-hrpc.exe on Windows - ASF JIRA
(apache.org) <https://issues.apache.org/jira/browse/HDFS-16469>
10. [YARN-11078] Set env vars in a cross platform compatible way - ASF JIRA
(apache.org) <https://issues.apache.org/jira/browse/YARN-11078>
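
For item 8 above, a common idiom is to alias ssize_t to SSIZE_T from
BaseTsd.h when building on Windows. The sketch below shows that idiom
along with a made-up helper (CopyBytes) that uses the type. It's only one
way to do it and isn't necessarily how the JIRA will be resolved.

```cpp
#include <cstddef>
#include <cstring>

#ifdef _WIN32
#include <BaseTsd.h>
using ssize_t = SSIZE_T;  // Windows SDK's signed counterpart of SIZE_T
#else
#include <sys/types.h>  // POSIX already provides ssize_t
#endif

static_assert(sizeof(ssize_t) == sizeof(std::size_t),
              "ssize_t is expected to be as wide as size_t");

// Hypothetical helper: copies len bytes into dst if they fit.
// Returns the number of bytes copied, or -1 if dst is too small.
ssize_t CopyBytes(const char* src, std::size_t len, char* dst,
                  std::size_t capacity) {
  if (len > capacity) {
    return -1;
  }
  std::memcpy(dst, src, len);
  return static_cast<ssize_t>(len);
}
```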

*Phase 2 : Set up CI for Hadoop on Windows*
1. [HADOOP-18133] Add Dockerfile for Windows 10 - ASF JIRA (apache.org)
<https://issues.apache.org/jira/browse/HADOOP-18133>
2. [HADOOP-18134] Run CI for Windows 10 - ASF JIRA (apache.org)
<https://issues.apache.org/jira/browse/HADOOP-18134>
We really must set up CI for Hadoop on Windows to ensure that the build
never breaks again.

*Phase 3 : Resolving systemic issues*
1. [HADOOP-13223] winutils.exe is a bug nexus and should be killed with an
axe. - ASF JIRA (apache.org)
<https://issues.apache.org/jira/browse/HADOOP-13223>
The Hadoop environment is modeled more closely on Linux than on Windows.
Thus, we see a lot of functional gaps between running Hadoop on Linux vs.
Windows, which have become a source of bugs when running Hadoop on
Windows. One such issue is winutils.exe. We can aim to address issues like
these in this phase. I plan to provide a JNI implementation for each
platform and unify these under a common file system interface, so that we
get stack traces for exceptions thrown in these layers and, mainly, so
that we don't have any disparity between the platforms.

*Phase 4 : Produce Windows distribution of Hadoop*
1. [HADOOP-18135] Produce Windows binaries of Hadoop - ASF JIRA (apache.org)
<https://issues.apache.org/jira/browse/HADOOP-18135>
The public should be able to download and install Hadoop on their Windows
computers.

Thanks,
--Gautham
