[
https://issues.apache.org/jira/browse/HDFS-573?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14081621#comment-14081621
]
Chris Nauroth commented on HDFS-573:
------------------------------------
I think there are two aspects to the question:
# libhdfs embeds a JVM. The JVM itself always runs multiple internal threads,
even if your libhdfs application code doesn't run multiple threads. This means
that by extension, a libhdfs application is always multi-threaded, even if the
application's code is entirely single-threaded/synchronous. This rules out
things like linking to a single-threaded C runtime library for a supposed
performance boost with single-core execution. A libhdfs application must
always link to a C runtime library with multi-threading support.
# As for the data structures inside the libhdfs code itself, you're correct
that there is no thread safety concern if the application runs entirely
single-threaded and makes synchronous calls. Technically, we don't need a lock
around the hash table in that case. However, it might just cause end user
confusion if we publish thread-safe vs. non-thread-safe builds or some kind of
configuration flag to skip the locking. The effects of running multiple
threads without the locking would be catastrophic, probably a crash of some
sort. I haven't personally seen contention on this lock cause a real-world
performance bottleneck, so I wonder if such an optimization is necessary.
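
To make the second point concrete, here is a minimal sketch of a lock-protected lookup table. It is not the libhdfs code: the names {{cache_get}}, {{cache_put}}, and {{CACHE_CAPACITY}} are hypothetical, a small fixed array stands in for the hash table, and POSIX mutexes are assumed purely for illustration (a Windows build would need an equivalent primitive). In a single-threaded caller the lock and unlock calls are uncontended, which is consistent with the observation that the lock is unlikely to be a bottleneck.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a class-name -> class-reference table. */
#define CACHE_CAPACITY 16

struct cache_entry {
    const char *name;  /* key, e.g. a class name (not copied) */
    void *ref;         /* value, e.g. a cached class reference */
};

static struct cache_entry cache[CACHE_CAPACITY];
static size_t cache_size = 0;
static pthread_mutex_t cache_lock = PTHREAD_MUTEX_INITIALIZER;

/* Returns the cached reference, or NULL if the name is not present. */
void *cache_get(const char *name)
{
    void *found = NULL;
    size_t i;

    assert(name != NULL);
    pthread_mutex_lock(&cache_lock);
    for (i = 0; i < cache_size; i++) {
        if (strcmp(cache[i].name, name) == 0) {
            found = cache[i].ref;
            break;
        }
    }
    pthread_mutex_unlock(&cache_lock);
    return found;
}

/*
 * Inserts name -> ref if the name is absent; an existing entry is left
 * unchanged. Returns 0 on success, -1 if the table is full.
 */
int cache_put(const char *name, void *ref)
{
    int rc = 0;
    size_t i;

    assert(name != NULL);
    pthread_mutex_lock(&cache_lock);
    for (i = 0; i < cache_size; i++) {
        if (strcmp(cache[i].name, name) == 0) {
            break;
        }
    }
    if (i == cache_size) {          /* not found: append if there is room */
        if (cache_size == CACHE_CAPACITY) {
            rc = -1;
        } else {
            cache[cache_size].name = name;
            cache[cache_size].ref = ref;
            cache_size++;
        }
    }
    pthread_mutex_unlock(&cache_lock);
    return rc;
}
```

Without the lock, two threads calling {{cache_put}} at the same time could both write the same slot or read {{cache_size}} mid-update, which is the kind of failure described above.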
For the scope of this patch, I'd prefer to focus on a straight-up port of the
existing code to work on Windows. We're taking a big step here, moving from
not even compiling on Windows to fully functional, and the patch is already
pretty large. :-) Potential performance enhancements certainly are welcome in
separate patches.
FWIW, I think libhdfs has a weakness in that it has no clear-cut "initialize"
function for the application to call during a single-threaded bootstrap
sequence. This would have given us an easy place to start the {{JavaVM}} and
pre-populate the mapping of class names to class references. Unfortunately, it
would be backwards-incompatible to add that function now and demand existing
applications change their code to call our initialize function. Instead, we
have no choice but to do lazy initialization, and that drives a lot of the
complexity in libhdfs with the mutexes and the thread-local storage. From my
very quick scan of the HADOOP-10388 branch, it looks like we'll be providing a
clearer initialization sequence there. libhdfs likely will need to remain this
way though.
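
As a rough illustration of why lazy initialization pulls in locking, here is a generic sketch, not taken from libhdfs. The names {{start_runtime}}, {{ensure_initialized}}, and {{example_api_call}} are hypothetical, and POSIX mutexes are assumed. Because there is no dedicated initialize function, every public entry point has to check, under a lock, whether the one-time startup work has already happened.

```c
#include <assert.h>
#include <pthread.h>

static pthread_mutex_t init_lock = PTHREAD_MUTEX_INITIALIZER;
static int initialized = 0;   /* guarded by init_lock */
static int startup_runs = 0;  /* how many times startup actually ran */

/* Hypothetical stand-in for expensive one-time work, e.g. starting a VM. */
static void start_runtime(void)
{
    startup_runs++;
}

/* Runs start_runtime() exactly once, no matter which thread gets here first. */
static void ensure_initialized(void)
{
    int rc = pthread_mutex_lock(&init_lock);
    assert(rc == 0);
    if (!initialized) {
        start_runtime();
        initialized = 1;
    }
    rc = pthread_mutex_unlock(&init_lock);
    assert(rc == 0);
    (void)rc;  /* silence unused-variable warnings when NDEBUG is defined */
}

/* Every public entry point must begin with the lazy-initialization check. */
int example_api_call(void)
{
    ensure_initialized();
    return startup_runs;
}
```

With an explicit initialize function called once during a single-threaded bootstrap, the check and the lock in {{ensure_initialized}} would not be needed on every call.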
> Porting libhdfs to Windows
> --------------------------
>
> Key: HDFS-573
> URL: https://issues.apache.org/jira/browse/HDFS-573
> Project: Hadoop HDFS
> Issue Type: Improvement
> Components: libhdfs
> Environment: Windows, Visual Studio 2008
> Reporter: Ziliang Guo
> Assignee: Chris Nauroth
> Attachments: HDFS-573.1.patch
>
> Original Estimate: 336h
> Remaining Estimate: 336h
>
> The current C code in libhdfs is written using C99 conventions and also uses
> a few POSIX specific functions such as hcreate, hsearch, and pthread mutex
> locks. To compile it using Visual Studio would require a conversion of the
> code in hdfsJniHelper.c and hdfs.c to C89 and replacement/reimplementation of
> the POSIX functions. The code also uses the stdint.h header, which is not
> part of the original C89, but there exists what appears to be a BSD licensed
> reimplementation written to be compatible with MSVC floating around. I have
> already done the other necessary conversions, as well as created a simplistic
> hash bucket for use with hcreate and hsearch and successfully built a DLL of
> libhdfs. Further testing is needed to see if it is usable by other programs
> to actually access hdfs, which will likely happen in the next few weeks as
> the Condor Project continues with its file transfer work.
> In the process, I've removed a few consts that I believe are extraneous and
> also fixed an incorrect array initialization where someone was attempting to
> initialize with something like this: JavaVMOption options[noArgs]; where
> noArgs was being incremented in the code above. This was in the
> hdfsJniHelper.c file, in the getJNIEnv function.