[ 
https://issues.apache.org/jira/browse/HDFS-573?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14081621#comment-14081621
 ] 

Chris Nauroth commented on HDFS-573:
------------------------------------

I think there are 2 aspects to the question:

# libhdfs embeds a JVM.  The JVM itself always runs multiple internal threads, 
even if your libhdfs application code doesn't run multiple threads.  This means 
that by extension, a libhdfs application is always multi-threaded, even if the 
application's code is entirely single-threaded/synchronous.  This rules out 
things like linking to a single-threaded C runtime library for a supposed 
performance boost with single-core execution.  A libhdfs application must 
always link to a C runtime library with multi-threading support.
# As for the data structures inside the libhdfs code itself, you're correct 
that there is no thread safety concern if the application runs entirely 
single-threaded and makes synchronous calls.  Technically, we don't need a lock 
around the hash table in that case.  However, it might just cause end user 
confusion if we publish thread-safe vs. non-thread-safe builds or some kind of 
configuration flag to skip the locking.  The effects of running multiple 
threads without the locking would be catastrophic, probably a crash of some 
sort.  I haven't personally seen contention on this lock cause a real-world 
performance bottleneck, so I wonder if such an optimization is necessary.
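
To make the second point concrete, here is a minimal sketch of a lookup table guarded by a single mutex. It is not the libhdfs code: the names {{table_get}}, {{table_put_if_absent}} and {{TABLE_CAPACITY}} are hypothetical, a linear scan stands in for the real hash table, and it uses POSIX threads rather than anything Windows-specific. A single-threaded caller pays only for an uncontended lock and unlock on each call.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a class-name -> class-reference table. */
#define TABLE_CAPACITY 64

struct entry {
    const char *name;   /* key: class name (pointer is stored, not copied) */
    void *ref;          /* value: opaque reference */
};

static struct entry table[TABLE_CAPACITY];
static size_t table_len = 0;
static pthread_mutex_t table_lock = PTHREAD_MUTEX_INITIALIZER;

/* Returns the cached reference for name, or NULL if it is absent. */
void *table_get(const char *name)
{
    void *found = NULL;
    size_t i;

    pthread_mutex_lock(&table_lock);
    for (i = 0; i < table_len; i++) {
        if (strcmp(table[i].name, name) == 0) {
            found = table[i].ref;
            break;
        }
    }
    pthread_mutex_unlock(&table_lock);
    return found;
}

/*
 * Stores name -> ref unless name is already present.
 * Returns the reference now in the table (the existing one wins),
 * or NULL if the table is full.
 */
void *table_put_if_absent(const char *name, void *ref)
{
    void *result = NULL;
    size_t i;

    assert(name != NULL && ref != NULL);

    pthread_mutex_lock(&table_lock);
    for (i = 0; i < table_len; i++) {
        if (strcmp(table[i].name, name) == 0) {
            result = table[i].ref;
            break;
        }
    }
    if (result == NULL && table_len < TABLE_CAPACITY) {
        table[table_len].name = name;
        table[table_len].ref = ref;
        table_len++;
        result = ref;
    }
    pthread_mutex_unlock(&table_lock);
    return result;
}
```

Without the lock, two threads inserting at the same time could both read the same {{table_len}} and overwrite each other's slot, which is the kind of corruption I'd expect to end in a crash.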

For the scope of this patch, I'd prefer to focus on a straight-up port of the 
existing code to work on Windows.  We're taking a big step here, moving from 
not even compiling on Windows to fully functional, and the patch is already 
pretty large.  :-)  Potential performance enhancements certainly are welcome in 
separate patches.

FWIW, I think libhdfs has a weakness in that it has no clear-cut "initialize" 
function for the application to call during a single-threaded bootstrap 
sequence.  This would have given us an easy place to start the {{JavaVM}} and 
pre-populate the mapping of class names to class references.  Unfortunately, it 
would be backwards-incompatible to add that function now and demand existing 
applications change their code to call our initialize function.  Instead, we 
have no choice but to do lazy initialization, and that drives a lot of the 
complexity in libhdfs with the mutexes and the thread-local storage.  From my 
very quick scan of the HADOOP-10388 branch, it looks like we'll be providing a 
clearer initialization sequence there.  libhdfs likely will need to remain this 
way though.
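
As a rough illustration of what lazy initialization means for a library with no explicit "initialize" call, here is a sketch in which every public entry point first runs a one-time setup. This is an assumption-laden example, not how libhdfs is written: it uses POSIX {{pthread_once}} instead of the mutexes and thread-local storage mentioned above, and {{do_init}}, {{ensure_initialized}}, {{example_api_call}} and {{example_init_count}} are hypothetical names.

```c
#include <pthread.h>
#include <stddef.h>

static pthread_once_t init_once = PTHREAD_ONCE_INIT;
static int init_calls = 0;      /* how many times the setup actually ran */
static int runtime_ready = 0;   /* set once the setup has completed */

/*
 * Hypothetical one-time setup. In a library that embeds a VM, this is
 * where the VM would be started and any lookup tables pre-populated.
 */
static void do_init(void)
{
    init_calls++;
    runtime_ready = 1;
}

/*
 * Called at the top of every public function. pthread_once guarantees
 * do_init runs exactly once, even if several threads arrive together.
 * Returns 0 on success, -1 on failure.
 */
static int ensure_initialized(void)
{
    if (pthread_once(&init_once, do_init) != 0) {
        return -1;
    }
    return runtime_ready ? 0 : -1;
}

/* Hypothetical public entry point: initializes lazily, then does its work. */
int example_api_call(void)
{
    if (ensure_initialized() != 0) {
        return -1;
    }
    return 0;
}

/* Helper for inspection only; reports how often the setup ran. */
int example_init_count(void)
{
    return init_calls;
}
```

With an explicit initialize function called from a single-threaded bootstrap, the {{ensure_initialized}} check in every entry point would not be needed at all.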

> Porting libhdfs to Windows
> --------------------------
>
>                 Key: HDFS-573
>                 URL: https://issues.apache.org/jira/browse/HDFS-573
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: libhdfs
>         Environment: Windows, Visual Studio 2008
>            Reporter: Ziliang Guo
>            Assignee: Chris Nauroth
>         Attachments: HDFS-573.1.patch
>
>   Original Estimate: 336h
>  Remaining Estimate: 336h
>
> The current C code in libhdfs is written using C99 conventions and also uses 
> a few POSIX specific functions such as hcreate, hsearch, and pthread mutex 
> locks.  To compile it using Visual Studio would require a conversion of the 
> code in hdfsJniHelper.c and hdfs.c to C89 and replacement/reimplementation of 
> the POSIX functions.  The code also uses the stdint.h header, which is not 
> part of the original C89, but there exists what appears to be a BSD licensed 
> reimplementation written to be compatible with MSVC floating around.  I have 
> already done the other necessary conversions, as well as created a simplistic 
> hash bucket for use with hcreate and hsearch and successfully built a DLL of 
> libhdfs.  Further testing is needed to see if it is usable by other programs 
> to actually access hdfs, which will likely happen in the next few weeks as 
> the Condor Project continues with its file transfer work.
> In the process, I've removed a few consts that I believe are extraneous and 
> also fixed an incorrect array initialization where someone was attempting to 
> initialize with something like this: JavaVMOption options[noArgs]; where 
> noArgs was being incremented in the code above.  This was in the 
> hdfsJniHelper.c file, in the getJNIEnv function.



--
This message was sent by Atlassian JIRA
(v6.2#6252)
