[jira] [Commented] (HDFS-5541) LIBHDFS questions and performance suggestions

Chris Nauroth (JIRA) Fri, 06 Dec 2013 13:20:03 -0800

    [ https://issues.apache.org/jira/browse/HDFS-5541?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13841717#comment-13841717 ]


Chris Nauroth commented on HDFS-5541:
-------------------------------------

Hi [~stevebovy],

I had a chance to look at this a bit more today.  Thanks again for sharing your 
work.  To help move it forward, I'd like to suggest that we close out this 
issue and replace it with the following set of issues focused on completing 
more specific tasks.  Splitting the work up helps make code review easier and 
ultimately helps get the code committed.

# libHDFS Windows compatibility - This would be the bare minimum patch required 
for Windows compatibility.  I think the scope would include things like the JVM 
mutex macros, uthash, build script changes and C89 compatibility stuff like 
declaring variables at the top of the function.
# libHDFS AIX compatibility - Colin has suggested a build script change to 
support this instead of changing the comment style.
# libHDFS performance improvements - The above issues would not include any of 
the performance improvement work, so if you want to keep pursuing that, then 
we'd do it in a separate patch here.  Depending on the scope, this also might 
split into multiple performance improvement patches.
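
To illustrate the C89 item in the first issue above, here is a minimal sketch 
of the style involved.  {{count_path_separators}} is a hypothetical helper made 
up for this example, not code from libhdfs.  It declares every variable before 
the first statement of the block and uses only /* */ comments, so it should 
build with a strict C89 compiler.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper (not part of libhdfs): counts the '/' separators
 * in a path.  Written in the C89 subset discussed above.
 */
int count_path_separators(const char *path)
{
    /* C89: all declarations precede the first statement in the block. */
    size_t i;
    size_t len;
    int count;

    assert(path != NULL);

    len = strlen(path);
    count = 0;
    /* The loop counter is declared above, not inside the for statement. */
    for (i = 0; i < len; i++) {
        if (path[i] == '/') {
            count++;
        }
    }
    return count;
}
```

Compiling with something like gcc -std=c89 -pedantic is one way to catch mixed 
declarations and // comments before trying the Windows or AIX compilers.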

For each of these, the process would be to post patch files applicable to 
trunk, and we can code review and test them.  From your notes, it sounds like 
you're also interested in getting this into branch-1 or branch-1-win.  If so, 
then you can provide patches for those branches too.  Do you think this plan 
makes sense?

I have a couple of comments related to the code I saw in the attachment:

* This version splits the headers and implementation files into separate inc 
and src directories.  If you want to propose a change in the source layout, 
then let's handle that in its own separate issue, without any actual code 
changes mixed in.
* I saw the include for {{uthash.h}}, but I also still saw calls to {{hcreate}} 
and {{hsearch}}.  I was expecting to see these call sites switch to using the 
uthash equivalents.
* We'll want to add .sln and .vcxproj files, similar to what we do for 
winutils.exe and hadoop.dll.  The supported compilers on Windows are the free 
Windows SDK or Visual Studio 2010 Professional.

bq. And How do I get the NativeCodeLoader to work ??

Assuming you have a build of libhadoop.so or hadoop.dll, you'd need to enable 
the libhdfs process to dynamically link to it.  One way to do this is to launch 
the JVM with -Djava.library.path=<directory containing libhadoop.so or 
hadoop.dll>.  You can set environment variable {{LIBHDFS_OPTS}} to control the 
JVM arguments that libhdfs passes to its embedded JVM.  The other way to do it 
is using the dynamic linking capabilities provided by the OS, i.e. 
{{LD_LIBRARY_PATH}} on Linux or {{PATH}} on Windows.
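
As a rough sketch of the {{LIBHDFS_OPTS}} approach, a client program could set 
the variable itself before making its first libhdfs call.  The function name 
{{set_libhdfs_library_path}} and the buffer size are assumptions made for this 
example.  It uses POSIX {{setenv}} and C99 {{snprintf}}, so a Windows build 
would need an equivalent such as {{_putenv_s}}.  Note that it overwrites any 
existing value of {{LIBHDFS_OPTS}}.

```c
#define _POSIX_C_SOURCE 200112L

#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper (not part of libhdfs): sets LIBHDFS_OPTS so that
 * the JVM embedded by libhdfs is started with
 * -Djava.library.path=<native_dir>.  Call it before the first libhdfs
 * call, because the variable is only useful before the JVM is created.
 * Returns 0 on success and -1 on failure.
 */
int set_libhdfs_library_path(const char *native_dir)
{
    char opts[1024];   /* assumed upper bound for this sketch */
    int n;

    assert(native_dir != NULL);

    if (strlen(native_dir) == 0) {
        return -1;     /* an empty directory name is not useful */
    }

    n = snprintf(opts, sizeof(opts), "-Djava.library.path=%s", native_dir);
    if (n < 0 || (size_t)n >= sizeof(opts)) {
        return -1;     /* formatting error or value truncated */
    }

    /* The final argument 1 replaces any existing LIBHDFS_OPTS value. */
    return setenv("LIBHDFS_OPTS", opts, 1) == 0 ? 0 : -1;
}
```

Setting the variable in the shell that launches the program has the same 
effect and avoids the platform-specific call.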

bq. Dag nab it >> I cannot figure this one out >> the append does not work

Sorry for the late reply, but this is due to append being disabled by default 
in the 1.x line.  I think you've figured this part out already.


> LIBHDFS questions and performance suggestions
> ---------------------------------------------
>
>                 Key: HDFS-5541
>                 URL: https://issues.apache.org/jira/browse/HDFS-5541
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: hdfs-client
>            Reporter: Stephen Bovy
>            Priority: Minor
>         Attachments: pdclibhdfs.zip
>
>
> Since libhdfs is a "client" interface, and especially because it is a "C" 
> interface, it should be assumed that the code will be used across many 
> different platforms and many different compilers.
> 1) The code should be cross-platform (no Linux extras).
> 2) The code should compile on standard C89 compilers; the 
> least-common-denominator rule applies here!
> C code with a ".c" extension should follow the rules of the C standard.  
> All variables must be declared at the beginning of scope, and no (//) 
> comments are allowed.
> I just spent a week converting the code back to normal C standards so 
> that it could compile and build across a wide range of platforms.
> Now on to performance questions:
> 1) If threads are not used, why do a thread attach?  (When threads are not 
> used, all the thread attach work is a waste of time and a performance killer.)
> 2) The JVM init code should not be embedded within the context of every 
> function call.  The JVM init code should be in a stand-alone LIBINIT 
> function that is only invoked once.  The JVM * and the JNI * should be 
> global variables for use when no threads are utilized.
> 3) When threads are utilized, the attach function can use the GLOBAL jvm * 
> created by LIBINIT (WHICH IS INVOKED ONLY ONCE), and thus safely 
> outside the scope of any LOOP that is using the functions.
> 4) Hash table and locking: why?
> When threads are used, the hash table locking is going to hurt performance.  
> Why not use thread-local storage for the hash table?  That way no locking is 
> required, either with or without threads.
>  
> 5) FINALLY, Windows compatibility
> Do not use POSIX features if they cannot easily be replaced on other 
> platforms!



--
This message was sent by Atlassian JIRA
(v6.1#6144)
