[jira] [Commented] (HADOOP-7405) libhadoop is all or nothing

Allen Wittenauer (JIRA) Mon, 20 Jun 2011 11:13:13 -0700

    [ 
https://issues.apache.org/jira/browse/HADOOP-7405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13052118#comment-13052118
 ]


Allen Wittenauer commented on HADOOP-7405:
------------------------------------------

bq. Since Hadoop Kerberos Mac OS X support was never fully there, it is not 
possible to compile libhadoop due to some compiler errors.

The compiler errors are fairly simple to fix on Darwin.  I don't know why, but 
it seems like 9 times out of 10, we favor BSD functionality when we go with 
something non-portable. 

bq. Because of this my take is that if we require native code to run Hadoop, we 
should provide the full set of native code for each platform we are building 
for. 

Regardless of what happens in this jira, we need a test suite for the C code 
anyway.  OS X actually proves that even if the code compiles, it doesn't 
necessarily work properly.  (See HADOOP-7367.)

bq. A while ago I've opened a HADOOP-7083 to enable running Hadoop with 
Kerberos ON without relying on some libhadoop functionality and the argument 
there was that doing that was a security risk. 

Right.  HADOOP-7083 wanted to create a third security mode where some stuff 
worked and some stuff didn't.  That's not quite what I'm asking for here, and 
it wouldn't actually fix the problem we're hitting anyway.  The security 
functionality is orthogonal to the compression functionality.  That's the base, 
surface issue: since both sit in one big chunk, we broke *both*.

(While I guess it wasn't obvious, I should probably state that I'm not looking 
for a "partially working" security mode.  The scope of what constitutes a 
working unit would still need to be defined.  It is more than reasonable to say 
that all of the functions that are directly security related would need to be 
ported and treated like one block.  Asking libhadoop.so if it "supports 
security" seems like a reasonable thing to ask it.)

The problem that we've got is that we have a lot of unrelated code sitting in 
libhadoop.so.  Every time we add something we run the risk of regressing 
features out of platforms other than Linux since those other platforms are an 
afterthought.  HADOOP-7206 may actually be a great example of this:  if we go 
with a pure native implementation, we won't be able to support Snappy on 
anything but Linux with the current state of things.  Lack of compression 
support has a *direct* impact on the client.  I'd be surprised if the majority 
of shops are only using Linux clients. 

Wouldn't it be great to be able to ask the lib "do you support gzip, do you 
support snappy, do you support lzo, do you support security, ..."?  Then we 
could add code as needed, do ports as needed, etc.  An alternative would be 
that we start breaking libhadoop up into at least related functionality.
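
To make the "ask the lib" idea a bit more concrete, here is a rough sketch of 
what the Java side of such a query might look like.  Everything in it is 
hypothetical: NativeFeature, NativeCapabilities, and the bitmask encoding are 
names and choices made up for illustration, not existing Hadoop or libhadoop 
APIs.  It assumes the native library could hand back a single long in which 
each bit stands for one feature it has actually been ported and vetted for.

```java
import java.util.EnumSet;
import java.util.Set;

/** Hypothetical feature list; these names are illustrative only. */
enum NativeFeature {
    GZIP, SNAPPY, LZO, SECURITY;

    /** Bit this feature would occupy in the mask reported by the native side. */
    long bit() {
        return 1L << ordinal();
    }
}

/** Decoded view of what a loaded native library claims to support. */
final class NativeCapabilities {
    private final Set<NativeFeature> supported;

    private NativeCapabilities(Set<NativeFeature> supported) {
        this.supported = supported;
    }

    /**
     * Builds the capability set from a bitmask.  In a real port the mask
     * would come from a native call (assumed here, not an existing function).
     */
    static NativeCapabilities fromMask(long mask) {
        EnumSet<NativeFeature> set = EnumSet.noneOf(NativeFeature.class);
        for (NativeFeature f : NativeFeature.values()) {
            if ((mask & f.bit()) != 0) {
                set.add(f);
            }
        }
        return new NativeCapabilities(set);
    }

    /** Used when the library is missing or failed to load: nothing is supported. */
    static NativeCapabilities none() {
        return fromMask(0L);
    }

    boolean supports(NativeFeature f) {
        return supported.contains(f);
    }
}
```

With something along these lines, a platform whose port only covers gzip and 
the security routines would report just those two bits, and callers would 
check supports(NativeFeature.SNAPPY) before picking a native codec instead of 
assuming that a successful load means everything works.  Unported routines 
could then be stubbed out without silently breaking the features that do work.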

I suppose the other outcome might be that we as a community just admit that we 
don't support Hadoop on anything but Linux and give up on any semblance of 
portability.  More and more code is being added or rewritten in C.  I would be 
surprised if this trend changes.

> libhadoop is all or nothing
> ---------------------------
>
>                 Key: HADOOP-7405
>                 URL: https://issues.apache.org/jira/browse/HADOOP-7405
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: native
>    Affects Versions: 0.20.203.0, 0.23.0
>         Environment: Everything not Linux
>            Reporter: Allen Wittenauer
>            Priority: Blocker
>              Labels: regression
>
> As a result of a ton of new code in libhadoop being added in 0.20.203/0.22, a 
> lot of features that used to work no longer do reliably.  The most common 
> problem is native compression, but other issues such as Mac OS X's group 
> support broke as well.  The native code checks need to be refactored such 
> that libhadoop.so should report what it supports rather than having the 
> Java-side assume that if it loads, it is all supported.  This would allow us 
> to stub routines until they've been vetted, removing the chances of such 
> regressions appearing in the future.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
