[ https://issues.apache.org/jira/browse/HADOOP-11929?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14565160#comment-14565160 ]

Colin Patrick McCabe commented on HADOOP-11929:
-----------------------------------------------

Hi [~aw], [~busbey],

In the past, we tried to be super-clever about detecting which modules to
build.  It never worked.

The problem is that there are hidden dependencies.  For example, if I change
{{DomainSocketWatcher.java}}, I clearly want to build and test
{{libhadoop.so}}, which contains the C domain socket code.  But no C files were
changed, so how is your "super-clever" dependency solver going to figure that
out?

Similarly, I could change {{BZip2Codec.java}} and expect the native bzip2 code
in {{Bzip2Compressor.c}} to be built and tested.  But again, there is no way
for the build system to know that these are related.

Then there are even more subtle dependencies.  Let's say I make a change to a C
file in hadoop-common.  Perhaps this changes a function that is only used in
hadoop-hdfs; for the sake of argument, let's say {{renameTo0}}.  But the
hadoop-hdfs tests are not run, since the dependency solver looks at the patch
and says, "no files in hadoop-hdfs were changed, I'm done."

The only sane thing to do is to always build {{libhadoop.so}} and 
{{libhdfs.so}} no matter what, and always turn on all the options.  The options 
don't increase compilation time by any significant amount (if you don't believe 
me, benchmark it for yourself).
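
For concreteness, a hedged sketch of what "always turn on all the options"
might look like on the maven command line.  {{-Pnative}} is the standard
native profile; the specific {{-Drequire.*}} flags here are assumptions and
vary by branch (check BUILDING.txt):

{code}
# Build the native bits unconditionally, with the optional codecs required
# so a missing dependency fails the build instead of silently skipping it.
# The -Drequire.* flag names are illustrative; see BUILDING.txt.
mvn install -DskipTests -Pnative -Drequire.snappy -Drequire.openssl
{code}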

We could maybe avoid building fuse-dfs, the native mapreduce stuff in trunk,
libhadooppipes, and libwebhdfs unless a file in there had changed (a sketch of
that check follows below).  Those subprojects are truly self-contained, so
that would work.  The native task stuff in particular is slow to compile, so
that might actually be useful.  The rest of it I think we should just always
build; the build is flaky enough as-is.
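
A minimal sketch of that kind of opt-in check, assuming a {{CHANGED_FILES}}
list alongside the CHANGED_MODULES mentioned in the proposal below; the
variable name and path patterns are illustrative assumptions, not the actual
test-patch.sh API:

{code}
# Hedged sketch: only build the truly self-contained native subprojects
# when the patch actually touches them.  CHANGED_FILES and the path
# patterns are assumptions for illustration.
build_fuse=false
build_nativetask=false
for f in ${CHANGED_FILES}; do
  case "${f}" in
    */fuse-dfs/*)                           build_fuse=true ;;
    */hadoop-mapreduce-client-nativetask/*) build_nativetask=true ;;
  esac
done
{code}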

> add test-patch plugin points for customizing build layout
> ---------------------------------------------------------
>
>                 Key: HADOOP-11929
>                 URL: https://issues.apache.org/jira/browse/HADOOP-11929
>             Project: Hadoop Common
>          Issue Type: Improvement
>            Reporter: Sean Busbey
>            Assignee: Allen Wittenauer
>            Priority: Minor
>         Attachments: HADOOP-11929.00.patch, HADOOP-11929.01.patch, 
> HADOOP-11929.02.patch, HADOOP-11929.03.patch, hadoop.sh
>
>
> Sean Busbey and I had a chat about this at the Bug Bash. Here's the proposal:
>   * Introduce the concept of a 'personality module'.
>   * There can be only one personality.
>   * Personalities provide a single function that takes as input the name of 
> the test currently being processed
>   * This function uses two other built-in functions to define two queues: 
> the maven module names to build and the profiles to use against those 
> modules (a sketch follows this description)
>   * If something needs to be compiled prior to this test (but not actually 
> tested), the personality will be responsible for doing that compilation
> In Hadoop, the classic example is that hadoop-hdfs needs hadoop-common 
> compiled with the native bits. So prior to the javac tests, the personality 
> would check CHANGED_MODULES, see hadoop-hdfs, and compile hadoop-common with 
> -Pnative before letting test-patch.sh do the work in hadoop-hdfs. Another 
> example is our lack of test coverage of various native bits. Since these 
> require profiles to be defined prior to compilation, the personality could 
> see that something touches native code, set the appropriate profile, and let 
> test-patch.sh be on its way.
> One way to think of it is as some higher-order logic on top of the automated 
> 'figure out what modules and what tests to run' functions.
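
To make the proposal concrete, here is a minimal sketch of what a personality
function might look like in bash.  The hook name {{personality_modules}}, the
built-in {{personality_enqueue_module}}, and the calling convention are
assumptions for illustration only; the real entry points are whatever the
attached patches define.

{code}
# Hedged sketch of a personality, not the actual HADOOP-11929 API.
# Assumed: test-patch.sh calls personality_modules with the name of the
# current test, and personality_enqueue_module queues a maven module
# together with the profiles to use when building it.
function personality_modules
{
  local testtype=$1

  case ${testtype} in
    javac|unit)
      # hadoop-hdfs needs hadoop-common built with the native bits first.
      personality_enqueue_module hadoop-common-project/hadoop-common -Pnative
      personality_enqueue_module hadoop-hdfs-project/hadoop-hdfs -Pnative
      ;;
    *)
      # Default: let test-patch.sh operate on the whole tree.
      personality_enqueue_module .
      ;;
  esac
}
{code}

Presumably test-patch.sh would then drain the two queues, building each module
with its profiles before running the test in question.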



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
