[jira] Commented: (NUTCH-193) move NDFS and MapReduce to a separate project

Doug Cutting (JIRA) Fri, 03 Feb 2006 13:19:34 -0800

    [ http://issues.apache.org/jira/browse/NUTCH-193?page=comments#action_12365130 ]


Doug Cutting commented on NUTCH-193:
------------------------------------

Okay, I've moved the code from Nutch to Hadoop.  Now I need to repair Nutch so 
that it still works!

One remaining problem is the need to separate nutch config files from hadoop 
config files.  There's now a hadoop-default.xml and hadoop-site.xml, which are 
separate from the similarly-named nutch files.  For now, I'll fix this by 
adding the following methods to Hadoop's Configuration class:

void addDefaultResource(String name);
void addFinalResource(String name);

Then add a Nutch utility class like:

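import org.apache.hadoop.conf.Configuration;

public class NutchConfiguration {
  // Create a Hadoop Configuration that also loads the Nutch config files.
  public static Configuration create() {
    Configuration conf = new Configuration();
    addNutchResources(conf);
    return conf;
  }
  // Register the Nutch config files on an existing Configuration.
  public static Configuration addNutchResources(Configuration conf) {
    conf.addDefaultResource("nutch-default.xml");
    conf.addFinalResource("nutch-site.xml");
    return conf;
  }
}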

Then all of the places which currently call 'new NutchConf()' can be replaced 
with 'NutchConfiguration.create()'.
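
To make the intended layering concrete, here is a minimal, self-contained 
sketch. It does not use the real Hadoop class: SketchConfiguration is a 
hypothetical stand-in that only records resource names. Two things in it are 
assumptions, not settled behavior: that the constructor registers the two 
Hadoop files itself, and that all default resources are loaded before all 
final ones, so that the site files override the defaults.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for Hadoop's Configuration; it only tracks resource names. */
class SketchConfiguration {
  private final List<String> defaultResources = new ArrayList<>();
  private final List<String> finalResources = new ArrayList<>();

  SketchConfiguration() {
    // Assumption: the Hadoop files are registered by the constructor.
    defaultResources.add("hadoop-default.xml");
    finalResources.add("hadoop-site.xml");
  }

  void addDefaultResource(String name) { defaultResources.add(name); }

  void addFinalResource(String name) { finalResources.add(name); }

  /** Assumed load order: every default first, then every final, so finals win. */
  List<String> loadOrder() {
    List<String> order = new ArrayList<>(defaultResources);
    order.addAll(finalResources);
    return order;
  }
}

/** Mirrors the proposed Nutch utility class, written against the stand-in. */
class NutchConfigurationSketch {
  static SketchConfiguration create() {
    SketchConfiguration conf = new SketchConfiguration();
    addNutchResources(conf);
    return conf;
  }

  static SketchConfiguration addNutchResources(SketchConfiguration conf) {
    conf.addDefaultResource("nutch-default.xml");
    conf.addFinalResource("nutch-site.xml");
    return conf;
  }

  public static void main(String[] args) {
    // prints [hadoop-default.xml, nutch-default.xml, hadoop-site.xml, nutch-site.xml]
    System.out.println(create().loadOrder());
  }
}
```

Under those assumptions, nutch-site.xml ends up last in the load order, so a 
value set there would take precedence over the same key in any other file.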

Longer-term we might consider a more radical re-design of the configuration 
API.  But first we need to get Hadoop and Nutch split.





> move NDFS and MapReduce to a separate project
> ---------------------------------------------
>
>          Key: NUTCH-193
>          URL: http://issues.apache.org/jira/browse/NUTCH-193
>      Project: Nutch
>         Type: Task
>   Components: ndfs
>     Versions: 0.8-dev
>     Reporter: Doug Cutting
>     Assignee: Doug Cutting
>      Fix For: 0.8-dev

>
> The NDFS and MapReduce code should move from Nutch to a new Lucene 
> sub-project named Hadoop.
> My plan is to do this as follows:
> 1. Move all code in the following packages from Nutch to Hadoop:
> org.apache.nutch.fs
> org.apache.nutch.io
> org.apache.nutch.ipc
> org.apache.nutch.mapred
> org.apache.nutch.ndfs
> These packages will all be renamed to org.apache.hadoop, and Nutch code will 
> be updated to reflect this.
> 2. Move selected classes from Nutch to Hadoop, as follows:
> org.apache.nutch.util.NutchConf -> org.apache.hadoop.conf.Configuration
> org.apache.nutch.util.NutchConfigurable -> org.apache.hadoop.Configurable 
> org.apache.nutch.util.NutchConfigured -> org.apache.hadoop.Configured
> org.apache.nutch.util.Progress -> org.apache.hadoop.util.Progress
> org.apache.nutch.util.LogFormatter -> org.apache.hadoop.util.LogFormatter
> org.apache.nutch.util.Daemon -> org.apache.hadoop.util.Daemon
> 3. Add a jar containing all of the above to Nutch's lib directory.
> Does this plan sound reasonable?

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators:
   http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see:
   http://www.atlassian.com/software/jira
