[ http://issues.apache.org/jira/browse/NUTCH-193?page=comments#action_12365130 ]
Doug Cutting commented on NUTCH-193:
------------------------------------
Okay, I've moved the code from Nutch to Hadoop. Now I need to repair Nutch so
that it still works!
One remaining problem is the need to separate Nutch config files from Hadoop
config files. There are now hadoop-default.xml and hadoop-site.xml files, which
are separate from the similarly named Nutch files. For now, I'll fix this by
adding the following methods to Hadoop's Configuration class:
void addDefaultResource(String name);
void addFinalResource(String name);
Then add a Nutch utility class like:
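import org.apache.hadoop.conf.Configuration;

public class NutchConfiguration {
  // Create a Configuration that also loads Nutch's own resource files.
  public static Configuration create() {
    Configuration conf = new Configuration();
    addNutchResources(conf);
    return conf;
  }

  // Register nutch-default.xml and nutch-site.xml on an existing Configuration.
  public static Configuration addNutchResources(Configuration conf) {
    conf.addDefaultResource("nutch-default.xml");
    conf.addFinalResource("nutch-site.xml");
    return conf;
  }
}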
Then all of the places that currently call 'new NutchConf()' can be replaced
with 'NutchConfiguration.create()'.
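To make the intended layering concrete, here is a self-contained sketch that does not depend on Hadoop. LayeredConfiguration and NutchConfigurationSketch are hypothetical in-memory stand-ins, not Hadoop classes, and the sketch assumes three things the comment above does not spell out: Hadoop's own files are registered first, all default resources are applied before all final resources, and a later resource overrides an earlier one for the same key. The property values are illustrative only.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for Hadoop's Configuration. The real class parses XML
// files; here each "resource" is an in-memory map keyed by file name.
class LayeredConfiguration {
  private final Map<String, Map<String, String>> files; // file name -> properties
  private final List<String> defaultResources = new ArrayList<>();
  private final List<String> finalResources = new ArrayList<>();

  LayeredConfiguration(Map<String, Map<String, String>> files) {
    this.files = files;
    // Assumption: Hadoop registers its own files before any added ones.
    defaultResources.add("hadoop-default.xml");
    finalResources.add("hadoop-site.xml");
  }

  void addDefaultResource(String name) { defaultResources.add(name); }
  void addFinalResource(String name) { finalResources.add(name); }

  // Assumed load order: every default resource, then every final resource.
  List<String> loadOrder() {
    List<String> order = new ArrayList<>(defaultResources);
    order.addAll(finalResources);
    return order;
  }

  // Later resources override earlier ones, so site files win over defaults.
  String get(String key) {
    String value = null;
    for (String name : loadOrder()) {
      Map<String, String> props = files.get(name);
      if (props != null && props.containsKey(key)) {
        value = props.get(key);
      }
    }
    return value;
  }
}

// Hypothetical counterpart of the proposed NutchConfiguration factory.
class NutchConfigurationSketch {
  static LayeredConfiguration create(Map<String, Map<String, String>> files) {
    LayeredConfiguration conf = new LayeredConfiguration(files);
    conf.addDefaultResource("nutch-default.xml");
    conf.addFinalResource("nutch-site.xml");
    return conf;
  }
}

public class ConfigLayeringDemo {
  public static void main(String[] args) {
    Map<String, Map<String, String>> files = new HashMap<>();
    files.put("nutch-default.xml", Map.of("http.agent.name", "default-agent"));
    files.put("nutch-site.xml", Map.of("http.agent.name", "my-crawler"));
    LayeredConfiguration conf = NutchConfigurationSketch.create(files);
    // prints [hadoop-default.xml, nutch-default.xml, hadoop-site.xml, nutch-site.xml]
    System.out.println(conf.loadOrder());
    // prints my-crawler
    System.out.println(conf.get("http.agent.name"));
  }
}
```

Under these assumptions nutch-site.xml has the last word, and because every call site goes through one factory, the Nutch file names live in exactly one place.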
Longer-term we might consider a more radical re-design of the configuration
API. But first we need to get Hadoop and Nutch split.
> move NDFS and MapReduce to a separate project
> ---------------------------------------------
>
> Key: NUTCH-193
> URL: http://issues.apache.org/jira/browse/NUTCH-193
> Project: Nutch
> Type: Task
> Components: ndfs
> Versions: 0.8-dev
> Reporter: Doug Cutting
> Assignee: Doug Cutting
> Fix For: 0.8-dev
>
> The NDFS and MapReduce code should move from Nutch to a new Lucene
> sub-project named Hadoop.
> My plan is to do this as follows:
> 1. Move all code in the following packages from Nutch to Hadoop:
> org.apache.nutch.fs
> org.apache.nutch.io
> org.apache.nutch.ipc
> org.apache.nutch.mapred
> org.apache.nutch.ndfs
> These packages will all be renamed to org.apache.hadoop, and Nutch code will
> be updated to reflect this.
> 2. Move selected classes from Nutch to Hadoop, as follows:
> org.apache.nutch.util.NutchConf -> org.apache.hadoop.conf.Configuration
> org.apache.nutch.util.NutchConfigurable -> org.apache.hadoop.Configurable
> org.apache.nutch.util.NutchConfigured -> org.apache.hadoop.Configured
> org.apache.nutch.util.Progress -> org.apache.hadoop.util.Progress
> org.apache.nutch.util.LogFormatter -> org.apache.hadoop.util.LogFormatter
> org.apache.nutch.util.Daemon -> org.apache.hadoop.util.Daemon
> 3. Add a jar containing all of the above to Nutch's lib directory.
> Does this plan sound reasonable?
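The two mapping steps of the plan can be illustrated as a name-rewriting helper, the kind of thing one might use when updating Nutch's imports. This is a sketch, not a tool from the issue: RenameSketch and rename are hypothetical, and the sketch assumes each package in step 1 keeps its last segment under org.apache.hadoop (for example, org.apache.nutch.ipc becoming org.apache.hadoop.ipc), which the plan does not spell out. The class names used in main are illustrative.

```java
import java.util.HashMap;
import java.util.Map;

public class RenameSketch {
  // Step 2 of the plan: individual classes moved to new fully qualified names.
  static final Map<String, String> CLASS_MOVES = new HashMap<>();
  // Step 1 of the plan: whole packages moved. Assumption: the last segment is kept.
  static final String[] MOVED_PACKAGES = {"fs", "io", "ipc", "mapred", "ndfs"};

  static {
    CLASS_MOVES.put("org.apache.nutch.util.NutchConf", "org.apache.hadoop.conf.Configuration");
    CLASS_MOVES.put("org.apache.nutch.util.NutchConfigurable", "org.apache.hadoop.Configurable");
    CLASS_MOVES.put("org.apache.nutch.util.NutchConfigured", "org.apache.hadoop.Configured");
    CLASS_MOVES.put("org.apache.nutch.util.Progress", "org.apache.hadoop.util.Progress");
    CLASS_MOVES.put("org.apache.nutch.util.LogFormatter", "org.apache.hadoop.util.LogFormatter");
    CLASS_MOVES.put("org.apache.nutch.util.Daemon", "org.apache.hadoop.util.Daemon");
  }

  // Map an old fully qualified class name to its post-split name.
  static String rename(String fqcn) {
    // Exact lookup first, so NutchConf is not confused with NutchConfigurable.
    String moved = CLASS_MOVES.get(fqcn);
    if (moved != null) {
      return moved;
    }
    for (String pkg : MOVED_PACKAGES) {
      // Trailing dot prevents matching a different package that shares a prefix.
      String oldPrefix = "org.apache.nutch." + pkg + ".";
      if (fqcn.startsWith(oldPrefix)) {
        return "org.apache.hadoop." + pkg + "." + fqcn.substring(oldPrefix.length());
      }
    }
    return fqcn; // everything else stays in Nutch
  }

  public static void main(String[] args) {
    String[] examples = {
      "org.apache.nutch.util.NutchConf",  // -> org.apache.hadoop.conf.Configuration
      "org.apache.nutch.mapred.JobConf",  // -> org.apache.hadoop.mapred.JobConf
      "org.apache.nutch.fetcher.Fetcher"  // unchanged
    };
    for (String name : examples) {
      System.out.println(name + " -> " + rename(name));
    }
  }
}
```

Checking exact class moves before package prefixes keeps step 2 from being shadowed by step 1 and keeps similarly named classes apart.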
--
This message is automatically generated by JIRA.