GitHub user sebastian-nagel commented on a diff in the pull request:
https://github.com/apache/nutch/pull/95#discussion_r54781452
--- Diff: src/java/org/apache/nutch/indexer/IndexerMapReduce.java ---
@@ -52,14 +52,34 @@
import org.apache.nutch.protocol.Content;
import org.apache.nutch.scoring.ScoringFilterException;
import org.apache.nutch.scoring.ScoringFilters;
-
-public class IndexerMapReduce extends Configured implements
- Mapper<Text, Writable, Text, NutchWritable>,
- Reducer<Text, NutchWritable, Text, NutchIndexAction> {
+import org.apache.nutch.util.NutchConfiguration;
+
+/**
+ * <p>This class is typically invoked from within
+ * {@link org.apache.nutch.indexer.IndexingJob}
+ * and handles all MapReduce functionality required
+ * when undertaking indexing.</p>
+ * <p>This is a consequence of one or more indexing plugins
+ * being invoked which extend
+ * {@link org.apache.nutch.indexer.IndexWriter}.</p>
+ * <p>See
+ * {@link org.apache.nutch.indexer.IndexerMapReduce#initMRJob(Path, Path, Collection, JobConf, boolean)}
+ * for details on the specific data structures and parameters required for indexing.</p>
+ *
+ */
+public class IndexerMapReduce {
public static final Logger LOG = LoggerFactory
.getLogger(IndexerMapReduce.class);
+ // using normalizers and/or filters
+ private static boolean normalize = false;
+ private static boolean filter = false;
+
+ // url normalizers, filters and job configuration
+ private static URLNormalizers urlNormalizers;
+ private static URLFilters urlFilters;
--- End diff --
Why are these four member variables now static?
Also, it is odd for a static variable of the outer class to be initialized
in a non-static method of one of its inner classes (IndexerMapReduceReducer.config()).
As a result, the mapper class cannot be used without first instantiating the reducer.
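
To illustrate the concern, here is a minimal, self-contained sketch. It is not the Nutch code: all class and method names below (StaticInitSketch, SharedState, IndependentMapper, sharedMapperFailsAlone) are hypothetical, and a simple lower-casing function stands in for the URL normalizers. It only shows the general shape of the problem and one possible alternative in which each task initializes its own instance fields.

```java
import java.util.function.UnaryOperator;

public class StaticInitSketch {

  // Shape under discussion: static state on the outer class,
  // written by an instance method of one nested class.
  static class SharedState {
    static UnaryOperator<String> normalizer; // stays null until a Reducer is configured

    static class Mapper {
      String map(String url) {
        // Throws NullPointerException if no Reducer.configure() ran before
        return normalizer.apply(url);
      }
    }

    static class Reducer {
      void configure() {
        normalizer = String::toLowerCase; // non-static method sets a static field
      }
    }
  }

  // Possible alternative (an assumption, not the PR's code):
  // the mapper owns its state and sets it up in its own configure().
  static class IndependentMapper {
    private UnaryOperator<String> normalizer;

    void configure() {
      normalizer = String::toLowerCase;
    }

    String map(String url) {
      return normalizer.apply(url);
    }
  }

  // Returns true if the shared-state mapper fails when used on its own.
  static boolean sharedMapperFailsAlone() {
    SharedState.normalizer = null; // simulate a fresh JVM with no reducer configured
    try {
      new SharedState.Mapper().map("HTTP://Example.ORG/");
      return false;
    } catch (NullPointerException e) {
      return true;
    }
  }

  public static void main(String[] args) {
    System.out.println("shared mapper fails alone: " + sharedMapperFailsAlone());

    // The shared-state mapper only works once some reducer has been configured
    new SharedState.Reducer().configure();
    System.out.println(new SharedState.Mapper().map("HTTP://Example.ORG/"));

    // The independent mapper needs no reducer at all
    IndependentMapper mapper = new IndependentMapper();
    mapper.configure();
    System.out.println(mapper.map("HTTP://Example.ORG/"));
  }
}
```

With instance fields, the mapper and the reducer can each be configured and tested in isolation, and nothing depends on which task happens to run first in a given JVM.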