[ https://issues.apache.org/jira/browse/NUTCH-2913?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17452983#comment-17452983 ]
Sebastian Nagel commented on NUTCH-2913:
----------------------------------------

Properties that are not documented in nutch-default.xml but are used in Nutch code:
{noformat}
db.reader.stats.sort
db.reader.topn
db.reader.topn.min
domain.statistics.mode
fetcher.timelimit
hostdb.reading.crawldb
indexer.url.filters
injector.current.time
link.analyze.iteration
link.analyze.rank.one
linkdb.regex
linkdb.url.filters
partition.url.seed
segment.merger.filter
segment.merger.normalizer
segment.merger.segmentName
segment.merger.slice
webgraph.url.filters
{noformat}

Properties that are documented but may be set (value replaced) from the command line in Nutch tools:
{noformat}
crawldb.url.filters
crawldb.url.normalizers
db.injector.overwrite
db.injector.update
fetcher.threads.fetch
fetcher.throughput.threshold.check.after
parse.filter.urls
parse.normalize.urls
segment.reader.content.recode
{noformat}

I've left out some more properties which are set programmatically in more specific contexts, e.g. the SitemapProcessor adjusting http.content.limit to the maximum size of a sitemap.

> nutch-default.xml: document properties set programatically
> ----------------------------------------------------------
>
>                 Key: NUTCH-2913
>                 URL: https://issues.apache.org/jira/browse/NUTCH-2913
>             Project: Nutch
>          Issue Type: Improvement
>          Components: configuration
>    Affects Versions: 1.18
>            Reporter: Sebastian Nagel
>            Priority: Minor
>             Fix For: 1.19
>
>
> (see also the discussion in NUTCH-2910)
> [nutch-default.xml|https://nutch.apache.org/documentation/javadoc/apidocs/resources/nutch-default.xml]
> should include all properties used in Nutch, even those which are set
> programmatically in Java code in order to pass command-line options and
> job-global values from the main method (job client) to mapper/reducer tasks.
> We could use the "tags" element (HADOOP-15005) to mark them in a unique way.
> But a comment in the property description would also be good.
> See also the table
> [https://cwiki.apache.org/confluence/display/NUTCH/NutchPropertiesCompleteList].

--
This message was sent by Atlassian Jira
(v8.20.1#820001)
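
To illustrate the proposal in the quoted issue description, an entry in nutch-default.xml for one of the currently undocumented properties might look like the sketch below. The description wording, the empty default value, and the tag value `programmatic` are assumptions made for illustration and are not taken from Nutch. The `<tag>` element name is assumed to be the form introduced by HADOOP-15005 and should be checked against the Hadoop version in use.

```xml
<!-- Sketch only: description text, empty default, and tag value are assumptions. -->
<property>
  <name>db.reader.topn</name>
  <value></value>
  <description>Set programmatically by a Nutch tool from a command-line
  option in order to pass the value from the job client to the
  mapper/reducer tasks. (Hypothetical wording.)
  </description>
  <tag>programmatic</tag>
</property>
```

If the tag element turns out not to be usable, the same marker could be carried by the description text alone, which is the second option the issue mentions.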