[ 
https://issues.apache.org/jira/browse/NUTCH-2913?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17452983#comment-17452983
 ] 

Sebastian Nagel commented on NUTCH-2913:
----------------------------------------

Properties not documented in nutch-default.xml but used in Nutch code:
{noformat}
db.reader.stats.sort
db.reader.topn
db.reader.topn.min
domain.statistics.mode
fetcher.timelimit
hostdb.reading.crawldb
indexer.url.filters
injector.current.time
link.analyze.iteration
link.analyze.rank.one
linkdb.regex
linkdb.url.filters
partition.url.seed
segment.merger.filter
segment.merger.normalizer
segment.merger.segmentName
segment.merger.slice
webgraph.url.filters
{noformat}

Properties that are documented but may be overridden (value replaced) from the 
command line in Nutch tools:
{noformat}
crawldb.url.filters
crawldb.url.normalizers
db.injector.overwrite
db.injector.update
fetcher.threads.fetch
fetcher.throughput.threshold.check.after
parse.filter.urls
parse.normalize.urls
segment.reader.content.recode
{noformat}

I've left out some further properties which are set programmatically in more 
specific contexts, e.g. the SitemapProcessor adjusting http.content.limit to the 
maximum size of a sitemap.
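
As a rough sketch of the "comment in the property description" option mentioned 
in the issue, an entry for one of the command-line-driven properties might look 
like the following. The default value and the description wording here are 
assumptions for illustration only, not the text currently in nutch-default.xml; 
-threads is the fetcher's command-line option.
{code:xml}
<!-- Hypothetical entry: value and description wording are illustrative. -->
<property>
  <name>fetcher.threads.fetch</name>
  <value>10</value>
  <description>The number of threads the fetcher should use.
  Note: this value is replaced programmatically when the fetcher is
  started with the -threads command-line option.
  </description>
</property>
{code}
A note of this kind keeps the property discoverable in nutch-default.xml while 
making clear that the value seen by mapper/reducer tasks may come from the job 
client rather than from the configuration file.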

> nutch-default.xml: document properties set programatically
> ----------------------------------------------------------
>
>                 Key: NUTCH-2913
>                 URL: https://issues.apache.org/jira/browse/NUTCH-2913
>             Project: Nutch
>          Issue Type: Improvement
>          Components: configuration
>    Affects Versions: 1.18
>            Reporter: Sebastian Nagel
>            Priority: Minor
>             Fix For: 1.19
>
>
> (see also the discussion in NUTCH-2910)
> [nutch-default.xml|https://nutch.apache.org/documentation/javadoc/apidocs/resources/nutch-default.xml]
>  should include all properties used in Nutch, even those which are set 
> programmatically in Java code in order to pass command-line options and 
> job-global values from the main method (job client) to mapper/reducer tasks.
> We could use the "tags" element (HADOOP-15005) to mark them in a unique way. 
> But a comment in the property description would also be good.
> See also the table 
> [https://cwiki.apache.org/confluence/display/NUTCH/NutchPropertiesCompleteList].
>  



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
