Dear Wiki user, You have subscribed to a wiki page or wiki category on "Nutch Wiki" for change notification.
The "NutchGotchas" page has been changed by LewisJohnMcgibbney:
http://wiki.apache.org/nutch/NutchGotchas?action=diff&rev1=6&rev2=7

  == Current Gotchas and using them: ==
- === No agents listed in 'http.agent.name' property ===:
+ === No agents listed in 'http.agent.name' property ===
  Since 1.3 Nutch is called from either of the runtime dirs (runtime/local and runtime/deploy). The conf files should be modified in runtime/local/conf, not in $NUTCH_HOME/conf.
- === Nutch-1016: Strip UTF-8 non-character codepoints ===:
+ === Nutch-1016: Strip UTF-8 non-character codepoints ===
  This JIRA issue affects the indexer and relates to the stripping of UTF-8 non-character codepoints which exist within some documents and was initially discovered during large crawls. When indexing to Solr this will yield the following exception:

