Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Nutch Wiki" for change notification.
The "NutchGotchas" page has been changed by LewisJohnMcgibbney:
http://wiki.apache.org/nutch/NutchGotchas?action=diff&rev1=7&rev2=8

  at com.ctc.wstx.sr.StreamScanner.throwLazyError(StreamScanner.java:731)
  }}}

- The fix (committed by Markus) for the SolrWriter class passes the value of the content field to a method to strip away non-characters, effectively avoiding the runtime exception. Various patches are available [[https://issues.apache.org/jira/browse/NUTCH-1016|here]]
+ The fix (committed by Markus) for the !SolrWriter class passes the value of the content field to a method to strip away non-characters, effectively avoiding the runtime exception. Various patches are available [[https://issues.apache.org/jira/browse/NUTCH-1016|here]]

- === Removal of crawl-urlfilter.txt ===:
+ === Removal of crawl-urlfilter.txt ===
  As of the release of Nutch 1.3, crawl-urlfilter.txt has been removed purposefully as it did not add anything to the other url filters (automaton | regex) in terms of functionality. By default the urlfilters contain (+.) which was what the crawl-urlfilter used to do.

- === Confusion about "solrUrl is not set, indexing will be skipped..." log message ===:
+ === Confusion about "solrUrl is not set, indexing will be skipped..." log message ===
  This relates to the removal of the Nutch Lucene legacy dependence to support indexing with Solr, and the road map to enable various other indexing implementations. We have two options for passing the indexing command to Nutch.
   * During the crawl command, as explained [[http://wiki.apache.org/nutch/RunningNutchAndSolr#A3._Crawl_your_first_website|here]].
   * or during the later stage of sending an individual solrindex command to Solr as explained [[http://wiki.apache.org/nutch/RunningNutchAndSolr#A6._Integrate_Solr_with_Nutch|here]].

- === DiskErrorException while fetching ===:
+ === DiskErrorException while fetching ===
  Questions like this one arise fairly regularly on the user@ list
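The SolrWriter fix for NUTCH-1016 is described only as passing the content field through "a method to strip away non-characters". Here is a minimal Java sketch of that idea. It assumes "non-characters" means the 66 Unicode noncharacters: U+FDD0 to U+FDEF, plus every code point whose low 16 bits are FFFE or FFFF. `NonCharStripper`, `isNonCharacter` and `stripNonCharCodepoints` are hypothetical names used for illustration. This is not the code committed to Nutch; the NUTCH-1016 patches hold the real change.

```java
final class NonCharStripper {

    /**
     * True for the 66 Unicode noncharacters: U+FDD0..U+FDEF and every
     * code point ending in FFFE or FFFF (U+FFFE, U+FFFF, U+1FFFE, ... U+10FFFF).
     */
    static boolean isNonCharacter(int cp) {
        return (cp >= 0xFDD0 && cp <= 0xFDEF) || (cp & 0xFFFE) == 0xFFFE;
    }

    /** Returns a copy of the input with every noncharacter removed (hypothetical helper). */
    static String stripNonCharCodepoints(String input) {
        StringBuilder out = new StringBuilder(input.length());
        // Iterate by code point so supplementary-plane noncharacters,
        // stored as surrogate pairs, are recognised and dropped whole.
        input.codePoints()
             .filter(cp -> !isNonCharacter(cp))
             .forEach(out::appendCodePoint);
        return out.toString();
    }

    public static void main(String[] args) {
        String content = "page\uFFFFcontent\uFDD0 ok";
        System.out.println(stripNonCharCodepoints(content)); // prints "pagecontent ok"
    }
}
```

Filtering by code point rather than by `char` is deliberate. A noncharacter such as U+1FFFE occupies two `char`s, and a per-`char` filter would either miss it or leave half a surrogate pair behind.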

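Once crawl-urlfilter.txt is gone, the regex URL filter decides on its own. Its default `+.` line matches any non-empty URL, which is all the old file used to add. The following Java sketch models that evaluation under these assumptions:

- Each rule line starts with `+` (accept) or `-` (reject), followed by a Java regex.
- The first rule whose pattern is found anywhere in the URL decides.
- A URL that matches no rule is rejected.

`RegexRuleSketch`, `Rule` and `accepts` are hypothetical names. The `-^(file|ftp|mailto):` rule is only an illustrative example, not a quote of the regex-urlfilter.txt that ships with Nutch.

```java
import java.util.List;
import java.util.regex.Pattern;

final class RegexRuleSketch {

    /** One filter line: '+' (accept) or '-' (reject) followed by a regex. */
    static final class Rule {
        final boolean accept;
        final Pattern pattern;

        Rule(boolean accept, Pattern pattern) {
            this.accept = accept;
            this.pattern = pattern;
        }

        /** Parses a line such as "+." or "-^(file|ftp|mailto):". */
        static Rule parse(String line) {
            if (line.isEmpty() || (line.charAt(0) != '+' && line.charAt(0) != '-')) {
                throw new IllegalArgumentException("rule must start with + or -: " + line);
            }
            return new Rule(line.charAt(0) == '+', Pattern.compile(line.substring(1)));
        }
    }

    /** The first rule whose pattern occurs in the URL decides; no match means reject. */
    static boolean accepts(List<Rule> rules, String url) {
        for (Rule rule : rules) {
            if (rule.pattern.matcher(url).find()) {
                return rule.accept;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<Rule> rules = List.of(
                Rule.parse("-^(file|ftp|mailto):"), // illustrative reject rule
                Rule.parse("+."));                  // catch-all accept line
        System.out.println(accepts(rules, "http://nutch.apache.org/")); // prints "true"
        System.out.println(accepts(rules, "mailto:user@example.org"));  // prints "false"
    }
}
```

Because the first match wins, a catch-all `+.` only makes sense as the last line. Placed earlier, it would accept URLs before any later `-` rule is consulted.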

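The "solrUrl is not set, indexing will be skipped..." message only reports that the crawl run received no Solr URL. That run therefore skips indexing, which can still be done afterwards with a separate solrindex command. The Java sketch below models that decision under two assumptions: the crawl command receives the Solr URL through a `-solr <url>` argument, as in the linked RunningNutchAndSolr instructions, and indexing is skipped whenever that argument is absent. `CrawlArgsSketch`, `findSolrUrl` and `willIndex` are hypothetical names. This is not Nutch's own crawl code.

```java
import java.util.List;

final class CrawlArgsSketch {

    /** Returns the value following "-solr", or null if the option is absent (assumed flag). */
    static String findSolrUrl(List<String> args) {
        for (int i = 0; i + 1 < args.size(); i++) {
            if (args.get(i).equals("-solr")) {
                return args.get(i + 1);
            }
        }
        return null;
    }

    /** Decides whether this crawl run would index; prints the notice when it would not. */
    static boolean willIndex(List<String> args) {
        if (findSolrUrl(args) == null) {
            // Stand-in for the log message: informational, not an error.
            System.out.println("solrUrl is not set, indexing will be skipped...");
            return false;
        }
        return true;
    }

    public static void main(String[] args) {
        willIndex(List.of("urls", "-depth", "3"));
        System.out.println(willIndex(List.of("urls", "-solr", "http://localhost:8983/solr/"))); // prints "true"
    }
}
```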