Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Nutch Wiki" for change notification.

The "RunningNutchAndSolr" page has been changed by LewisJohnMcgibbney:
http://wiki.apache.org/nutch/RunningNutchAndSolr?action=diff&rev1=68&rev2=69

  {{{
  http://nutch.apache.org/
  }}}
+  * Edit the file conf/regex-urlfilter.txt and replace
+ {{{
+ # accept anything else
+ +.
+ }}}
+ 
+ with a regular expression matching the domain you wish to crawl. For example, if you wished to limit the crawl to the nutch.apache.org domain, the line should read:
+ 
+ {{{
+ +^http://([a-z0-9]*\.)*nutch.apache.org/
+ }}}
+ 
+ This will include any url in the domain nutch.apache.org.
   * Run the following command:
  {{{
  bin/nutch crawl urls -dir crawl -depth 3 -topN 5
@@ -102, +115 @@

  <field name="content" type="text" stored="true" indexed="true"/>
  }}}
- '''This tutorial was originally constructed and posted by 'waycool' on the user lists. It has been edited slightly for integration into the Apache Nutch project.'''
- 
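
To see which URLs the filter line added in this change would let through, the rule can be tried outside Nutch. The following is only a rough sketch in Python, not Nutch code: the `accepts` helper and the sample URLs are hypothetical, and it assumes that a leading `+` accepts, a leading `-` rejects, the first matching rule wins, and a URL matching no rule is ignored. Nutch itself evaluates these patterns with Java regular expressions; this particular pattern uses only syntax that behaves the same way in Python's `re` module.

```python
import re

# The rule line from the change above, as it would appear in
# conf/regex-urlfilter.txt ('+' prefix followed by the regular expression).
RULES = [
    r"+^http://([a-z0-9]*\.)*nutch.apache.org/",
]


def accepts(url, rules=RULES):
    """Hypothetical helper: apply rules in order, first match decides.

    Assumption: '+' means accept, '-' means reject, and a URL that
    matches no rule is ignored (treated as rejected here).
    """
    for rule in rules:
        sign, pattern = rule[0], rule[1:]
        if re.search(pattern, url):
            return sign == "+"
    return False


if __name__ == "__main__":
    samples = [
        "http://nutch.apache.org/",
        "http://wiki.nutch.apache.org/page",
        "http://lucene.apache.org/",
        "https://nutch.apache.org/",
    ]
    for url in samples:
        print(url, accepts(url))
```

Two details are worth noting under these assumptions. The pattern only covers `http://`, so `https://` URLs on the same host are not accepted. The dots in `nutch.apache.org` are unescaped, so each one matches any single character; writing `nutch\.apache\.org` would be stricter.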

