Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Nutch Wiki" for change 
notification.

The following page has been changed by MarioMendez:
http://wiki.apache.org/nutch/NutchTutorial

------------------------------------------------------------------------------
  
   {{{ http://lucene.apache.org/nutch/ }}}
  
-  * Edit the file conf/crawl-urlfilter.txt (it works for me when I used the 
file conf/regex-urlfilter.txt) and replace MY.DOMAIN.NAME with the name of the 
domain you wish to crawl. For example, if you wished to limit the crawl to the 
apache.org domain, the line should read:
+  * Edit the file conf/crawl-urlfilter.txt and replace MY.DOMAIN.NAME with the 
name of the domain you wish to crawl. For example, if you wished to limit the 
crawl to the apache.org domain, the line should read:
  
   {{{ +^http://([a-z0-9]*\.)*apache.org/ }}}
  
   This will include any URL in the domain apache.org.
+ 
+ * Until someone can explain this: when I use the file conf/crawl-urlfilter.txt, 
the filter doesn't work. Instead, use the file conf/regex-urlfilter.txt 
and change the last line from "+." to "-."
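
Why the change from "+." to "-." matters can be illustrated with a small sketch. It is not Nutch code: it assumes the filter file is read top to bottom, that the first rule whose regex matches anywhere in the URL decides, and that a URL matching no rule is ignored. The helper names load_rules and accepts are hypothetical.

```python
import re

def load_rules(text):
    """Parse '+regex' / '-regex' lines, skipping blanks and '#' comments."""
    rules = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        sign, pattern = line[0], line[1:]
        if sign not in "+-":
            raise ValueError(f"rule must start with + or -: {line!r}")
        # True means "include", False means "exclude".
        rules.append((sign == "+", re.compile(pattern)))
    return rules

def accepts(rules, url):
    """Assumed semantics: the first matching rule decides; no match means ignored."""
    for include, pattern in rules:
        if pattern.search(url):
            return include
    return False

# Hypothetical rule file: the domain rule from the tutorial, then a final "-."
RULES = load_rules(r"""
# accept hosts in the apache.org domain
+^http://([a-z0-9]*\.)*apache.org/
# skip everything else
-.
""")

for url in ("http://lucene.apache.org/nutch/", "http://www.example.com/"):
    print(url, accepts(RULES, url))
```

Under these assumptions, a final "+." would accept every URL the earlier rules did not reject, so the crawl would not stay inside apache.org. With "-." as the last line, only URLs accepted by an earlier "+" rule get through.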
  
  === Crawl Command: Running the Crawl ===
  
