[Nutch Wiki] Update of "FAQ" by ra

Apache Wiki Fri, 10 Aug 2007 06:30:53 -0700

Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Nutch Wiki" for change 
notification.


The following page has been changed by ra:
http://wiki.apache.org/nutch/FAQ

------------------------------------------------------------------------------
  
  The crawl tool expects as its first parameter the name of the folder where the seeding URLs file is located. For example, if your urls.txt is located in /nutch/seeds, the crawl command would look like: crawl seed -dir /user/nutchuser...
  
- ==== Some pages are not indexed but my regex file and everyhing else is okay - what is going on? ====
+ ==== Some pages are not indexed but my regex file and everything else is okay - what is going on? ====
  By default, the crawl tool limits the number of outlinks fetched from a single page to 100.
- To overcome this limitation change the property to a higher value or simply -1.
+ To overcome this limitation change the property to a higher value or simply -1 (unlimited).
  
  file: conf/nutch-default.xml
+ 
  {{{
   <property>
     <name>db.max.outlinks.per.page</name>
@@ -415, +416 @@

   </property> 
  }}}
  see also: http://www.mail-archive.com/[EMAIL PROTECTED]/msg08665.html
- 
+ (tested under nutch 0.9)
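The same property can also be changed without editing conf/nutch-default.xml itself. A minimal sketch, assuming the usual Nutch convention of placing local overrides in conf/nutch-site.xml (whose values take precedence over nutch-default.xml), with -1 used to lift the limit as described above:

{{{
 <?xml version="1.0"?>
 <configuration>
   <!-- Local override; assumed to live in conf/nutch-site.xml -->
   <property>
     <name>db.max.outlinks.per.page</name>
     <!-- A negative value removes the per-page outlink limit -->
     <value>-1</value>
   </property>
 </configuration>
}}}

Keeping the override in nutch-site.xml should also mean that replacing nutch-default.xml during an upgrade does not silently restore the limit of 100.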
