[Nutch Wiki] Trivial Update of "ErrorMessages" by LewisJohnMcgibbney

Apache Wiki Thu, 15 Sep 2011 13:13:27 -0700

Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Nutch Wiki" for change 
notification.


The "ErrorMessages" page has been changed by LewisJohnMcgibbney:
http://wiki.apache.org/nutch/ErrorMessages?action=diff&rev1=9&rev2=10

  
  Please report bugs to the mailing list!
  
+ <<TableOfContents>>
-   * Fetching
-   * Updating
-   * Searching
  
+ == General ==
+ 
- ==== Exception: java.net.SocketException: Invalid argument or cannot assign 
requested address on Fedora Core 3 or 4 ====
+ === Exception: java.net.SocketException: Invalid argument or cannot assign 
requested address on Fedora Core 3 or 4 ===
  It seems IPv6 is enabled on your machine.
  
  To solve this problem, add the following Java parameter to the java invocation 
in bin/nutch:
@@ -19, +19 @@

  
  # run it
  exec "$JAVA" $JAVA_HEAP_MAX $NUTCH_OPTS $JAVA_IPV4 -classpath "$CLASSPATH" 
$CLASS "$@"
  
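The diff hunk above skips the line that defines `$JAVA_IPV4`, so its exact value is not shown here. A minimal sketch of what such a definition could look like, assuming it relies on the standard JVM system property `java.net.preferIPv4Stack` (an assumption, not the confirmed content of bin/nutch):

```shell
# Assumption: the real definition is elided from the diff above. This value
# uses the standard JVM property that makes the JVM prefer an IPv4 stack.
JAVA_IPV4="-Djava.net.preferIPv4Stack=true"

# Show the option that would be passed to the java command in bin/nutch.
echo "JVM network option: $JAVA_IPV4"
```

With a definition like this in place, the `exec` line passes the option to the JVM along with the heap and Nutch options.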
- == FileNotFoundException: 1 ==
+ === FileNotFoundException: 1 ===
  
  delay 1 fails 
  crawltest and subdirectories are created; also ant compiles no probs; 
ROOT.war is installed and runs; urls file exists. Adding ./ or full path as x 
below changes nothing. Server runs squid on 80 and real Apache 1.3 on 81. 
Catalina is on 8080 and is up and running. 
@@ -83, +83 @@

  
  
  
- == Errors Fetching ==
+ == Fetching Errors ==
  
  '''Why do I get error "123456 104934 fetch of http://mydomain/index.html 
failed with: net.nutch.net.protocols.http.HttpError: HTTP Error: 401" when 
crawling?'''
    * An HTTP 401 error is returned by a remote web server when you are not 
authorized to view the page. Currently Nutch does not support HTTP 
authentication, but it will be trivial to add once the new HTTPClient fetcher 
code is committed.
@@ -94, +94 @@

  === While fetching I get UnknownHostException for known hosts ===
  Make sure your DNS server is working and can handle the load of requests.
  
- == Errors Updating ==
+ == Updating Errors ==
  
  '''While updating my DB I got an OutOfMemoryException or a 'too many files 
open' error.'''
    * The problem is that Nutch opens more files than your OS allows. 
You can check the limits of your machine with "ulimit -a". If you run 
Nutch as superuser, you can set the limit of open files for the current session 
with "ulimit -n 65536". To change this limit permanently, please read: 
http://bbcr.uwaterloo.ca/~brecht/servers/openfiles.html
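The check described above can be sketched as a small shell snippet. It reads the soft and hard open-file limits with the `ulimit` shell builtin and warns when the soft limit is below the 65536 quoted above. The variable names and the warning text are illustrative assumptions, not part of Nutch:

```shell
# Read the current soft and hard limits on open file descriptors.
soft_limit=$(ulimit -Sn)
hard_limit=$(ulimit -Hn)
echo "open files: soft=$soft_limit hard=$hard_limit"

# Illustrative threshold; 65536 is the value quoted in the text above.
wanted=65536

# Warn when the soft limit is numeric and below the wanted value.
if [ "$soft_limit" != "unlimited" ] && [ "$soft_limit" -lt "$wanted" ]; then
  echo "soft limit is below $wanted; consider: ulimit -n $wanted"
fi
```

Running this in the same shell session that later starts Nutch shows the limits that session will actually inherit.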
  
- == Errors Searching ==
+ == Indexing Errors ==
+ === While indexing documents, I get the following error: ===
+ ''050529 011245 fetch okay, but can't parse myfile, reason: Content truncated 
at 65536 bytes. Parser can't handle incomplete msword file.''
+ 
+ '''What is happening?'''
+ 
+  . By default, the size of the documents downloaded by Nutch is limited (to 
65536 bytes). To allow Nutch to download larger files (via HTTP), modify 
nutch-site.xml and add an entry like this:
+ 
+ {{{
+     <property>
+       <name>http.content.limit</name>
+       <value>150000</value>
+     </property>
+ }}}
+  . If you do not want to limit the size of downloaded documents, set 
http.content.limit to a negative value:
+ 
+ {{{
+     <property>
+       <name>http.content.limit</name>
+       <value>-1</value>
+     </property>
+ }}}
+ 
+ == Searching Errors  ==
  
  '''Tomcat reports root cause: java.lang.OutOfMemoryError and does not find 
anything.'''
    * Try to give Java/Tomcat more memory. On Linux, add to catalina.sh: 
JAVA_OPTS=-Xmx256m
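A minimal sketch of that setting, assuming you want to keep any JVM options that are already present in `JAVA_OPTS` rather than overwrite them (the append-instead-of-replace choice is an assumption, not something the page prescribes):

```shell
# Append the larger heap setting to any options that are already set,
# instead of overwriting them. 256m is the value suggested above.
JAVA_OPTS="${JAVA_OPTS:-} -Xmx256m"
export JAVA_OPTS

echo "JAVA_OPTS is now: $JAVA_OPTS"
```

If the heap is still too small for your index, the same line can carry a larger `-Xmx` value.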
  
- == Errors installing ==
+ == Installation Errors  ==
  
  See GettingNutchRunningWithUbuntu for some help.
- 
- 
  
  == Nutch on Debian (cont) ==
  What is mentioned here
