Dear Wiki user, You have subscribed to a wiki page or wiki category on "Nutch Wiki" for change notification.
The "ErrorMessages" page has been changed by LewisJohnMcgibbney: http://wiki.apache.org/nutch/ErrorMessages?action=diff&rev1=10&rev2=11 <<TableOfContents>> - == General == + = General = + == Java.io.IOException: No input directories specified in: NutchConf: nutch-default.xml , mapred-default.xml == + The crawl tool expects as its first parameter the folder name where the seeding urls file is located so for example if your urls.txt is located in /nutch/seeds the crawl command would look like: crawl seed -dir /user/nutchuser... + - === Exception: java.net.SocketException: Invalid argument or cannot assign requested address on Fedora Core 3 or 4 === + == Exception: java.net.SocketException: Invalid argument or cannot assign requested address on Fedora Core 3 or 4 == It seems you have installed IPV6 on your machine. To solve this problem, add the following java param to the java instantiation in bin/nutch: @@ -19, +22 @@ # run it exec "$JAVA" $JAVA_HEAP_MAX $NUTCH_OPTS $JAVA_IPV4 -classpath "$CLASSPATH" $CLASS "$@" - === FileNotFoundException: 1 === + == FileNotFoundException: 1 == delay 1 fails crawltest and subdirectories are created; also ant compiles no probs; ROOT.war is installed and runs; urls file exists. Adding ./ or full path as x below changes nothing. Server runs squid on 80 and real Apache 1.3 on 81. Catalina is on 8080 and is up and running. @@ -83, +86 @@ - == Fetching Errors == + = Fetching Errors = '''Why do I get error "123456 104934 fetch of http://mydomain/index.html failed with: net.nutch.net.protocols.http.HttpError: HTTP Error: 401" when crawling?''' * An HTTP 401 error is returned from a remote webserver when you not authorized to view the page. Currently nutch does not support HTTP authentication but it will be trivial to add when the new HTTPClient fetcher code is committed. @@ -91, +94 @@ '''/etc/host.conf: line 1: cannot specify more then 4 services''' * Please have a look at http://sources.redhat.com/ml/bug-glibc/2002-07/msg00269.html - === While fetching I get UnknownHostException for known hosts === + == While fetching I get UnknownHostException for known hosts == Make sure your DNS server is working and/or it can handle the load of requests. - == Updating Errors == + = Updating Errors = '''Until updating my DB I got a OutOfMemoryException or a 'to many files open' error.''' * The problems is that nutch opens more files then your OS allows to open. You can check the limits of your machine with "ulimit -a". In case you run nutch as superuser you can set the limit of open files for the actual session with "ulimit -n 65536". To change this limit permanently please read: http://bbcr.uwaterloo.ca/~brecht/servers/openfiles.html - == Indexing Errors == + = Indexing Errors = - === While indexing documents, I get the following error: === + == While indexing documents, I get the following error: == ''050529 011245 fetch okay, but can't parse myfile, reason: Content truncated at 65536 bytes. Parser can't handle incomplete msword file.'' '''What is happening?''' @@ -122, +125 @@ </property> }}} - == Searching Errors == + = Searching Errors = '''Tomcat reports root cause: java.lang.OutOfMemoryError and does not find anything.''' * Try to give java / tomcat some more memory. Add to catalina.sh (linux): JAVA_OPTS=-Xmx256m - == Installation Errors == + = Installation Errors = See GettingNutchRunningWithUbuntu for some help. @@ -169, +172 @@ Setup on a SUSE 8.1 system was no problem btw ... +

