Dear Wiki user, You have subscribed to a wiki page or wiki category on "Nutch Wiki" for change notification.
The following page has been changed by SebastienLeCallonnec:
http://wiki.apache.org/nutch/FAQ

The comment on the change is:
Added FAQ on how to increase size of downloaded docs.

------------------------------------------------------------------------------
  Now you can invoke the crawler and index all or part of your disk. The only remaining gotcha is that Mozilla will '''not''' load file: URLs from a web page fetched with http, so if you test with the Nutch web container running in Tomcat, clicking on a result does nothing. This is mentioned [http://www.mozilla.org/quality/networking/testing/filetests.html here] and this behavior may be disabled by a [http://www.mozilla.org/quality/networking/docs/netprefs.html preference] (see security.checkloaduri). IE5 does not have this problem.

+ '''While indexing documents, I get the following error:''' ''050529 011245 fetch okay, but can't parse myfile, reason: Content truncated at 65536 bytes. Parser can't handle incomplete msword file.'' '''What is happening?'''
+ By default, Nutch limits the size of each downloaded document to 65536 bytes. To allow Nutch to download larger files (via HTTP), modify nutch-site.xml and add an entry like this:
+
+ <property>
+   <name>http.content.limit</name>
+   <value>'''150000'''</value>
+ </property>
+
+ If you do not want to limit the size of downloaded documents, set http.content.limit to a negative value.

  ----

  == Segment Handling ==

_______________________________________________
Nutch-cvs mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-cvs
