Hi,

First, the symptoms: while running tests on sites with many PDFs, I saw the Fetcher gradually slow down until it got stuck. This was repeatable. A thread dump showed all threads waiting somewhere in PDFBox code (which parse-pdf uses). In an email exchange, the PDFBox author (Ben Litchfield) confirmed that the latest official release of PDFBox has a problem that can result in this behaviour.
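
For anyone who wants to check for the same symptom, here is a minimal sketch of the in-process equivalent of reading a thread dump (normally obtained with jstack or kill -QUIT on the JVM). It lists the live threads that have at least one stack frame inside a given package. The class name StuckThreadFinder and the default prefix "org.pdfbox" are my own assumptions for illustration, not anything that ships with Nutch or PDFBox, and the code has to run inside the same JVM as the Fetcher to see its threads.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of Nutch or PDFBox.
public class StuckThreadFinder {

    /**
     * Returns "name [state]" for every live thread that has at least one
     * stack frame in a class whose fully qualified name starts with
     * packagePrefix.
     */
    public static List<String> threadsInPackage(String packagePrefix) {
        List<String> hits = new ArrayList<>();
        for (Map.Entry<Thread, StackTraceElement[]> entry
                : Thread.getAllStackTraces().entrySet()) {
            for (StackTraceElement frame : entry.getValue()) {
                if (frame.getClassName().startsWith(packagePrefix)) {
                    Thread t = entry.getKey();
                    hits.add(t.getName() + " [" + t.getState() + "]");
                    break; // one matching frame is enough for this thread
                }
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        // Assumption: the PDFBox classes live under "org.pdfbox".
        // Pass a different prefix as the first argument if they do not.
        String prefix = args.length > 0 ? args[0] : "org.pdfbox";
        List<String> hits = threadsInPackage(prefix);
        System.out.println(hits.size() + " thread(s) with frames in " + prefix);
        for (String hit : hits) {
            System.out.println("  " + hit);
        }
    }
}
```

If the same set of threads keeps showing up over several calls spaced a few seconds apart, that matches the stuck state described above. A single snapshot only shows where the threads are at that moment, so it is not proof of a hang by itself.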

If you have run into this problem, the fix is to use the latest CVS version of PDFBox, where it is believed to be fixed.

--
Best regards,
Andrzej Bialecki     <><
 ___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com



_______________________________________________
Nutch-developers mailing list
Nutch-developers@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nutch-developers
