[ https://issues.apache.org/jira/browse/NUTCH-1770?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13993483#comment-13993483 ]

Julien Nioche commented on NUTCH-1770:
--------------------------------------

IIRC this mechanism was put in place because partial documents could crash the underlying PDF parser. I would not go so far as to call it very sad, though.

> Nutch is failing to parse all PDFs
> ----------------------------------
>
>                 Key: NUTCH-1770
>                 URL: https://issues.apache.org/jira/browse/NUTCH-1770
>             Project: Nutch
>          Issue Type: Bug
>          Components: parser
>    Affects Versions: 2.3
>         Environment: FreeBSD 10, Open JDK 8
>            Reporter: Rogério Pereira Araújo
>            Priority: Critical
>             Fix For: 2.3
>
>
> I'm trying to crawl a filesystem directory containing several PDFs, but when
> the parsing stage starts, I'm getting the error described in ticket
> PDFBOX-1122

--
This message was sent by Atlassian JIRA
(v6.2#6252)