[
https://issues.apache.org/jira/browse/SOLR-1786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13150348#comment-13150348
]
Jan Iwaszkiewicz commented on SOLR-1786:
----------------------------------------
Thanks. I'm quite sure it is fixed. Unfortunately, I no longer work on the CDS
project, and in the end we decided not to use PDFBox for text extraction
(we use pdftotext instead). Please try to extract and index the PDF linked in
the issue description to verify.
> Solr (trunk rev. 912116) suffers from PDFBOX-537 [Endless loop in
> org.apache.pdfbox.pdfparser.BaseParser.parseCOSDictionary()] fixed in PDFbox
> 1.0?
> ----------------------------------------------------------------------------------------------------------------------------------------------------
>
> Key: SOLR-1786
> URL: https://issues.apache.org/jira/browse/SOLR-1786
> Project: Solr
> Issue Type: Bug
> Components: contrib - Solr Cell (Tika extraction)
> Affects Versions: 1.5
> Environment: Ubuntu 9.10, 32bit
> Reporter: Jan Iwaszkiewicz
> Priority: Critical
> Labels: PDFbox
> Fix For: 3.5, 4.0
>
>
> I tried indexing several thousand PDF documents but could not finish as Solr
> was falling into an endless loop for some of them, for instance:
> http://cdsweb.cern.ch/record/702585/files/sl-note-2000-019.pdf (the PDF seems
> OK).
> Can Solr start using PDFbox 1.0?