[jira] [Commented] (SOLR-1786) Solr (trunk rev. 912116) suffers from PDFBOX-537 [Endless loop in org.apache.pdfbox.pdfparser.BaseParser.parseCOSDictionary()] fixed in PDFbox 1.0?

Jan Iwaszkiewicz (Commented) (JIRA) Tue, 15 Nov 2011 02:18:17 -0800

    [ https://issues.apache.org/jira/browse/SOLR-1786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13150348#comment-13150348 ]


Jan Iwaszkiewicz commented on SOLR-1786:
----------------------------------------

Thanks. I'm quite sure it is fixed. Unfortunately I no longer work on the CDS 
project, and we decided not to use PDFBox for textification (we chose 
pdftotext instead). Please try to textify/index the PDF linked in the issue 
description to verify.
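
One way to run such a check without risking a hung process is to wrap the extraction command in a timeout, so that an endless loop shows up as a timeout rather than blocking the whole batch. The following is only a minimal sketch: the helper name extract_with_timeout, the 60-second limit, and the local file name are assumptions for illustration, and the pdftotext call stands in for whatever extractor is actually under test.

```python
import subprocess


def extract_with_timeout(cmd, timeout_s):
    """Run an extraction command and classify the outcome.

    Returns (status, stdout_bytes), where status is "ok", "failed",
    or "timeout". The helper name and return shape are illustrative.
    """
    try:
        result = subprocess.run(
            cmd,
            stdout=subprocess.PIPE,
            stderr=subprocess.PIPE,
            timeout=timeout_s,  # the child is killed if it runs longer
        )
    except subprocess.TimeoutExpired:
        # A parser stuck in an endless loop ends up here.
        return "timeout", b""
    status = "ok" if result.returncode == 0 else "failed"
    return status, result.stdout


if __name__ == "__main__":
    # Assumes the PDF from the issue was saved locally under this name
    # and that pdftotext is on the PATH; "-" sends the text to stdout.
    status, text = extract_with_timeout(
        ["pdftotext", "sl-note-2000-019.pdf", "-"], timeout_s=60
    )
    print(status, len(text))
```

A "timeout" result for a document that other tools handle quickly would suggest the loop is still present; an "ok" result only shows that this particular file no longer hangs.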


                
> Solr (trunk rev. 912116) suffers from PDFBOX-537 [Endless loop in 
> org.apache.pdfbox.pdfparser.BaseParser.parseCOSDictionary()]  fixed in PDFbox 
> 1.0?
> ----------------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: SOLR-1786
>                 URL: https://issues.apache.org/jira/browse/SOLR-1786
>             Project: Solr
>          Issue Type: Bug
>          Components: contrib - Solr Cell (Tika extraction)
>    Affects Versions: 1.5
>         Environment: Ubuntu 9.10, 32bit
>            Reporter: Jan Iwaszkiewicz
>            Priority: Critical
>              Labels: PDFbox
>             Fix For: 3.5, 4.0
>
>
> I tried indexing several thousand PDF documents but could not finish as Solr 
> was falling into an endless loop for some of them, for instance: 
> http://cdsweb.cern.ch/record/702585/files/sl-note-2000-019.pdf (the PDF seems 
> OK).
> Can Solr start using PDFbox 1.0?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

