[ https://issues.apache.org/jira/browse/SOLR-2460?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Steven Rowe resolved SOLR-2460.
-------------------------------

       Resolution: Fixed
    Fix Version/s: 3.5
         Assignee: Steven Rowe

In Solr 3.5, Tika will be upgraded to v0.10, which includes PDFbox 1.6.0. (See SOLR-2372)

> Some European characters cannot be parsed correctly for some PDFs
> -----------------------------------------------------------------
>
>                 Key: SOLR-2460
>                 URL: https://issues.apache.org/jira/browse/SOLR-2460
>             Project: Solr
>          Issue Type: Bug
>          Components: contrib - Solr Cell (Tika extraction)
>    Affects Versions: 1.4.1, 3.1
>         Environment: Tika, PDFBox
>            Reporter: Erlend Garåsen
>            Assignee: Steven Rowe
>            Priority: Minor
>             Fix For: 3.5, 3.1.1
>
>
> The Norwegian characters (æ, ø and å) in the following PDF document will not
> display correctly after Solr has indexed it, using Solr Cell:
> http://ridder.uio.no/dokument.pdf
> If I manually change the version of PDFBox (one of Tika's dependencies) to
> 1.4.0, the document will parse correctly.
> I suggest that the next release of Solr ships with version 0.9 of Tika which
> also has updated its PDFBox dependencies to 1.4.0