[ https://issues.apache.org/jira/browse/TIKA-2304?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Sara Miller updated TIKA-2304:
------------------------------
    Description:
I get strange output when parsing this pdf:
http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.47.3977&rep=rep1&type=pdf
with PUT 192.168.1.115:9908/tika and headers: Accept:text/html
An extract of the output:
"<p>��������� ��
���������������������������� �!"���
</p>
<p>#$�% ���!"�%&'��(*)+�,!-���
</p>
<p>
.�� ��/�� 10��������� �!"21� �434�%54!"�6�
</p>
<p>7�8:9�;�<>=@?�A�9�BDC
</p>
<p>E A FHG�9DI"JLK�M�NLOPJLB�N�J.Q�JLGR8:K-I"FSJLB�I
</p>
<p>E M T"U:V@TXW Y�U Z�NLI"A [RJLK \]U U:V</p>
<p/>"

  was:
I get strange output when parsing this pdf:
http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.47.3977&rep=rep1&type=pdf
with PUT 192.168.1.115:9908/tika and headers: Accept:text/html

> Strange output from PdfParser
> -----------------------------
>
>                 Key: TIKA-2304
>                 URL: https://issues.apache.org/jira/browse/TIKA-2304
>             Project: Tika
>          Issue Type: Bug
>          Components: server
>    Affects Versions: 1.13
>         Environment: org.apache.tika.parser.pdf.PDFParser
>            Reporter: Sara Miller
>            Priority: Minor
>
> I get strange output when parsing this pdf:
> http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.47.3977&rep=rep1&type=pdf
> with PUT 192.168.1.115:9908/tika and headers: Accept:text/html
> An extract of the output:
> "<p>��������� ��
> ���������������������������� �!"���
> </p>
> <p>#$�% ���!"�%&'��(*)+�,!-���
> </p>
> <p>
> .�� ��/�� 10��������� �!"21� �434�%54!"�6�
> </p>
> <p>7�8:9�;�<>=@?�A�9�BDC
> </p>
> <p>E A FHG�9DI"JLK�M�NLOPJLB�N�J.Q�JLGR8:K-I"FSJLB�I
> </p>
> <p>E M T"U:V@TXW Y�U Z�NLI"A [RJLK \]U U:V</p>
> <p/>"

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)