Ben McCann created TIKA-1753:
--------------------------------
Summary: Improper concatenation
Key: TIKA-1753
URL: https://issues.apache.org/jira/browse/TIKA-1753
Project: Tika
Issue Type: Bug
Components: parser
Reporter: Ben McCann
The code below outputs the text of a PDF. Words that are on different lines
of the PDF are concatenated together in the output:
CaptureXMLHandler handler = new CaptureXMLHandler();
byte[] bytes = IOUtils.toByteArray(new FileInputStream(new File("resume.pdf")));
new AutoDetectParser().parse(
    new ByteArrayInputStream(bytes), handler, new Metadata(), new ParseContext());
System.out.println(handler.toString());
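The report does not include the source of CaptureXMLHandler, so the cause is not confirmed. One possibility worth ruling out is that the handler appends only the text passed to characters() and ignores element boundaries and ignorableWhitespace(), which would join words from adjacent lines or paragraphs. The sketch below is a hypothetical handler (LineAwareTextHandler and its BLOCKS set are illustrative names, not part of Tika) that uses only the JDK's SAX classes and emits a newline whenever an assumed block-level element closes:

```java
import java.io.StringReader;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler, not part of Tika: collects text from SAX events
// and appends a newline whenever a block-level XHTML element closes.
class LineAwareTextHandler extends DefaultHandler {
    // Assumed list of block-level element names; adjust for the real output.
    private static final Set<String> BLOCKS =
            new HashSet<>(Arrays.asList("p", "div", "br", "li", "tr", "h1", "h2", "h3"));

    private final StringBuilder text = new StringBuilder();

    @Override
    public void characters(char[] ch, int start, int length) {
        text.append(ch, start, length);
    }

    @Override
    public void ignorableWhitespace(char[] ch, int start, int length) {
        // Keep whitespace events too, in case line breaks arrive this way.
        text.append(ch, start, length);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        // localName is empty when the parser is not namespace-aware.
        String name = (localName == null || localName.isEmpty()) ? qName : localName;
        if (BLOCKS.contains(name)) {
            text.append('\n');
        }
    }

    @Override
    public String toString() {
        return text.toString();
    }

    // Demo helper: feeds a well-formed XHTML string through the handler.
    static String extract(String xhtml) throws Exception {
        LineAwareTextHandler handler = new LineAwareTextHandler();
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        factory.newSAXParser().parse(new InputSource(new StringReader(xhtml)), handler);
        return handler.toString();
    }
}
```

Since it is an ordinary SAX ContentHandler, an instance could be passed to AutoDetectParser.parse(...) in place of CaptureXMLHandler to check whether the words still run together. If they do, the concatenation more likely originates in the parser than in the handler.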
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)