Annie Didier created TIKA-2669:
----------------------------------

             Summary: Tika JAX-RS PDF parser option / custom config issue
                 Key: TIKA-2669
                 URL: https://issues.apache.org/jira/browse/TIKA-2669
             Project: Tika
          Issue Type: Bug
          Components: config
    Affects Versions: 1.18
            Reporter: Annie Didier

PDF parsing using a config file behaves differently in Tika app than in Tika server. Tika server reads the custom config file, but the PDF parsing options are not being set.

Here is an excerpt of output from the app:

<p>WINS No: B29017 APACHE 27-38 UNIT 1H Date: 5/4/2017 </p>
<p>AFE No: 1704555 Daily Completion and Workover Report DOL: </p>

However, with the same configuration file the output from tika server is:

<p>Daily Completion and Workover Report </p>
<p>WINS No: </p>
<p>AFE No: </p>
<p>Date: </p>
<p>DOL: </p>
<p>APACHE 27-38 UNIT B29017 </p>
<p>1704555 </p>
<p>5/4/2017 </p>

The tika config is:

<?xml version="1.0" encoding="UTF-8"?>
<properties>
  <parsers>
    <parser class="org.apache.tika.parser.pdf.PDFParser">
      <params>
        <param name="sortByPosition" type="bool">true</param>
      </params>
    </parser>
  </parsers>
</properties>

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
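
For anyone trying to reproduce this, it may help to first rule out a malformed config file. The following is only a minimal sketch: it uses Python's standard XML parser, not Tika, and the helper name `parser_params` is hypothetical. It checks what the file declares; it says nothing about whether Tika app or Tika server actually applies the option.

```python
import xml.etree.ElementTree as ET

# The config from this report, inlined so the sketch is self-contained.
TIKA_CONFIG = """<?xml version="1.0" encoding="UTF-8"?>
<properties>
  <parsers>
    <parser class="org.apache.tika.parser.pdf.PDFParser">
      <params>
        <param name="sortByPosition" type="bool">true</param>
      </params>
    </parser>
  </parsers>
</properties>
"""


def parser_params(config_xml: str) -> dict:
    """Map each parser class to its declared {param name: (type, value)}.

    Hypothetical helper: it only inspects the XML structure shown above.
    """
    # Encode to bytes so the XML declaration's encoding is honoured.
    root = ET.fromstring(config_xml.encode("utf-8"))
    result = {}
    for parser in root.findall("./parsers/parser"):
        params = {}
        for param in parser.findall("./params/param"):
            value = (param.text or "").strip()
            params[param.get("name")] = (param.get("type"), value)
        result[parser.get("class")] = params
    return result


if __name__ == "__main__":
    for cls, params in parser_params(TIKA_CONFIG).items():
        print(cls, params)
```

If the file parses and lists `sortByPosition` as a `bool` set to `true` under `org.apache.tika.parser.pdf.PDFParser`, then the file content is presumably not the cause of the difference between the two outputs.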