Ronan lanore created TIKA-2301:
----------------------------------

             Summary: Tika add char to txt file when parsing with '-J -t' options
                 Key: TIKA-2301
                 URL: https://issues.apache.org/jira/browse/TIKA-2301
             Project: Tika
          Issue Type: Bug
          Components: parser
    Affects Versions: 1.14, 1.11
            Reporter: Ronan lanore
            Priority: Minor

Create a file in a text editor with the following content and no trailing newline:

{code}
Tika txt content
{code}

Parse it with:

{code}
java -jar tika-app-1.14.jar -t ~/Documents/git/system/nodejs/searchEs/test/ressources/txtFiles/tika-content.txt
{code}

The output ends with two "\n" characters. Why?

Parse it with both '-J' and '-t':

{code}
java -jar tika-app-1.14.jar -J -t ~/Documents/git/system/nodejs/searchEs/test/ressources/txtFiles/tika-content.txt
{code}

Result:

{code}
[{"Content-Encoding":"ISO-8859-1","Content-Length":"17","Content-Type":"text/plain; charset\u003dISO-8859-1","X-Parsed-By":["org.apache.tika.parser.DefaultParser","org.apache.tika.parser.txt.TXTParser"],"X-TIKA:content":"\n\n\n\n\n\n\n\n\n\nTika txt content\n\n","X-TIKA:parse_time_millis":"64","resourceName":"tika-content.txt"}]
{code}

Many '\n' characters are added at the beginning of "X-TIKA:content".

The same happens with tika-server on the "/rmeta/text" path.
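
To quantify the padding described above, the JSON output can be decoded and the newlines around the text counted. The following is only a rough sketch in Python, assuming the '-J -t' output has been pasted in as a string; the helper name `newline_padding` is hypothetical and not part of Tika.

```python
import json

# JSON copied from the '-J -t' output quoted in this report.
# A raw string keeps the \n and \u003d escapes intact for the JSON decoder.
raw = r'''[{"Content-Encoding":"ISO-8859-1","Content-Length":"17","Content-Type":"text/plain; charset\u003dISO-8859-1","X-Parsed-By":["org.apache.tika.parser.DefaultParser","org.apache.tika.parser.txt.TXTParser"],"X-TIKA:content":"\n\n\n\n\n\n\n\n\n\nTika txt content\n\n","X-TIKA:parse_time_millis":"64","resourceName":"tika-content.txt"}]'''


def newline_padding(text):
    # Hypothetical helper: count newline characters before and after the text.
    leading = len(text) - len(text.lstrip("\n"))
    trailing = len(text) - len(text.rstrip("\n"))
    return leading, trailing


# The -J output is a JSON array with one metadata object per document.
content = json.loads(raw)[0]["X-TIKA:content"]
leading, trailing = newline_padding(content)

print("leading newlines:", leading)
print("trailing newlines:", trailing)
print("text without padding:", repr(content.strip("\n")))
```

Running the same helper on the plain '-t' output would allow a direct comparison of the padding between the two modes.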