Luca created TIKA-3822:
--------------------------
Summary: Plain text file reported as application/octet-stream
Key: TIKA-3822
URL: https://issues.apache.org/jira/browse/TIKA-3822
Project: Tika
Issue Type: Improvement
Affects Versions: 1.28
Reporter: Luca
Attachments: plaintextfile.txt
I need my application to detect short files that contain some control
characters (SOH, STX, ETX, VT, ...) as "text/plain".
Depending on the total length of the file, the percentage of control
characters may exceed 2%, causing the isMostlyAscii method to return "false"
(an example is attached).
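To show why the result depends on file length, here is a simplified,
hypothetical sketch of a "mostly ASCII" check with a 2% control-character
threshold. It is not Tika's actual code: the class and method names and the
set of "safe" control bytes (TAB, LF, FF, CR, ESC) are assumptions, and the
real check may apply further conditions. It only illustrates the ratio
arithmetic.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical, simplified "mostly ASCII" check with a 2% threshold on
// control bytes. Names and the "safe" control set are assumptions,
// not taken from Tika's source.
public class MostlyAsciiSketch {

    // Assumed set of control bytes common in ordinary text files
    // (TAB, LF, FF, CR, ESC); they do not count against the threshold.
    private static boolean isSafeControl(int b) {
        return b == 0x09 || b == 0x0A || b == 0x0C || b == 0x0D || b == 0x1B;
    }

    // Returns true when the input is non-empty and "unsafe" control bytes
    // (0x00-0x1F outside the safe set) make up less than 2% of it.
    public static boolean isMostlyAscii(byte[] data) {
        int total = data.length;
        int control = 0;
        for (byte value : data) {
            int b = value & 0xFF; // treat the byte as unsigned
            if (b < 0x20 && !isSafeControl(b)) {
                control++;
            }
        }
        // control / total < 2%, expressed in integer arithmetic
        return total > 0 && control * 100 < total * 2;
    }

    public static void main(String[] args) {
        String body = "Hello\u0001World\u0002!"; // 13 bytes, 2 of them SOH/STX
        byte[] shortFile = body.getBytes(StandardCharsets.US_ASCII);
        // Same two control bytes, padded to 200 bytes of plain text
        byte[] longFile = (body + "a".repeat(187)).getBytes(StandardCharsets.US_ASCII);

        System.out.println(shortFile.length + " bytes -> " + isMostlyAscii(shortFile)); // 13 bytes -> false
        System.out.println(longFile.length + " bytes -> " + isMostlyAscii(longFile));   // 200 bytes -> true
    }
}
```

With two SOH/STX bytes, this sketch rejects any input of 100 bytes or fewer
and accepts it only from 101 bytes on. The same content is classified
differently depending only on how much ordinary text surrounds it.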
Are there any suggestions for avoiding this issue?
--
This message was sent by Atlassian Jira
(v8.20.10#820010)