Tilman Hausherr created TIKA-4278:
-------------------------------------
Summary: TextAndCSVParser doesn't detect semicolon separated file
Key: TIKA-4278
URL: https://issues.apache.org/jira/browse/TIKA-4278
Project: Tika
Issue Type: Bug
Components: parser
Affects Versions: 2.9.2
Reporter: Tilman Hausherr
I ran the code from the attached SO issue and can confirm that it doesn't detect
semicolon-separated files. The reason is this line in {{TextAndCSVParser.java}}:
{code:java}
private static final char[] DEFAULT_DELIMITERS = new char[]{',', '\t'};
{code}
This is later used by {{CSVSniffer}}. For some reason the other delimiters
(pipe, colon and semicolon) aren't in that array, although they are in
{{CHAR_TO_STRING_DELIMITER_MAP}}. I modified {{DEFAULT_DELIMITERS}} and now it
works for semicolon.
Can I change this by adding the missing delimiters, or was there a reason for
leaving them out that I missed?
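
To illustrate why the candidate array matters, here is a minimal standalone sketch. It is not Tika's {{CSVSniffer}}: the class {{DelimiterSniffSketch}}, its {{sniff}} method, the two array names and the counting heuristic are all hypothetical and only show that a sniffer limited to comma and tab can never report a semicolon, whatever the data looks like.

```java
import java.util.Arrays;

/** Hypothetical sketch for illustration only; this is NOT Tika's CSVSniffer. */
class DelimiterSniffSketch {

    // Same two candidates as the array quoted in this report.
    static final char[] REPORTED_DELIMITERS = {',', '\t'};

    // Assumed extended set: adds the pipe, colon and semicolon named in this report.
    static final char[] EXTENDED_DELIMITERS = {',', '\t', '|', ':', ';'};

    /**
     * Returns the candidate that occurs the same non-zero number of times on
     * every non-empty line (highest per-line count wins), or null if no
     * candidate qualifies. A deliberately naive heuristic, assumed here.
     */
    static Character sniff(String text, char[] candidates) {
        String[] lines = Arrays.stream(text.split("\\R"))
                .filter(line -> !line.isEmpty())
                .toArray(String[]::new);
        Character best = null;
        long bestCount = 0;
        for (char candidate : candidates) {
            long expected = -1;
            boolean consistent = lines.length > 0;
            for (String line : lines) {
                long count = line.chars().filter(ch -> ch == candidate).count();
                if (expected < 0) {
                    expected = count;          // first line sets the expected count
                } else if (count != expected) {
                    consistent = false;        // counts differ between lines
                    break;
                }
            }
            if (consistent && expected > bestCount) {
                best = candidate;
                bestCount = expected;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        String sample = "name;city;age\nAda;London;36\nLinus;Helsinki;54\n";
        // Semicolon is not a candidate, so nothing is detected.
        System.out.println(sniff(sample, REPORTED_DELIMITERS));
        // With the extended set, the semicolon is found.
        System.out.println(sniff(sample, EXTENDED_DELIMITERS));
    }
}
```

With the two-element array the sample yields no delimiter at all, and with the extended array it yields the semicolon, which matches what I observed after modifying {{DEFAULT_DELIMITERS}}.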