Akash created TIKA-3155:
---------------------------
Summary: Parse Error while extracting CSV files
Key: TIKA-3155
URL: https://issues.apache.org/jira/browse/TIKA-3155
Project: Tika
Issue Type: Bug
Components: parser
Affects Versions: 1.24.1
Reporter: Akash
Attachments: UTF-8_chars.csv
We get a parse error when trying to extract CSV files.
Extraction worked in version 1.9, but the exception below is thrown in 1.24.1:
{code:java}
Exception in thread "main" org.apache.tika.exception.TikaException: exception parsing the csv
	at org.apache.tika.parser.csv.TextAndCSVParser.parse(TextAndCSVParser.java:198)
	at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
	at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
	at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:143)
	at org.apache.tika.cli.TikaCLI$OutputType.process(TikaCLI.java:209)
	at org.apache.tika.cli.TikaCLI.process(TikaCLI.java:496)
	at org.apache.tika.cli.TikaCLI.main(TikaCLI.java:149)
Caused by: java.lang.IllegalStateException: IOException reading next record: java.io.IOException: (startline 39) EOF reached before encapsulated token finished
	at org.apache.commons.csv.CSVParser$CSVRecordIterator.getNextRecord(CSVParser.java:145)
	at org.apache.commons.csv.CSVParser$CSVRecordIterator.hasNext(CSVParser.java:155)
	at org.apache.tika.parser.csv.TextAndCSVParser.parse(TextAndCSVParser.java:178)
	... 6 more
Caused by: java.io.IOException: (startline 39) EOF reached before encapsulated token finished
	at org.apache.commons.csv.Lexer.parseEncapsulatedToken(Lexer.java:288)
	at org.apache.commons.csv.Lexer.nextToken(Lexer.java:158)
	at org.apache.commons.csv.CSVParser.nextRecord(CSVParser.java:674)
	at org.apache.commons.csv.CSVParser$CSVRecordIterator.getNextRecord(CSVParser.java:142)
{code}
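The issue occurs when one of the cells contains double quotes. According to the message in the trace, a quoted field that starts at line 39 is still open when the end of the file is reached.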
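
To help narrow down which row triggers the error, here is a rough diagnostic sketch that scans CSV text for a quoted field that is never closed. It is not how Tika or Commons CSV tokenizes input: the class and method names (UnterminatedQuoteCheck, findUnterminatedQuote) are hypothetical, and it assumes RFC 4180 style quoting where a doubled quote inside a quoted field is an escaped quote. Unlike a real CSV lexer, it treats any double quote as a field delimiter, even in the middle of an unquoted cell, so it may flag files that a parser would accept.

```java
public class UnterminatedQuoteCheck {

    /**
     * Returns the 1-based line number on which a still-open quoted field
     * starts, or -1 if every opening quote has a matching closing quote.
     * Simplification: every double quote toggles the quoted state, except a
     * doubled quote inside a quoted field, which counts as an escaped quote.
     */
    public static int findUnterminatedQuote(String csv) {
        boolean inQuotes = false;
        int line = 1;
        int startLine = -1;
        for (int i = 0; i < csv.length(); i++) {
            char c = csv.charAt(i);
            if (c == '\n') {
                line++; // newlines inside quoted fields still advance the line count
            } else if (c == '"') {
                boolean escaped = inQuotes
                        && i + 1 < csv.length()
                        && csv.charAt(i + 1) == '"';
                if (escaped) {
                    i++; // skip the second quote of the "" pair
                } else {
                    inQuotes = !inQuotes;
                    if (inQuotes) {
                        startLine = line; // remember where this field opened
                    }
                }
            }
        }
        return inQuotes ? startLine : -1;
    }

    public static void main(String[] args) {
        String balanced = "id,name\n1,\"Smith, John\"\n2,\"say \"\"hi\"\"\"\n";
        String unbalanced = "id,name\n1,\"Smith, John\n2,Jones\n";
        System.out.println(findUnterminatedQuote(balanced));   // prints -1
        System.out.println(findUnterminatedQuote(unbalanced)); // prints 2
    }
}
```

Running this over the contents of the attached file should show whether a quote opened around line 39 is left unbalanced, which would be consistent with the "(startline 39)" in the exception message.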
--
This message was sent by Atlassian Jira
(v8.3.4#803005)