Hi everyone,
I am using Spark 2.1.1 to read CSV files and convert them to Avro files.
One problem I am facing: if one row of a CSV file has more columns than
maxColumns (default is 20480), the parsing process stops.
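
For context, the job is roughly this (paths are placeholders, and I am
assuming here the Databricks spark-avro package for the Avro write, which
is what I use with Spark 2.x):

  import org.apache.spark.sql.SparkSession

  val spark = SparkSession.builder().appName("csv-to-avro").getOrCreate()

  // Read the CSV input; header=true is just for this sketch
  val df = spark.read
    .option("header", "true")
    .csv("/path/to/input/*.csv")

  // Write out as Avro via the spark-avro package
  df.write
    .format("com.databricks.spark.avro")
    .save("/path/to/output")

The error I get is: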

Internal state when error was thrown: line=1, column=3, record=0,
charIndex=12
com.univocity.parsers.common.TextParsingException:
java.lang.ArrayIndexOutOfBoundsException - 2
Hint: Number of columns processed may have exceeded limit of 2 columns. Use
settings.setMaxColumns(int) to define the maximum number of columns your
input can have
Ensure your configuration is correct, with delimiters, quotes and escape
sequences that match the input format you are trying to parse
Parser Configuration: CsvParserSettings:


I did some investigation in the univocity
<https://github.com/uniVocity/univocity-parsers> library, but the way it
handles this case is to throw an error, which is why Spark stops the
process.
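
For reference, at the univocity level the hint in the error message amounts
to something like this (a standalone sketch; as far as I can tell Spark
2.1.1 does not let me inject these settings directly, and raising the cap
is not the same as skipping the bad rows):

  import com.univocity.parsers.csv.{CsvParser, CsvParserSettings}
  import java.io.FileReader

  val settings = new CsvParserSettings()
  // Raise the column cap the hint mentions; does not skip oversized rows
  settings.setMaxColumns(100000)
  val parser = new CsvParser(settings)
  val rows = parser.parseAll(new FileReader("input.csv"))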

How can I skip the invalid rows and just continue parsing the next valid
one? Are there any libraries that could replace univocity for this job?
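
What I was hoping would work is something like the sketch below, based on
the maxColumns and mode options I can see in the CSV data source. I am not
sure DROPMALFORMED actually covers this case, since univocity itself seems
to abort before Spark's malformed-record handling kicks in:

  val df = spark.read
    .option("maxColumns", "100000")   // raise univocity's column cap
    .option("mode", "DROPMALFORMED")  // drop rows that fail to parse
    .csv("/path/to/input/*.csv")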

Thanks & regards,
Chanh
