Is it not enough to set `maxColumns` in CSV options?

https://github.com/apache/spark/blob/branch-2.1/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/CSVOptions.scala#L116
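For example, something like the following might be enough (an untested sketch; the input path, the app name, and the 30000 value are placeholders, not from your mail):

  // untested sketch: raise the univocity column limit and drop malformed rows
  import org.apache.spark.sql.SparkSession

  val spark = SparkSession.builder().appName("csv-to-avro").getOrCreate()

  val df = spark.read
    .option("header", "true")
    .option("maxColumns", "30000")    // univocity default is 20480
    .option("mode", "DROPMALFORMED")  // skip rows that cannot be parsed
    .csv("/path/to/input/*.csv")      // placeholder path

If the overly wide rows are genuinely malformed, DROPMALFORMED may let the rest of the file go through; otherwise just raising maxColumns could be sufficient.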

// maropu

On Wed, Jun 7, 2017 at 9:45 AM, Jörn Franke <jornfra...@gmail.com> wrote:

> The Spark CSV data source should be able to handle this.
>
> On 7. Jun 2017, at 17:50, Chanh Le <giaosu...@gmail.com> wrote:
>
> Hi everyone,
> I am using Spark 2.1.1 to read csv files and convert to avro files.
> One problem I am facing is that if one row of a csv file has more columns
> than maxColumns (default is 20480), the parsing process stops.
>
> Internal state when error was thrown: line=1, column=3, record=0,
> charIndex=12
> com.univocity.parsers.common.TextParsingException: 
> java.lang.ArrayIndexOutOfBoundsException
> - 2
> Hint: Number of columns processed may have exceeded limit of 2 columns.
> Use settings.setMaxColumns(int) to define the maximum number of columns
> your input can have
> Ensure your configuration is correct, with delimiters, quotes and escape
> sequences that match the input format you are trying to parse
> Parser Configuration: CsvParserSettings:
>
>
> I did some investigation into the univocity
> <https://github.com/uniVocity/univocity-parsers> library, but the way it
> handles this case is to throw an error, which is why Spark stops the process.
>
> How can I skip the invalid row and just continue parsing the next valid one?
> Are there any libraries that could replace univocity for this job?
>
> Thanks & regards,
> Chanh
> --
> Regards,
> Chanh
>
>


-- 
---
Takeshi Yamamuro
