The CSV data source allows you to skip invalid lines; this should also include
lines that have more than maxColumns. Set the mode option to "DROPMALFORMED".
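
Below is a minimal Scala sketch of these reader options. It assumes Spark 2.x or later and an explicit schema. The object `DropMalformedCsv`, the function `readCsv`, and the two string columns `id` and `name` are hypothetical placeholders. When a schema is supplied, a line whose field count differs from the schema is treated as malformed, so DROPMALFORMED drops it instead of failing the job. Whether a line that exceeds maxColumns itself gets dropped, rather than aborting inside univocity, may depend on the Spark version, so check this against a sample of your data.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.types.{StringType, StructField, StructType}

object DropMalformedCsv {
  // Hypothetical expected layout of each row; a line whose token count
  // differs from this schema is treated as malformed by the CSV reader.
  val schema: StructType = StructType(Seq(
    StructField("id", StringType, nullable = true),
    StructField("name", StringType, nullable = true)
  ))

  /** Reads the CSV at `path`, dropping malformed lines instead of failing. */
  def readCsv(spark: SparkSession, path: String): DataFrame =
    spark.read
      .schema(schema)
      .option("mode", "DROPMALFORMED") // skip malformed lines rather than abort the job
      .option("maxColumns", "20480")   // univocity's per-row column limit (20480 is the default)
      .csv(path)

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("drop-malformed-csv")
      .getOrCreate()
    try readCsv(spark, args(0)).show()
    finally spark.stop()
  }
}
```

To run it, pass the input path as the first argument, for example `spark-submit --class DropMalformedCsv <your-jar> /path/to/input.csv`.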

> On 8. Jun 2017, at 03:04, Chanh Le <giaosu...@gmail.com> wrote:
> 
> Hi Takeshi, Jörn Franke,
> 
> The problem is that even if I increase maxColumns, some lines still have more
> columns than the limit I set, and raising the limit costs a lot of memory.
> So I just want to skip lines that have more columns than the maxColumns I set.
> 
> Regards,
> Chanh
> 
> 
>> On Thu, Jun 8, 2017 at 12:48 AM Takeshi Yamamuro <linguin....@gmail.com> wrote:
>> Is it not enough to set `maxColumns` in CSV options?
>> 
>> https://github.com/apache/spark/blob/branch-2.1/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/CSVOptions.scala#L116
>> 
>> // maropu
>> 
>>> On Wed, Jun 7, 2017 at 9:45 AM, Jörn Franke <jornfra...@gmail.com> wrote:
>>> Spark CSV data source should be able
>>> 
>>>> On 7. Jun 2017, at 17:50, Chanh Le <giaosu...@gmail.com> wrote:
>>>> 
>>>> Hi everyone,
>>>> I am using Spark 2.1.1 to read CSV files and convert them to Avro files.
>>>> One problem I am facing: if a row of a CSV file has more columns than
>>>> maxColumns (default is 20480), parsing stops.
>>>> 
>>>> Internal state when error was thrown: line=1, column=3, record=0, charIndex=12
>>>> com.univocity.parsers.common.TextParsingException: java.lang.ArrayIndexOutOfBoundsException - 2
>>>> Hint: Number of columns processed may have exceeded limit of 2 columns. Use settings.setMaxColumns(int) to define the maximum number of columns your input can have
>>>> Ensure your configuration is correct, with delimiters, quotes and escape sequences that match the input format you are trying to parse
>>>> Parser Configuration: CsvParserSettings:
>>>> 
>>>> 
>>>> I looked into the univocity library, but it handles this case by throwing
>>>> an error, which is why Spark stops the process.
>>>> 
>>>> How can I skip the invalid row and continue parsing the next valid one?
>>>> Is there any library that could replace univocity for this job?
>>>> 
>>>> Thanks & regards,
>>>> Chanh
>> 
>> 
>> 
>> -- 
>> ---
>> Takeshi Yamamuro
> 