[ 
https://issues.apache.org/jira/browse/SPARK-14260?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen resolved SPARK-14260.
-------------------------------
    Resolution: Won't Fix

Yeah, I think that would be a very rare case. I also suggest we not increase
the default limit. I think this was motivated by SPARK-14103, but I'm not yet
sure that the cause there is a long line. (Or if it is, the solution is to
raise the limit.)

> Increase default value for maxCharsPerColumn
> --------------------------------------------
>
>                 Key: SPARK-14260
>                 URL: https://issues.apache.org/jira/browse/SPARK-14260
>             Project: Spark
>          Issue Type: Sub-task
>          Components: SQL
>            Reporter: Hyukjin Kwon
>            Priority: Trivial
>
> I guess the default value of the option {{maxCharsPerColumn}} looks
> relatively small: 1000000 characters, meaning about 976KB.
> It looks like some users have had a problem with this and ended up setting
> the value manually.
> https://github.com/databricks/spark-csv/issues/295
> https://issues.apache.org/jira/browse/SPARK-14103
> According to [univocity 
> API|http://docs.univocity.com/parsers/2.0.0/com/univocity/parsers/common/CommonSettings.html#setMaxCharsPerColumn(int)],
>  this exists to avoid {{OutOfMemoryErrors}}.
> If this does not harm performance, then I think it would be better to make
> the default value much bigger (e.g. 10MB or 100MB) so that users do not have
> to worry about the length of each field in a CSV file.
> Apparently the Apache CSV parser does not have such a limit.
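
As a rough illustration of the trade-off discussed above, here is a minimal Python sketch of a per-column character cap. It is a toy model, not univocity's or Spark's implementation: the helper names (`chars_to_kib`, `split_with_limit`) are hypothetical, quoting is ignored, and the size conversion assumes one byte per character, as the figure in the description does.

```python
DEFAULT_MAX_CHARS_PER_COLUMN = 1_000_000  # default value cited in the issue


def chars_to_kib(chars):
    """Size in KiB, assuming one byte per character (an assumption)."""
    return chars / 1024


def split_with_limit(line, max_chars_per_column=DEFAULT_MAX_CHARS_PER_COLUMN):
    """Split one unquoted CSV line and reject over-long fields (toy model)."""
    fields = line.split(",")
    for index, field in enumerate(fields):
        if len(field) > max_chars_per_column:
            # Failing fast bounds how much a single field can buffer.
            raise ValueError(
                f"field {index} has {len(field)} characters; "
                f"limit is {max_chars_per_column}"
            )
    return fields


if __name__ == "__main__":
    print(f"{chars_to_kib(DEFAULT_MAX_CHARS_PER_COLUMN):.1f} KiB")
    long_line = "id," + "x" * (DEFAULT_MAX_CHARS_PER_COLUMN + 1)
    try:
        split_with_limit(long_line)
    except ValueError as err:
        print("rejected:", err)
    # Raising the limit for this one call accepts the same line.
    fields = split_with_limit(long_line, max_chars_per_column=10 * 1024 * 1024)
    print("accepted field of", len(fields[1]), "characters")
```

The sketch keeps the conservative default and lets the caller raise the limit only when a long field is expected, which mirrors the resolution here; in Spark the corresponding knob is the {{maxCharsPerColumn}} CSV option named in the issue.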



