Hyukjin Kwon created SPARK-14260:
------------------------------------

             Summary: Increase default value for maxCharsPerColumn
                 Key: SPARK-14260
                 URL: https://issues.apache.org/jira/browse/SPARK-14260
             Project: Spark
          Issue Type: Sub-task
            Reporter: Hyukjin Kwon
            Priority: Trivial


The default value of the {{maxCharsPerColumn}} option seems relatively small: 
1,000,000 characters, meaning roughly 976KB.

It looks like some users have run into this limit and ended up raising the 
value manually, as in the sketch after the links below:

https://github.com/databricks/spark-csv/issues/295
https://issues.apache.org/jira/browse/SPARK-14103
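
For reference, a minimal sketch of that manual workaround against Spark's 
built-in CSV data source (the input path is hypothetical, and the 10,000,000 
figure is just an illustration):

{code:scala}
// Raising maxCharsPerColumn by hand to get past the ~976KB default.
// Assumes a SQLContext named sqlContext; the path is hypothetical.
val df = sqlContext.read
  .format("csv")
  .option("header", "true")
  .option("maxCharsPerColumn", "10000000") // ~10M chars instead of 1M
  .load("/path/to/file-with-long-fields.csv")
{code}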

According to the [univocity 
API|http://docs.univocity.com/parsers/2.0.0/com/univocity/parsers/common/CommonSettings.html#setMaxCharsPerColumn(int)],
 this limit exists to avoid {{OutOfMemoryError}}s.
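
Underneath, this maps to the following univocity setting (a minimal standalone 
sketch against the univocity-parsers 2.0 API; the ~10MB value is illustrative):

{code:scala}
import com.univocity.parsers.csv.{CsvParser, CsvParserSettings}

val settings = new CsvParserSettings()
// univocity caps each column's buffer at this many characters, which is
// how it guards against OutOfMemoryError on malformed or huge fields.
settings.setMaxCharsPerColumn(10 * 1024 * 1024)
val parser = new CsvParser(settings)
val row = parser.parseLine("a,b,c") // Array("a", "b", "c")
{code}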

If raising it does not hurt performance, I think it would be better to make the 
default value much bigger (e.g. 10MB or 100MB) so that users do not have to 
worry about the length of each field in their CSV files.

Apparently the Apache Commons CSV parser does not have such a limit.



