[ https://issues.apache.org/jira/browse/SPARK-14260?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15217785#comment-15217785 ]
Hyukjin Kwon edited comment on SPARK-14260 at 3/30/16 10:40 AM:
----------------------------------------------------------------
For me, yes, I think the error message should be more readable.
But I think failures while reading (input data bugs or something similar)
will be covered by the {{parseMode}} options.
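The idea in the comment, that read failures would be handled by the parse mode, can be sketched as follows. This is a hypothetical illustration, not Spark's or univocity's code. `parse_lines` and `FieldTooLongError` are made-up names, and the splitter deliberately ignores CSV quoting. Only the mode names PERMISSIVE, DROPMALFORMED and FAILFAST come from Spark, where they are the values accepted by the CSV reader's `mode` option.

```python
# Hypothetical sketch, NOT Spark's or univocity's implementation: it shows how a
# parse mode could decide what happens to rows that break a per-column cap.


class FieldTooLongError(ValueError):
    """Raised in FAILFAST mode when a field exceeds the character cap."""


def parse_lines(lines, max_chars, mode="PERMISSIVE"):
    """Split each line on commas (no quoting support) and apply `mode`.

    PERMISSIVE    keeps the row, replacing over-long fields with None.
    DROPMALFORMED skips the whole row.
    FAILFAST      raises on the first row with an over-long field.
    """
    if mode not in ("PERMISSIVE", "DROPMALFORMED", "FAILFAST"):
        raise ValueError(f"unknown mode: {mode}")
    rows = []
    for line_no, line in enumerate(lines, start=1):
        fields = line.split(",")
        if all(len(f) <= max_chars for f in fields):
            rows.append(fields)
        elif mode == "FAILFAST":
            raise FieldTooLongError(
                f"line {line_no}: a field exceeds {max_chars} characters"
            )
        elif mode == "PERMISSIVE":
            rows.append([f if len(f) <= max_chars else None for f in fields])
        # DROPMALFORMED: no branch matched, so the row is silently dropped.
    return rows


lines = ["a,b", "ok,toolongvalue", "c,d"]
print(parse_lines(lines, max_chars=5))                        # [['a', 'b'], ['ok', None], ['c', 'd']]
print(parse_lines(lines, max_chars=5, mode="DROPMALFORMED"))  # [['a', 'b'], ['c', 'd']]
```

The point of routing the length check through the mode is that a single oversized value does not have to abort the whole read unless the user asks for FAILFAST.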
> Increase default value for maxCharsPerColumn
> --------------------------------------------
>
> Key: SPARK-14260
> URL: https://issues.apache.org/jira/browse/SPARK-14260
> Project: Spark
> Issue Type: Sub-task
> Components: SQL
> Reporter: Hyukjin Kwon
> Priority: Trivial
>
> I guess the default value of the option {{maxCharsPerColumn}} is
> relatively small: 1000000 characters, i.e. about 976KB.
> It looks like some users have run into this limit and ended up setting
> the value manually.
> https://github.com/databricks/spark-csv/issues/295
> https://issues.apache.org/jira/browse/SPARK-14103
> According to the [univocity API|http://docs.univocity.com/parsers/2.0.0/com/univocity/parsers/common/CommonSettings.html#setMaxCharsPerColumn(int)],
> this limit exists to avoid {{OutOfMemoryErrors}}.
> If this does not harm performance, I think it would be better to make
> the default value much bigger (e.g. 10MB or 100MB) so that users do not
> have to worry about the length of each field in a CSV file.
> Apparently the Apache CSV Parser does not have such a limit.
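The size figures in the description can be checked with a few lines of arithmetic. This is a sketch under stated assumptions: 1 KB is taken as 1024 bytes, the description's sizes assume one byte per character, and the `bytes_per_char=2` case assumes the parser buffers each value in a Java `char[]` (2 bytes per `char`), which is an assumption about univocity's internals rather than something stated in this issue. The helper names `chars_to_kib` and `mib_to_chars` are hypothetical.

```python
# Arithmetic sketch for the maxCharsPerColumn sizes discussed in SPARK-14260.
# Assumes 1 KiB = 1024 bytes; bytes_per_char=2 models a Java char[] buffer
# (an assumption about the parser's internals, not a documented fact).

KIB = 1024
MIB = 1024 * KIB

DEFAULT_MAX_CHARS = 1_000_000  # the default value quoted in the description


def chars_to_kib(chars, bytes_per_char=1):
    """Size in KiB of a value holding `chars` characters."""
    return chars * bytes_per_char / KIB


def mib_to_chars(mib, bytes_per_char=1):
    """Number of characters that fit in `mib` MiB."""
    return mib * MIB // bytes_per_char


print(chars_to_kib(DEFAULT_MAX_CHARS))     # 976.5625
print(chars_to_kib(DEFAULT_MAX_CHARS, 2))  # 1953.125
for mib in (10, 100):  # the proposed defaults from the description
    print(mib, "MiB ->", mib_to_chars(mib), "chars")
```

Under the 2-bytes-per-`char` assumption, the in-memory size of a maximal value is double the figures given in the description.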