[
https://issues.apache.org/jira/browse/SPARK-16102?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15906500#comment-15906500
]
Hyukjin Kwon commented on SPARK-16102:
--------------------------------------
Hm, let me resolve this as {{Not A Problem}}. It seems the key points are
performance and code reduction.
For performance, I read the related code in
https://github.com/uniVocity/univocity-parsers/tree/master/src/main/java/com/univocity/parsers/conversions
and its casting logic does not seem particularly better than ours. For example,
it uses {{Double.valueOf(input)}} whereas our code uses {{StringLike.toDouble}},
which is effectively an alias for {{java.lang.Double.parseDouble(toString)}}.
And {{Double.valueOf(input)}} itself basically just calls
{{java.lang.Double.parseDouble}} and boxes the result.
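To illustrate, here is a minimal sketch of that equivalence (the object name is
mine, not from either codebase):

{code:scala}
// Scala's StringLike.toDouble delegates to java.lang.Double.parseDouble, and
// java.lang.Double.valueOf(String) is specified as boxing the result of
// parseDouble, so all three paths parse the string identically.
object DoubleParsingEquivalence {
  def main(args: Array[String]): Unit = {
    val input = "3.14"
    val viaScala: Double   = input.toDouble                      // StringLike.toDouble
    val viaParse: Double   = java.lang.Double.parseDouble(input)
    val viaValueOf: Double = java.lang.Double.valueOf(input).doubleValue()
    assert(viaScala == viaParse && viaParse == viaValueOf)
    // valueOf only adds a boxing allocation on top of parseDouble, so it
    // offers no performance advantage for CSV value casting.
  }
}
{code}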
For reducing code, it might be worth trying to use the default conversions in
Univocity, but we have had some variants and option-related casting logic in
CSV from the first day. Nevertheless, I tried prototyping this locally, and it
really does not look worthwhile, nor does it actually reduce the code.
Please reopen if anyone sees a benefit in this attempt. This was just my humble
opinion.
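For reference, this is roughly what using the Record API would look like (a
minimal sketch assuming univocity-parsers 2.x; the CSV content and column
names are hypothetical):

{code:scala}
import java.io.StringReader
import com.univocity.parsers.csv.{CsvParser, CsvParserSettings}

object RecordApiSketch {
  def main(args: Array[String]): Unit = {
    val settings = new CsvParserSettings
    settings.setHeaderExtractionEnabled(true)

    val parser = new CsvParser(settings)
    parser.beginParsing(new StringReader("id,price\n1,9.99\n2,19.99\n"))

    var record = parser.parseNextRecord()
    while (record != null) {
      // The Record performs the type conversion itself; this is what would
      // replace Spark's own per-field comparison and casting.
      val id = record.getInt("id")           // java.lang.Integer
      val price = record.getDouble("price")  // java.lang.Double
      println(s"$id -> $price")
      record = parser.parseNextRecord()
    }
  }
}
{code}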
> Use Record API from Univocity rather than current data cast API.
> ----------------------------------------------------------------
>
> Key: SPARK-16102
> URL: https://issues.apache.org/jira/browse/SPARK-16102
> Project: Spark
> Issue Type: Sub-task
> Components: SQL
> Affects Versions: 2.0.0
> Reporter: Hyukjin Kwon
>
> There is a Record API for the Univocity parser.
> This API provides typed data, whereas Spark currently compares and casts each
> field itself.
> Using this API should reduce the code in Spark and maybe improve the
> performance.
> It seems a benchmark should be run first.
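For context, the per-field compare-and-cast approach described above amounts to
a match on the target type; a simplified sketch (not Spark's actual
CSVTypeCast code, which also handles nulls, dates, decimals, and options):

{code:scala}
import org.apache.spark.sql.types._

object CsvCastSketch {
  // A simplified sketch of per-field casting as described in the issue.
  def castTo(datum: String, castType: DataType): Any = castType match {
    case IntegerType => datum.toInt
    case LongType    => datum.toLong
    case DoubleType  => datum.toDouble   // StringLike.toDouble
    case BooleanType => datum.toBoolean
    case StringType  => datum
    case other       => throw new UnsupportedOperationException(s"Unsupported type: $other")
  }
}
{code}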