[
https://issues.apache.org/jira/browse/SPARK-16102?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15906500#comment-15906500
]
Hyukjin Kwon commented on SPARK-16102:
--------------------------------------
Hm, let me resolve this as {{Not A Problem}}. It seems the key points are
performance and code reduction.
For performance, I read the related code in
https://github.com/uniVocity/univocity-parsers/tree/master/src/main/java/com/univocity/parsers/conversions
and its casting logic does not seem particularly better than ours. For example,
it uses {{Double.valueOf(input)}} whereas our code uses {{StringLike.toDouble}},
which is effectively an alias for {{java.lang.Double.parseDouble(toString)}}.
And {{Double.valueOf(input)}} itself basically just calls
{{java.lang.Double.parseDouble}} and boxes the result.
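To illustrate, here is a minimal sketch of that equivalence (the object name is
mine, not from either codebase):

{code:scala}
// Scala's StringLike.toDouble delegates to java.lang.Double.parseDouble, and
// java.lang.Double.valueOf(String) is specified as boxing the result of
// parseDouble, so all three paths parse the string identically.
object DoubleParsingEquivalence {
  def main(args: Array[String]): Unit = {
    val input = "3.14"
    val viaScala: Double   = input.toDouble                      // StringLike.toDouble
    val viaParse: Double   = java.lang.Double.parseDouble(input)
    val viaValueOf: Double = java.lang.Double.valueOf(input).doubleValue()
    assert(viaScala == viaParse && viaParse == viaValueOf)
    // valueOf only adds a boxing allocation on top of parseDouble, so it
    // offers no performance advantage for CSV value casting.
  }
}
{code}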
For reducing code, it might be worth trying to use the default conversions in
Univocity, but we have had some variants and option-related casting logic in
CSV from the first day. Nevertheless, I tried prototyping this locally, and it
really does not look worthwhile, nor does it actually reduce the code.
Please reopen if anyone sees a benefit in this attempt. This was just my humble
opinion.
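For reference, this is roughly what using the Record API would look like (a
minimal sketch assuming univocity-parsers 2.x; the CSV content and column
names are hypothetical):

{code:scala}
import java.io.StringReader
import com.univocity.parsers.csv.{CsvParser, CsvParserSettings}

object RecordApiSketch {
  def main(args: Array[String]): Unit = {
    val settings = new CsvParserSettings
    settings.setHeaderExtractionEnabled(true)

    val parser = new CsvParser(settings)
    parser.beginParsing(new StringReader("id,price\n1,9.99\n2,19.99\n"))

    var record = parser.parseNextRecord()
    while (record != null) {
      // The Record performs the type conversion itself; this is what would
      // replace Spark's own per-field comparison and casting.
      val id = record.getInt("id")           // java.lang.Integer
      val price = record.getDouble("price")  // java.lang.Double
      println(s"$id -> $price")
      record = parser.parseNextRecord()
    }
  }
}
{code}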
> Use Record API from Univocity rather than current data cast API.
> ----------------------------------------------------------------
>
> Key: SPARK-16102
> URL: https://issues.apache.org/jira/browse/SPARK-16102
> Project: Spark
> Issue Type: Sub-task
> Components: SQL
> Affects Versions: 2.0.0
> Reporter: Hyukjin Kwon
>
> There is a Record API for the Univocity parser.
> This API provides typed data, whereas Spark currently compares and casts each
> field itself.
> Using this API should reduce the code in Spark and maybe improve the
> performance.
> It seems a benchmark should be run first.
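For context, the per-field compare-and-cast approach described above amounts to
a match on the target type; a simplified sketch (not Spark's actual
CSVTypeCast code, which also handles nulls, dates, decimals, and options):

{code:scala}
import org.apache.spark.sql.types._

object CsvCastSketch {
  // A simplified sketch of per-field casting as described in the issue.
  def castTo(datum: String, castType: DataType): Any = castType match {
    case IntegerType => datum.toInt
    case LongType    => datum.toLong
    case DoubleType  => datum.toDouble   // StringLike.toDouble
    case BooleanType => datum.toBoolean
    case StringType  => datum
    case other       => throw new UnsupportedOperationException(s"Unsupported type: $other")
  }
}
{code}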