[jira] [Commented] (SPARK-20268) Arbitrary RDD element (Fast return) instead of using first

Sean Owen (JIRA) Sat, 08 Apr 2017 21:52:58 -0700

    [ 
https://issues.apache.org/jira/browse/SPARK-20268?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15962037#comment-15962037
 ]


Sean Owen commented on SPARK-20268:
-----------------------------------

If any element will do, why not the first? I get that it might theoretically be 
faster to get an arbitrary element than specifically the first, but the first 
element is the cheapest one to retrieve within any partition, and the first 
partition shouldn't be meaningfully slower to access than any other.
If you only need the length, you can call map() to compute it and then first(), 
so that only the length is returned rather than the whole vector.
I don't think we should add a new method.
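
To illustrate the map() then first() suggestion, here is a minimal sketch in 
plain Python. It does not use the Spark API: ToyRDD, from_lists and 
vector_length are hypothetical names made up for this example, and the class 
only models the assumption that map() is lazy and that first() scans partitions 
in order and stops at the first element it finds.

```python
class ToyRDD:
    """Toy stand-in for an RDD: an ordered list of lazily evaluated partitions."""

    def __init__(self, partitions):
        # Each partition is a zero-argument function returning a fresh iterator.
        self._partitions = partitions

    @staticmethod
    def from_lists(data):
        # Build a ToyRDD from a list of lists, one inner list per partition.
        return ToyRDD([lambda part=part: iter(part) for part in data])

    def map(self, f):
        # Lazy: wraps each partition's iterator; f is not called here.
        return ToyRDD([lambda p=p: (f(x) for x in p()) for p in self._partitions])

    def first(self):
        # Scan partitions in order and return as soon as one element is found.
        for p in self._partitions:
            for x in p():
                return x
        raise ValueError("empty collection")


calls = []  # records every vector that actually gets measured


def vector_length(v):
    calls.append(v)
    return len(v)


# Three partitions; the first one is empty, so first() moves on to the second.
rdd = ToyRDD.from_lists([[], [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]], [[7.0, 8.0, 9.0]]])
num_cols = rdd.map(vector_length).first()
print(num_cols, len(calls))  # prints "3 1"
```

In this toy model only one vector is ever measured, and only its length is 
handed back to the caller, which is the point of calling map() before first().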

> Arbitrary RDD element (Fast return) instead of using first
> ----------------------------------------------------------
>
>                 Key: SPARK-20268
>                 URL: https://issues.apache.org/jira/browse/SPARK-20268
>             Project: Spark
>          Issue Type: Improvement
>          Components: ML, Spark Core
>    Affects Versions: 2.0.0, 2.0.1, 2.1.0
>            Reporter: Hayri Volkan Agun
>            Priority: Minor
>
> Most of the ML and MLlib algorithms need the column size of the RDD 
> vector (RDD[Vector]). So instead of getting the first element with rdd.first(), 
> a fast return could be used to calculate the length of the vector from an 
> arbitrary RDD element. It could also be named any(). 



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

