[jira] [Commented] (SPARK-27645) Cache result of count function to that RDD

Hyukjin Kwon (JIRA) Tue, 07 May 2019 21:42:13 -0700


    [ 
https://issues.apache.org/jira/browse/SPARK-27645?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16835294#comment-16835294
 ]


Hyukjin Kwon commented on SPARK-27645:
--------------------------------------

You can wrap {{Dataset}} so that it carries the pre-calculated count, or have 
your code take {{count}} as an argument. Or you can have a class that holds 
the count.
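
A minimal sketch of the wrapper idea, in Python for brevity. It assumes only that the wrapped object exposes a {{count()}} method, as an RDD or Dataset does. {{CountedDataset}} and {{FakeDataset}} are hypothetical names for illustration and are not part of Spark; {{FakeDataset}} is a stand-in so the example runs without a Spark session.

```python
class CountedDataset:
    """Hypothetical wrapper that remembers the count of an immutable dataset."""

    def __init__(self, dataset, count=None):
        self._dataset = dataset
        # Optional pre-calculated count supplied by the caller.
        self._count = count

    @property
    def dataset(self):
        # Expose the wrapped object for all other operations.
        return self._dataset

    def count(self):
        # Run the underlying count action at most once, then reuse the result.
        if self._count is None:
            self._count = self._dataset.count()
        return self._count


class FakeDataset:
    """Stand-in for an RDD/Dataset; records how often count() is executed."""

    def __init__(self, rows):
        self._rows = list(rows)
        self.count_calls = 0

    def count(self):
        self.count_calls += 1
        return len(self._rows)


if __name__ == "__main__":
    fake = FakeDataset(range(5))
    wrapped = CountedDataset(fake)
    wrapped.count()
    wrapped.count()
    # The underlying count ran only once despite two calls on the wrapper.
    print(fake.count_calls)
```

The stored count is only valid for the exact object that was wrapped. Any transformation produces a new RDD or Dataset, which would need its own wrapper.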

> Cache result of count function to that RDD
> ------------------------------------------
>
>                 Key: SPARK-27645
>                 URL: https://issues.apache.org/jira/browse/SPARK-27645
>             Project: Spark
>          Issue Type: New Feature
>          Components: Spark Core
>    Affects Versions: 2.4.3
>            Reporter: Seungmin Lee
>            Priority: Major
>
> I'm not sure whether there has been an update on this (as far as I know, 
> there is no such feature), but since an RDD is immutable, why don't we keep 
> the result of the count function for that RDD and reuse it in future calls?
> Sometimes we only have the RDD variable and no longer have the result of a 
> previous count.
> In that case, not running the whole count action over the entire dataset 
> again would be very beneficial in terms of performance.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

