[ https://issues.apache.org/jira/browse/SPARK-27645?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16835294#comment-16835294 ]
Hyukjin Kwon commented on SPARK-27645:
--------------------------------------

You can wrap the {{Dataset}} so that it carries a pre-calculated count, or have your code take {{count}} as an argument. Alternatively, you can use a class that holds the count.

> Cache result of count function to that RDD
> ------------------------------------------
>
>                 Key: SPARK-27645
>                 URL: https://issues.apache.org/jira/browse/SPARK-27645
>             Project: Spark
>          Issue Type: New Feature
>          Components: Spark Core
>    Affects Versions: 2.4.3
>            Reporter: Seungmin Lee
>            Priority: Major
>
> I'm not sure whether there has been an update for this (as far as I know,
> there is no such feature). Since an RDD is immutable, why don't we keep the
> result of the count function for that RDD and reuse it in future calls?
> Sometimes we only have the RDD variable and not the result of a previous
> count.
> In that case, not running a whole count action over the entire dataset would
> be very beneficial in terms of performance.
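
The wrapper suggested in the comment could look roughly like the following. This is a minimal sketch, not part of Spark: `CountedDataset` and `FakeDataset` are hypothetical names introduced here for illustration, and the only assumption about the wrapped object is that it exposes a `count()` method, as RDDs and DataFrames do. `FakeDataset` is a stand-in so that the sketch runs without a Spark installation.

```python
class CountedDataset:
    """Hypothetical wrapper that holds a dataset together with its count.

    The count is either supplied up front (the "take an argument count"
    variant) or computed once on first use and then reused.
    """

    def __init__(self, dataset, count=None):
        self.dataset = dataset
        self._count = count  # pre-calculated count, if the caller has one

    def count(self):
        # Run the underlying count action only if no result is stored yet.
        if self._count is None:
            self._count = self.dataset.count()
        return self._count


class FakeDataset:
    """Stand-in for an RDD/DataFrame (assumption: only count() is needed)."""

    def __init__(self, rows):
        self.rows = list(rows)
        self.count_calls = 0  # tracks how often the "expensive" action runs

    def count(self):
        self.count_calls += 1
        return len(self.rows)


if __name__ == "__main__":
    fake = FakeDataset(range(5))
    wrapped = CountedDataset(fake)
    wrapped.count()
    wrapped.count()
    # The underlying count action ran only once.
    print(wrapped.count(), fake.count_calls)
```

Note that the stored count belongs to the wrapped object only. Any transformation yields a new dataset that needs its own wrapper, and if the dataset is not persisted and its underlying source changes, the stored value may no longer match a fresh count.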