[ https://issues.apache.org/jira/browse/SPARK-14064?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15205994#comment-15205994 ]
Jian Chen commented on SPARK-14064:
-----------------------------------
I should sort the keys before zipWithIndex: the memory occupied by dataID is released between actions, and zipWithIndex cannot guarantee the same result on every execution.

> count method of RDD doesn't take action
> ---------------------------------------
>
>                 Key: SPARK-14064
>                 URL: https://issues.apache.org/jira/browse/SPARK-14064
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 1.6.1
>        Environment: CentOS-6.1
>           Reporter: Jian Chen
>            Fix For: 1.6.1
>
>   Original Estimate: 1h
>  Remaining Estimate: 1h
>
> I have some unique keys stored as an RDD[Int], and I use zipWithIndex to give
> a unique ID to every key:
> val dataID = data.zipWithIndex()
> Then I count the records:
> dataID.count
> Finally, I save dataID as a text file to HDFS. I saved the data three
> times, as d1, d2, and d3, but each result is different:
> dataID.saveAsTextFile("d1")
> dataID.saveAsTextFile("d2")
> dataID.saveAsTextFile("d3")
> For example, the key 13552359 has ID 187480 in d1 but a different ID, 187483, in d2.

-- 
This message was sent by Atlassian JIRA (v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
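A minimal sketch of the workaround described in the comment, in Spark's Scala API. It assumes `data` is the RDD[Int] of unique keys from the report; sorting before zipWithIndex makes the key-to-index assignment deterministic, and caching keeps one materialization alive so repeated saves agree:

```scala
// Sketch only: assumes an existing SparkContext `sc` and the RDD `data: RDD[Int]`.
// Sorting fixes the ordering of elements across partitions, so zipWithIndex
// assigns the same index to the same key on every execution.
val dataID = data.sortBy(identity).zipWithIndex()

// Cache so the three saves below reuse one computed result instead of
// recomputing (and possibly reordering) the lineage each time.
dataID.cache()
dataID.count()

dataID.saveAsTextFile("d1")
dataID.saveAsTextFile("d2")
dataID.saveAsTextFile("d3")
```

Either measure alone helps: sorting makes the indices reproducible by construction, while caching (or checkpointing) merely pins one particular, otherwise arbitrary, assignment.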