[ 
https://issues.apache.org/jira/browse/SPARK-954?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen resolved SPARK-954.
-----------------------------
    Resolution: Won't Fix

From the discussion, and later ones about guarantees of determinism in RDDs, 
it sounds like this is working as intended.

> One repeated sampling, and I am not sure if it is correct.
> ----------------------------------------------------------
>
>                 Key: SPARK-954
>                 URL: https://issues.apache.org/jira/browse/SPARK-954
>             Project: Spark
>          Issue Type: Story
>    Affects Versions: 0.7.3
>            Reporter: caizhua
>
> This piece of code reads the dataset and then runs two operations on it. If 
> I consider the RDD as a view definition, I think the result is correct. 
> However, since the first iteration already calls result_sample.count(), I 
> was wondering whether the computation in the 
> initialize_doc_topic_word_count(.) function should be repeated when we run 
> the second result_sample.map(lambda (block_id, doc_prob): doc_prob).count(). 
> Since people write Spark code as a program, not as a database view, this is 
> sometimes confusing. For example, if initialize_doc_topic_word_count(.) is a 
> statistical function with runtime seeds, I am not sure whether this has an 
> impact on the result.
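
The behaviour the reporter describes can be illustrated with a small pure-Python
model. This is only a sketch: LazyView, noisy, and seeded are hypothetical names
invented for illustration, not Spark's API and not the reporter's code. It
assumes the dataset behaves like a view that is recomputed on every action
unless its result is kept, which is how the discussion above characterizes RDDs.

```python
import random


class LazyView:
    """Toy stand-in for a lazily evaluated dataset (hypothetical, not Spark)."""

    def __init__(self, source, fn):
        self.source = source
        self.fn = fn
        self._keep = False
        self._cached = None

    def cache(self):
        # Only marks the view; the result is stored on the first action.
        self._keep = True
        return self

    def collect(self):
        if self._cached is not None:
            return list(self._cached)
        result = [self.fn(x) for x in self.source]  # recomputed on each action
        if self._keep:
            self._cached = result
        return list(result)

    def count(self):
        return len(self.collect())


def noisy(x):
    # Draws from the global generator, i.e. a "runtime seed".
    return x + random.random()


def seeded(x, base_seed=42):
    # Seed derived from the record itself, so a recomputation repeats the draw.
    return x + random.Random(base_seed * 1_000_003 + x).random()


data = range(5)

view = LazyView(data, noisy)
first = view.collect()
second = view.collect()            # second action re-runs noisy()

stable = LazyView(data, seeded)    # deterministic under recomputation
cached = LazyView(data, noisy).cache()
```

In this model, count() is unaffected by recomputation, but the sampled values
differ between actions unless the function is seeded deterministically or the
result is kept. In PySpark the analogous tools would be rdd.cache() or
rdd.persist(); as far as I know these are best-effort, since evicted partitions
can be recomputed, so a deterministic seed is the safer of the two.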



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
