[jira] [Created] (SPARK-11722) Rdds could be different between original one and save-out-then-read-in one

liangguoning (JIRA) Thu, 12 Nov 2015 21:19:05 -0800

liangguoning created SPARK-11722:
------------------------------------

             Summary: Rdds could be different between original one and 
save-out-then-read-in one
                 Key: SPARK-11722
                 URL: https://issues.apache.org/jira/browse/SPARK-11722
             Project: Spark
          Issue Type: Bug
          Components: PySpark
    Affects Versions: 1.5.1
         Environment: Red Hat 6.4, 64-bit; standalone cluster; 3 machines
            Reporter: liangguoning



I found a bug in PySpark.
I ran some operations to create an RDD, A, and found that its data differs from 
the copy that was saved to HDFS and read back in, called B.
I printed all the detailed data inside my function and confirmed that A contains 
exactly one record that differs from B.
That record produces a different result when the same functions are applied. 
I got B by calling two methods: A.saveAsTextFile, then sc.textFile.
I also checked the raw data and found that B is the correct RDD. 
---
I also built another RDD, A2, with sc.parallelize(A.collect()), and it gave the 
same result as A.
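
A minimal sketch of how the differing record could be isolated, assuming the
records of A and the lines of B both fit in driver memory after collect().
The helper name diff_records is hypothetical and not part of Spark; it only
compares two plain Python lists. It stringifies each record with str() on the
assumption that this matches what saveAsTextFile wrote out, which may not
hold exactly on every Python/Spark version.

```python
from collections import Counter


def diff_records(original, reloaded):
    """Return (only_in_original, only_in_reloaded) as sorted lists of strings.

    original: records of the in-memory RDD, e.g. A.collect()
    reloaded: lines read back from storage, e.g. sc.textFile(path).collect()
    """
    # Compare string forms, since the reloaded side consists of text lines.
    # Counter keeps duplicates, so a record that appears twice on one side
    # and once on the other is still reported.
    left = Counter(str(r) for r in original)
    right = Counter(str(r) for r in reloaded)
    only_in_original = sorted((left - right).elements())
    only_in_reloaded = sorted((right - left).elements())
    return only_in_original, only_in_reloaded
```

Calling diff_records(A.collect(), B.collect()) would then list the records
present on only one side, which could be attached to this issue.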
Thanks 



