[
https://issues.apache.org/jira/browse/SPARK-26906?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Han Altae-Tran updated SPARK-26906:
-----------------------------------
Attachment: spark_ui.png
> Pyspark RDD Replication Potentially Not Working
> -----------------------------------------------
>
> Key: SPARK-26906
> URL: https://issues.apache.org/jira/browse/SPARK-26906
> Project: Spark
> Issue Type: Bug
> Components: PySpark, Web UI
> Affects Versions: 2.3.2
> Environment: I am using Google Cloud Dataproc image version [1.3.19-deb9
> 2018/12/14|https://cloud.google.com/dataproc/docs/release-notes#december_14_2018]
> (Spark 2.3.2, Hadoop 2.9.0) on Debian 9, with Python 3.7. The PySpark shell
> is started with pyspark --num-executors=100
> Reporter: Han Altae-Tran
> Priority: Minor
> Attachments: spark_ui.png
>
>
> PySpark RDD replication doesn't seem to be functioning properly. Even in a
> simple example, the UI reports only 1x replication despite persisting with a
> 2x-replicated storage level:
> {code:python}
> import pyspark
>
> rdd = sc.range(10**9)
> mapped = rdd.map(lambda x: x)
> # persist() returns: PythonRDD[1] at RDD at PythonRDD.scala:52
> mapped.persist(pyspark.StorageLevel.DISK_ONLY_2)
> mapped.count()
> {code}
>
> Interestingly, if you catch the UI page at just the right time, the RDD
> initially shows as 2x replicated but drops to 1x replicated afterward.
> Perhaps the RDD really is replicated and it is only the UI that fails to
> register this.
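>
> For reference, the requested storage level can also be checked from the
> driver without going through the UI. The following is only a rough sketch:
> getStorageLevel() is public PySpark API, while getRDDStorageInfo() is reached
> through the internal _jsc Py4J gateway and reports the requested level and
> cached sizes rather than an actual per-block replica count.
> {code:python}
> # Requested level for the persisted RDD (expected: Disk Serialized 2x Replicated)
> print(mapped.getStorageLevel())
>
> # Driver-side storage info via the internal JVM gateway (unofficial API)
> for info in sc._jsc.sc().getRDDStorageInfo():
>     print(info.id(), info.name(), info.numCachedPartitions(), info.diskSize())
> {code}
> If the disk size reported here were roughly double the size of a single copy
> while the UI still shows 1x, that would point to a UI/reporting issue rather
> than missing replication.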