I don’t know your full source code, but you may be missing an action, so the DataFrame is never actually persisted. persist()/cache() are lazy; nothing shows up in the Storage tab until an action materializes the data.
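For example, a minimal sketch of what I mean (API as in PySpark 2.4; the exact StorageLevel flags printed may differ by version, and the serialized/deserialized comment below is my reading of the Python constants, not a guarantee):

from pyspark.sql import SparkSession
from pyspark import StorageLevel

spark = SparkSession.builder.appName("persist-check").getOrCreate()
df = spark.range(1000)

# cache()/persist() are lazy: the DataFrame only appears in the
# Storage tab of the UI once an action materializes it.
df.cache()
df.count()                # action -> DataFrame is now persisted
print(df.storageLevel)    # effective level as seen from the driver

# Explicit persist with the "default" level for comparison. In 2.4 the
# Python StorageLevel.MEMORY_AND_DISK constant carries deserialized=False
# (i.e. serialized), which can differ from what cache() picks on the JVM side.
df.unpersist()
df.persist(StorageLevel.MEMORY_AND_DISK)
df.count()
print(df.storageLevel)

Comparing the two print outputs (and the Storage tab after each count) should show whether the difference is in what is persisted or only in how the UI labels it.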
> On 16.09.2019 at 02:07, grp <gpete...@villanova.edu> wrote:
>
> Hi There Spark Users,
>
> Curious what is going on here. Not sure if this is a possible bug or I am missing something. Extra eyes are much appreciated.
>
> The Spark UI (Python API 2.4.3) by default reports persisted DataFrames as deserialized MEMORY_AND_DISK, however I always thought they were serialized for Python by default according to the official documentation. When explicitly setting the storage level to the default, e.g. df.persist(StorageLevel.MEMORY_AND_DISK), the Spark UI shows the expected serialized DataFrame under the Storage tab, but not when just calling df.cache().
>
> Do we have to explicitly set StorageLevel.MEMORY_AND_DISK to get the serialization benefit in Python (which I thought was automatic)? Or is the Spark UI incorrect?
>
> SO post with specific example/details => https://stackoverflow.com/questions/56926337/conflicting-pyspark-storage-level-defaults
>
> Thank you for your time and research!