I don’t know your full source code, but you may be missing an action, so the DataFrame is never actually persisted. persist()/cache() are lazy; nothing shows up in the Storage tab until an action materializes the data.
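For example, a minimal sketch of what I mean (API as in PySpark 2.4; the exact StorageLevel flags printed may differ by version, and the serialized/deserialized comment below is my reading of the Python constants, not a guarantee):

from pyspark.sql import SparkSession
from pyspark import StorageLevel

spark = SparkSession.builder.appName("persist-check").getOrCreate()
df = spark.range(1000)

# cache()/persist() are lazy: the DataFrame only appears in the
# Storage tab of the UI once an action materializes it.
df.cache()
df.count()                # action -> DataFrame is now persisted
print(df.storageLevel)    # effective level as seen from the driver

# Explicit persist with the "default" level for comparison. In 2.4 the
# Python StorageLevel.MEMORY_AND_DISK constant carries deserialized=False
# (i.e. serialized), which can differ from what cache() picks on the JVM side.
df.unpersist()
df.persist(StorageLevel.MEMORY_AND_DISK)
df.count()
print(df.storageLevel)

Comparing the two print outputs (and the Storage tab after each count) should show whether the difference is in what is persisted or only in how the UI labels it.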
> On 16.09.2019 at 02:07, grp <gpete...@villanova.edu> wrote:
>
> Hi There Spark Users,
>
> Curious what is going on here. Not sure if this is a possible bug or I am missing something. Extra eyes are much appreciated.
>
> The Spark UI (Python API 2.4.3) by default reports persisted DataFrames as deserialized MEMORY_AND_DISK, however I always thought they were serialized for Python by default according to the official documentation. When explicitly setting the storage level to the default, e.g. df.persist(StorageLevel.MEMORY_AND_DISK), the Spark UI shows the expected serialized DataFrame under the Storage tab, but not when just calling df.cache().
>
> Do we have to explicitly set StorageLevel.MEMORY_AND_DISK to get the serialization benefit in Python (which I thought was automatic)? Or is the Spark UI incorrect?
>
> SO post with specific example/details => https://stackoverflow.com/questions/56926337/conflicting-pyspark-storage-level-defaults
>
> Thank you for your time and research!