Varun Nalla created SPARK-44900:
-----------------------------------

             Summary: Cached DataFrame keeps growing
                 Key: SPARK-44900
                 URL: https://issues.apache.org/jira/browse/SPARK-44900
             Project: Spark
          Issue Type: Bug
          Components: Spark Core
    Affects Versions: 2.4.0
            Reporter: Varun Nalla


Scenario:

We have a Kafka streaming application in which data lookups are performed by 
joining against another DataFrame that is cached with the MEMORY_AND_DISK 
storage level.

However, the size of the cached DataFrame keeps growing with every micro-batch 
the streaming application processes, as can be seen under the Storage tab of 
the Spark UI.
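
A minimal PySpark sketch of the setup described above, for anyone trying to 
reproduce it. The application code is not included in this issue, so the lookup 
path, broker address, topic name, the `lookup_key` column and the function names 
are all hypothetical. Only the Kafka source, the join against a cached DataFrame 
and the MEMORY_AND_DISK storage level come from the description.

```python
# Hypothetical reproduction sketch: paths, topic and column names are assumptions,
# not taken from the reporter's application.
from typing import Dict


def kafka_source_options(bootstrap_servers: str, topic: str) -> Dict[str, str]:
    """Options for Spark's Kafka source (needs the spark-sql-kafka-0-10 package)."""
    return {
        "kafka.bootstrap.servers": bootstrap_servers,
        "subscribe": topic,
        "startingOffsets": "latest",
    }


def run(lookup_path: str, bootstrap_servers: str, topic: str) -> None:
    # PySpark is imported here so the helper above works without Spark installed.
    from pyspark import StorageLevel
    from pyspark.sql import SparkSession
    from pyspark.sql.functions import col

    spark = SparkSession.builder.appName("cached-lookup-join").getOrCreate()

    # Static lookup DataFrame, cached with MEMORY_AND_DISK as in the report.
    # Assumed to contain a `lookup_key` column.
    lookup_df = spark.read.parquet(lookup_path).persist(StorageLevel.MEMORY_AND_DISK)
    lookup_df.count()  # materialize the cache once, before the stream starts

    events = (
        spark.readStream.format("kafka")
        .options(**kafka_source_options(bootstrap_servers, topic))
        .load()
        .select(
            col("key").cast("string").alias("lookup_key"),
            col("value").cast("string").alias("payload"),
        )
    )

    # Stream-static join: every micro-batch is joined against the cached DataFrame.
    joined = events.join(lookup_df, on="lookup_key", how="left")

    query = joined.writeStream.format("console").outputMode("append").start()
    query.awaitTermination()
```

Calling `run(...)` with a real lookup path, broker and topic, and then watching 
the Storage tab across several micro-batches, would be one way to check whether 
the cached size changes in a given environment.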

A similar question has already been raised on Stack Overflow:

https://stackoverflow.com/questions/55601779/spark-dataframe-cache-keeps-growing


