[jira] [Commented] (SPARK-44900) Cached DataFrame keeps growing

Yuexin Zhang (Jira) Mon, 28 Aug 2023 21:58:43 -0700


    [ 
https://issues.apache.org/jira/browse/SPARK-44900?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17759818#comment-17759818
 ]


Yuexin Zhang commented on SPARK-44900:
--------------------------------------

Hi [~varun2807]  [~yaud]  did you check the actual cached file size on disk, on 
the yarn node manager local filesystem?  Is it really ever growing?

> Cached DataFrame keeps growing
> ------------------------------
>
>                 Key: SPARK-44900
>                 URL: https://issues.apache.org/jira/browse/SPARK-44900
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 3.3.0
>            Reporter: Varun Nalla
>            Priority: Blocker
>
> Scenario :
> We have a kafka streaming application where the data lookups are happening by 
> joining  another DF which is cached, and the caching strategy is 
> MEMORY_AND_DISK.
> However the size of the cached DataFrame keeps on growing for every micro 
> batch the streaming application process and that's being visible under 
> storage tab.
> A similar stack overflow thread was already raised.
> https://stackoverflow.com/questions/55601779/spark-dataframe-cache-keeps-growing



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org

[jira] [Commented] (SPARK-44900) Cached DataFrame keeps growing

Reply via email to