[
https://issues.apache.org/jira/browse/SPARK-51473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17950265#comment-17950265
]
Weichen Xu edited comment on SPARK-51473 at 5/8/25 2:32 PM:
------------------------------------------------------------
I found 2 issues:
(1)
{code:python}
model_a = estimator.fit(df1)
del df1
model_a.summary.XXX(...){code}
Assume df1 contains the ID of model-B.
Once `del df1` executes, model-B is released,
but `model_a.summary` might still use model-B because the summary contains the
prediction dataframe.
The `model.evaluate` API has a similar issue.
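A toy simulation of case (1) may make the lifetime problem easier to see. This is a sketch in plain Python, not Spark code: `ModelCache`, `Frame`, and the `holds_refs` flag are hypothetical names, and the assumption is that a dataframe releases its model references when the client object is garbage collected (the sketch relies on CPython's immediate reference counting).

```python
import weakref


class ModelCache:
    """Hypothetical stand-in for a server-side model cache with ref counts."""

    def __init__(self):
        self.refs = {}

    def acquire(self, model_id):
        self.refs[model_id] = self.refs.get(model_id, 0) + 1

    def release(self, model_id):
        self.refs[model_id] -= 1
        if self.refs[model_id] == 0:
            del self.refs[model_id]  # last reference gone: model is evicted

    def is_alive(self, model_id):
        return model_id in self.refs


class Frame:
    """Toy dataframe whose plan names model IDs."""

    def __init__(self, cache, model_ids, holds_refs=True):
        self.model_ids = tuple(model_ids)
        if holds_refs:
            for mid in self.model_ids:
                cache.acquire(mid)
                # Release the reference when this Frame is collected.
                weakref.finalize(self, cache.release, mid)

    def can_execute(self, cache):
        return all(cache.is_alive(m) for m in self.model_ids)


cache = ModelCache()
df1 = Frame(cache, ["model-B"])  # df1 holds the only reference to model-B

# Assumption: the summary's prediction dataframe reuses df1's plan
# (and therefore model-B's ID) without holding a reference of its own.
predictions = Frame(cache, df1.model_ids, holds_refs=False)

before = predictions.can_execute(cache)
del df1  # model-B is released here
after = predictions.can_execute(cache)
```

In this sketch `before` is true and `after` is false: the prediction dataframe still names model-B, but nothing keeps model-B cached.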
(2)
{code:python}
df1 = model1.transform(df0)
df2 = model2.transform(df1)
del df1
del model1{code}
The plan of `df2` contains the IDs of both model1 and model2,
but model1 has already been released.
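Case (2) can be sketched the same way, together with one possible remedy: each transformed dataframe carries forward the model references of its parent. This is only an illustration under that assumption, not the actual Spark Connect implementation; `Model`, `Frame`, `server_cache`, and `keep_refs` are hypothetical, and the sketch again relies on CPython's immediate reference counting.

```python
import weakref

server_cache = set()  # hypothetical: IDs of models still cached on the server


class Model:
    def __init__(self, model_id):
        self.model_id = model_id
        server_cache.add(model_id)
        # Evict the server-side model when the last client handle goes away.
        weakref.finalize(self, server_cache.discard, model_id)

    def transform(self, df, keep_refs):
        ids = df.model_ids + (self.model_id,)
        # With keep_refs, the child inherits the parent's references
        # and adds a reference to this model.
        refs = df.refs + (self,) if keep_refs else ()
        return Frame(ids, refs)


class Frame:
    def __init__(self, model_ids=(), refs=()):
        self.model_ids = model_ids  # IDs named in the plan
        self.refs = refs            # strong references that keep models alive

    def is_executable(self):
        return all(m in server_cache for m in self.model_ids)


def run(keep_refs):
    df0 = Frame()
    model1, model2 = Model("model1"), Model("model2")
    df1 = model1.transform(df0, keep_refs)
    df2 = model2.transform(df1, keep_refs)
    del df1
    del model1
    return df2.is_executable()


ids_only = run(keep_refs=False)  # plan stores IDs only
with_refs = run(keep_refs=True)  # plan also holds model references
```

With IDs only, `df2` is no longer executable after `del model1`; when references are inherited along the chain, model1 stays cached for as long as `df2` exists.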
[~podongfeng]
was (Author: weichenxu123):
I found an issue:
{code:python}
model_a = estimator.fit(df1)
del df1
model_a.summary.XXX(...){code}
Assume df1 contains the ID of model-B.
Once `del df1` executes, model-B is released,
but `model_a.summary` might still use model-B because the summary contains the
prediction dataframe.
The `model.evaluate` API has a similar issue.
[~podongfeng]
> ML transformed dataframe keep a reference to the model
> ------------------------------------------------------
>
> Key: SPARK-51473
> URL: https://issues.apache.org/jira/browse/SPARK-51473
> Project: Spark
> Issue Type: Sub-task
> Components: Connect, ML
> Affects Versions: 4.1.0
> Reporter: Ruifeng Zheng
> Assignee: Ruifeng Zheng
> Priority: Major
> Labels: pull-request-available
> Fix For: 4.1.0
>
>