[ https://issues.apache.org/jira/browse/SPARK-51473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17950265#comment-17950265 ]

Weichen Xu edited comment on SPARK-51473 at 5/8/25 2:32 PM:
------------------------------------------------------------

I found 2 issues:

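(1)
{code:python}
model_a = estimator.fit(df1)
del df1  # drops the last client-side reference to model-B

model_a.summary.XXX(...)  # the summary's predictions may still need model-B
{code}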
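Assume `df1` contains the ID of model-B (e.g. `df1 = model_b.transform(...)`).

model-B is released as soon as `del df1` executes, but `model_a.summary` may still need model-B: the summary contains the prediction DataFrame, whose plan is built on top of `df1` and therefore references model-B.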

 

The `model.evaluate` API has a similar issue, since the summary it returns also holds a prediction DataFrame derived from its input.
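To make the lifetime problem concrete, here is a toy simulation in plain Python. It does not use the Spark Connect API: `ModelCache`, `DataFrame`, `Summary` and `run_scenario` are all hypothetical, and explicit per-model reference counting is only an assumption about how server-side release works. It shows the summary failing once `df1` is dropped, and one possible fix where the summary pins every model its predictions depend on.

{code:python}
class ModelCache:
    """Hypothetical server-side cache with explicit reference counts per model ID."""

    def __init__(self):
        self._refcounts = {}

    def acquire(self, model_id):
        self._refcounts[model_id] = self._refcounts.get(model_id, 0) + 1

    def release(self, model_id):
        self._refcounts[model_id] -= 1
        if self._refcounts[model_id] == 0:
            del self._refcounts[model_id]  # last reference gone: model is evicted

    def is_alive(self, model_id):
        return model_id in self._refcounts


class DataFrame:
    """Hypothetical client-side handle holding one reference per model in its plan."""

    def __init__(self, cache, model_ids):
        self._cache = cache
        self.model_ids = frozenset(model_ids)
        for model_id in self.model_ids:
            cache.acquire(model_id)

    def close(self):
        # Stands in for `del df`: give back every reference this handle holds.
        for model_id in self.model_ids:
            self._cache.release(model_id)


class Summary:
    """Hypothetical model summary backed by a server-side predictions DataFrame."""

    def __init__(self, cache, lineage, pin_lineage):
        self._cache = cache
        self._lineage = frozenset(lineage)
        # Possible fix (assumption): the summary takes its own references on every
        # model its predictions depend on, instead of relying on the input DataFrame.
        self._pins = DataFrame(cache, self._lineage) if pin_lineage else None

    def compute_metric(self):
        dead = sorted(m for m in self._lineage if not self._cache.is_alive(m))
        if dead:
            raise RuntimeError(f"summary depends on released model(s): {dead}")
        return "ok"


def run_scenario(pin_lineage):
    cache = ModelCache()
    df1 = DataFrame(cache, {"model-B"})  # assume df1 = model_b.transform(...), model_b deleted
    cache.acquire("model-A")             # the client-side model_a handle
    summary = Summary(cache, df1.model_ids | {"model-A"}, pin_lineage)
    df1.close()                          # stands in for `del df1`
    try:
        return summary.compute_metric()
    except RuntimeError as err:
        return str(err)


print(run_scenario(pin_lineage=False))  # summary depends on released model(s): ['model-B']
print(run_scenario(pin_lineage=True))   # ok
{code}

Pinning at the summary level keeps model-B alive exactly as long as something that can still execute the prediction plan exists, which would cover both `model.summary` and the summary returned by `model.evaluate`.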

 

(2)
{code:python}
df1 = model1.transform(df0)
df2 = model2.transform(df1)

del df1
del model1  # model1 is released here
{code}
 

The plan of `df2` contains the IDs of both model1 and model2, but model1 has already been released, so executing `df2` may fail.
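A similar toy sketch for the second case, again not the Spark Connect API: `Model`, `DataFrame` and `run_scenario` are hypothetical, and releasing the server-side model from a `weakref.finalize` callback is only an assumption about how the client triggers release (the sketch also relies on CPython freeing objects as soon as their last reference is dropped). In the buggy variant each transformed DataFrame pins only the model that produced it; the possible fix lets it inherit its parent's pins, so every model ID in `df2`'s plan stays alive as long as `df2` does.

{code:python}
import weakref


class Model:
    """Hypothetical client-side model handle (not the PySpark class)."""

    def __init__(self, model_id, server_cache):
        self.model_id = model_id
        server_cache[model_id] = "fitted state"
        # Release the server-side entry once this handle is garbage-collected;
        # in CPython that happens as soon as the last reference is dropped.
        weakref.finalize(self, server_cache.pop, model_id, None)

    def transform(self, df, pin_lineage):
        # Buggy behaviour: the new DataFrame pins only the model that produced it.
        # Possible fix: it also inherits the parent's pins, so every model ID in
        # its plan stays alive as long as the DataFrame does.
        inherited = df.pinned_models if pin_lineage else ()
        return DataFrame(df.plan + [self.model_id], (*inherited, self), df.server_cache)


class DataFrame:
    """Hypothetical DataFrame handle: a plan (list of model IDs) plus strong refs."""

    def __init__(self, plan, pinned_models, server_cache):
        self.plan = plan
        self.pinned_models = tuple(pinned_models)
        self.server_cache = server_cache

    def collect(self):
        missing = [m for m in self.plan if m not in self.server_cache]
        if missing:
            raise RuntimeError(f"plan references released model(s): {missing}")
        return "ok"


def run_scenario(pin_lineage):
    cache = {}
    df0 = DataFrame([], (), cache)
    model1 = Model("model1", cache)
    model2 = Model("model2", cache)
    df1 = model1.transform(df0, pin_lineage)
    df2 = model2.transform(df1, pin_lineage)
    del df1
    del model1  # without inherited pins, this drops the last reference to model1
    try:
        return df2.collect()
    except RuntimeError as err:
        return str(err)


print(run_scenario(pin_lineage=False))  # plan references released model(s): ['model1']
print(run_scenario(pin_lineage=True))   # ok
{code}

Inheriting pins means a DataFrame keeps every model in its lineage alive, not just the last one; the cost is that long transform chains hold more server-side models for longer.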

 

[~podongfeng] 


was (Author: weichenxu123):
I found an issue:

 
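{code:python}
model_a = estimator.fit(df1)
del df1  # drops the last client-side reference to model-B

model_a.summary.XXX(...)  # the summary's predictions may still need model-B
{code}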
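Assume `df1` contains the ID of model-B (e.g. `df1 = model_b.transform(...)`).

model-B is released as soon as `del df1` executes, but `model_a.summary` may still need model-B: the summary contains the prediction DataFrame, whose plan is built on top of `df1` and therefore references model-B.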

 

The `model.evaluate` API has a similar issue, since the summary it returns also holds a prediction DataFrame derived from its input.

[~podongfeng] 

> ML transformed dataframe keep a reference to the model
> ------------------------------------------------------
>
>                 Key: SPARK-51473
>                 URL: https://issues.apache.org/jira/browse/SPARK-51473
>             Project: Spark
>          Issue Type: Sub-task
>          Components: Connect, ML
>    Affects Versions: 4.1.0
>            Reporter: Ruifeng Zheng
>            Assignee: Ruifeng Zheng
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 4.1.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)
