[ https://issues.apache.org/jira/browse/SPARK-25985?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17218019#comment-17218019 ]

Aoyuan Liao commented on SPARK-25985:
-------------------------------------

[~smilegator] I think recacheByCondition does not preserve the cached plan when it 
rebuilds dependent caches. The following test fails:
{code:scala}
test("SPARK-24613 Cache with UDF could not be matched with subsequent dependent caches") {
  val udf1 = udf { x: Int => x + 1 }
  val df = spark.range(0, 10).toDF("a").withColumn("b", udf1($"a"))
  val df2 = df.agg(sum(df("b")))
  df.cache()
  df.count()
  df2.cache()

  df.unpersist() // recacheByCondition is called within unpersist()

  // df2 should still be answered from its cache...
  val plan = df2.queryExecution.withCachedData
  assert(plan.isInstanceOf[InMemoryRelation])

  // ...and its cached plan should still scan df's cache via InMemoryTableScanExec
  val internalPlan = plan.asInstanceOf[InMemoryRelation].cacheBuilder.cachedPlan
  assert(internalPlan.find(_.isInstanceOf[InMemoryTableScanExec]).isDefined)
}
{code}
The second assertion fails: df2's data is still cached, but its cached plan no 
longer contains an InMemoryTableScanExec, i.e. the data survived recaching while 
the plan did not.
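For context on what that second assertion actually checks: find on a Spark plan walks the operator tree and returns the first node satisfying the predicate, if any. A minimal toy sketch of that traversal (plain Scala, not Spark's real TreeNode; the Scan/Project node names here are hypothetical) looks like:

{code:scala}
// Toy plan tree with a depth-first find, mimicking TreeNode.find semantics.
sealed trait Node {
  def children: Seq[Node]
  def find(p: Node => Boolean): Option[Node] =
    if (p(this)) Some(this)
    else children.iterator.map(_.find(p)).collectFirst { case Some(n) => n }
}
case class Scan(source: String) extends Node { val children = Seq.empty }
case class Project(child: Node) extends Node { val children = Seq(child) }

// A "healthy" cached plan reads from the in-memory scan; a "broken" one does not.
val healthy = Project(Scan("InMemoryTableScan"))
val broken  = Project(Scan("FileScan"))

def hasInMemoryScan(plan: Node): Boolean =
  plan.find { case Scan("InMemoryTableScan") => true; case _ => false }.isDefined
{code}

The failing assertion is the analogue of hasInMemoryScan returning false on the recached plan even though the cached data itself is still present.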

> Verify the SPARK-24613 Cache with UDF could not be matched with subsequent 
> dependent caches
> -------------------------------------------------------------------------------------------
>
>                 Key: SPARK-25985
>                 URL: https://issues.apache.org/jira/browse/SPARK-25985
>             Project: Spark
>          Issue Type: Sub-task
>          Components: SQL, Tests
>    Affects Versions: 3.0.0
>            Reporter: Xiao Li
>            Priority: Major
>              Labels: starter
>
> Verify whether recacheByCondition works well when the cache data is with UDF. 
> This is a follow-up of https://github.com/apache/spark/pull/21602



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
