Szehon Ho created SPARK-54812:
---------------------------------

             Summary: Make executable commands not execute on df.cache()
                 Key: SPARK-54812
                 URL: https://issues.apache.org/jira/browse/SPARK-54812
             Project: Spark
          Issue Type: Bug
          Components: SQL
    Affects Versions: 4.1.0
            Reporter: Szehon Ho


Follow-up to SPARK-52312. That JIRA changed V2WriteCommand so that it does not
execute eagerly on df.cache(). However, a number of other commands still do.

The problem is that, under the existing behavior, these commands are executed
eagerly on a call to df.cache(). In some cases we are lucky: a command such as
DescribeTableExec holds an in-memory reference to the Table object and keeps
the old result despite repeated execution. Others do not, for example V1
commands that keep only the table identifier and hit the catalog on every
execution.
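
To make the difference concrete, here is a toy model in plain Python. It is
only a sketch: Catalog, DescribeWithTableRef and DescribeWithIdentOnly are
hypothetical names invented for illustration and are not Spark classes. It
assumes the table changes in the catalog between two executions of the same
command.

```python
# Toy model only: these classes are hypothetical and are not Spark APIs.

class Catalog:
    """Maps a table identifier to its current column list."""
    def __init__(self):
        self.tables = {}
        self.lookups = 0

    def load(self, ident):
        self.lookups += 1
        return self.tables[ident]


class DescribeWithTableRef:
    """Resolves the table once and keeps the in-memory reference."""
    def __init__(self, catalog, ident):
        self.table = catalog.load(ident)

    def execute(self):
        return list(self.table)


class DescribeWithIdentOnly:
    """Keeps only the identifier and hits the catalog on every execution."""
    def __init__(self, catalog, ident):
        self.catalog = catalog
        self.ident = ident

    def execute(self):
        return list(self.catalog.load(self.ident))


catalog = Catalog()
catalog.tables["t"] = ("id",)

by_ref = DescribeWithTableRef(catalog, "t")
by_ident = DescribeWithIdentOnly(catalog, "t")
first_by_ref = by_ref.execute()
first_by_ident = by_ident.execute()

# The table is replaced in the catalog between the two executions.
catalog.tables["t"] = ("id", "name")

second_by_ref = by_ref.execute()      # still sees the old columns
second_by_ident = by_ident.execute()  # sees the new columns
```

In this model the command that holds the table reference returns the same
result both times, while the identifier-only command returns a different
result on the second execution.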

To minimize backward-compatibility issues, I add a new interface,
UsesCachedData, for keeping the existing behavior, while all Commands now
bypass the CacheManager by default.
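
The opt-in idea can be sketched with a toy model in plain Python. This is an
assumption-laden illustration, not the actual patch: how UsesCachedData is
defined and checked in Spark is not shown in this message, and Command,
LegacyCommand and ToyCacheManager below are hypothetical stand-ins.

```python
# Toy model only: hypothetical names, not Spark's actual classes.

class Command:
    """Base class for plans that perform an action when executed."""
    def __init__(self, name):
        self.name = name


class UsesCachedData:
    """Marker mixin: a command that opts in to the old cache behavior."""


class LegacyCommand(Command, UsesCachedData):
    """A command that explicitly keeps the existing behavior."""


class ToyCacheManager:
    def __init__(self):
        self.cached = []

    def cache_query(self, plan):
        # Commands bypass the cache by default; only opted-in ones are stored.
        if isinstance(plan, Command) and not isinstance(plan, UsesCachedData):
            return False
        self.cached.append(plan.name)
        return True


manager = ToyCacheManager()
bypassed = manager.cache_query(Command("describe"))
kept = manager.cache_query(LegacyCommand("legacy"))
```

Here the plain command is skipped by the cache manager, and only the command
that mixes in the marker is stored.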



