Szehon Ho created SPARK-54812:
---------------------------------
Summary: Make executable commands not execute on df.cache()
Key: SPARK-54812
URL: https://issues.apache.org/jira/browse/SPARK-54812
Project: Spark
Issue Type: Bug
Components: SQL
Affects Versions: 4.1.0
Reporter: Szehon Ho
Follow-up to SPARK-52312. That JIRA changed V2WriteCommand so that it no longer
executes eagerly on df.cache(). However, a number of other commands still do.
The problem is that the existing behavior already executes these commands
eagerly on a call to df.cache(). In some cases we are lucky: a command such as
DescribeTableExec holds an in-memory reference to the Table object and keeps
returning the old result despite repeated execution. Others do not, for example
V1 commands that keep only the table identifier and hit the catalog on every
execution.
To minimize backward-compatibility issues, I add a new interface,
UsesCachedData, to opt in to the existing behavior, and make all Commands
bypass the CacheManager by default.
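
The opt-in rule described above can be sketched with a small toy model. This
is not Spark code: Catalog, DescribeByIdent, LegacyDescribe and this
CacheManager are hypothetical stand-ins written in plain Python, and only the
names Command, UsesCachedData and CacheManager are borrowed from the
description. It assumes the cache manager decides by checking a marker type.

```python
# Toy model only: none of these classes are Spark APIs.

class Catalog:
    """Hypothetical catalog that counts how often it is hit."""

    def __init__(self):
        self.tables = {}
        self.lookups = 0

    def describe(self, ident):
        self.lookups += 1
        return self.tables[ident]


class Command:
    """Stand-in base class for executable commands."""

    def execute(self):
        raise NotImplementedError


class UsesCachedData:
    """Stand-in marker: a command opts in to the old caching behavior."""


class DescribeByIdent(Command):
    """Mimics a V1-style command that keeps only the table identifier."""

    def __init__(self, catalog, ident):
        self.catalog = catalog
        self.ident = ident

    def execute(self):
        # Every execution goes back to the catalog.
        return self.catalog.describe(self.ident)


class LegacyDescribe(DescribeByIdent, UsesCachedData):
    """Same command, explicitly opted in to the old behavior."""


class CacheManager:
    """Stand-in cache manager that executes a plan when it is cached."""

    def __init__(self):
        self.cached = {}

    def cache(self, plan):
        # Commands bypass the cache by default, so caching never runs them.
        if isinstance(plan, Command) and not isinstance(plan, UsesCachedData):
            return False
        self.cached[id(plan)] = plan.execute()
        return True


catalog = Catalog()
catalog.tables["t"] = "schema v1"
manager = CacheManager()

default_cmd = DescribeByIdent(catalog, "t")
bypassed = not manager.cache(default_cmd)  # no catalog lookup happens here

legacy_cmd = LegacyDescribe(catalog, "t")
executed = manager.cache(legacy_cmd)       # executes once, as before
```

In this sketch only the command that mixes in the marker is executed when it
is cached, so the default command never reaches the catalog through the cache
path.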