[
https://issues.apache.org/jira/browse/SPARK-54730?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Ruifeng Zheng updated SPARK-54730:
----------------------------------
Description:
Some Delta queries fail on Spark 4.1+.
{code:python}
# Read the Delta change data feed of the table, starting at version 0.
df = (spark.read.option("readChangeFeed", True)
      .option("startingVersion", 0).table("sample_table"))

# Referencing the column by name returns the expected output.
df.select('_commit_version').show()

# Referencing the column through the DataFrame fails with:
df.select(df._commit_version).show()
# [CANNOT_RESOLVE_DATAFRAME_COLUMN] Cannot resolve dataframe column
# "_commit_version". It's probably because of illegal references like
# `df1.select(df2.col("a"))`. SQLSTATE: 42704
{code}
This appears to happen because lazy DataFrame column resolution conflicts with
the Delta rules, which causes a rule-order dependency issue.
was:
some delta queries fail with 4.1+
{code:java}
df = spark.read.option("readChangeFeed", True).option("startingVersion",
0).table("sample_table")
df.select('_commit_version').show() <- expected output
df.select(df._commit_version).show() <- fail with
[CANNOT_RESOLVE_DATAFRAME_COLUMN] Cannot resolve dataframe column
"_commit_version". It's probably because of illegal references like
`df1.select(df2.col("a"))`. SQLSTATE: 42704\{code}
It seems due to the lazy dataframe column resolution conflicts with delta rules
and cause rule order dependency issue.
> DataFrame Column Resolution conflicts with delta
> ------------------------------------------------
>
> Key: SPARK-54730
> URL: https://issues.apache.org/jira/browse/SPARK-54730
> Project: Spark
> Issue Type: Bug
> Components: Connect, SQL
> Affects Versions: 4.1.0, 4.2.0
> Reporter: Ruifeng Zheng
> Priority: Major
>
> Some Delta queries fail on Spark 4.1+.
> {code:python}
> # Read the Delta change data feed of the table, starting at version 0.
> df = (spark.read.option("readChangeFeed", True)
>       .option("startingVersion", 0).table("sample_table"))
>
> # Referencing the column by name returns the expected output.
> df.select('_commit_version').show()
>
> # Referencing the column through the DataFrame fails with:
> df.select(df._commit_version).show()
> # [CANNOT_RESOLVE_DATAFRAME_COLUMN] Cannot resolve dataframe column
> # "_commit_version". It's probably because of illegal references like
> # `df1.select(df2.col("a"))`. SQLSTATE: 42704
> {code}
>
> This appears to happen because lazy DataFrame column resolution conflicts
> with the Delta rules, which causes a rule-order dependency issue.
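
The rule-order dependency suspected in the description can be illustrated with
a toy model. This is only a sketch under stated assumptions, not Spark's or
Delta's actual analyzer: ToyPlan, add_change_feed_columns, resolve_columns and
analyze are all hypothetical names. The assumption is that a reference bound to
a DataFrame must resolve the first time the resolution rule sees it, while a
reference by name may stay pending until a later pass.

```python
from dataclasses import dataclass, field


class CannotResolveDataFrameColumn(Exception):
    """Toy stand-in for the CANNOT_RESOLVE_DATAFRAME_COLUMN error."""


@dataclass
class ToyPlan:
    # Columns currently exposed by the single relation in this toy plan.
    output: set = field(default_factory=lambda: {"id", "value"})
    # References still to resolve, as (column_name, bound_to_dataframe) pairs.
    pending: list = field(default_factory=list)
    # Column names that have been resolved so far.
    resolved: list = field(default_factory=list)


def add_change_feed_columns(plan):
    """Hypothetical source rule: expose an extra metadata column."""
    plan.output.add("_commit_version")


def resolve_columns(plan):
    """Hypothetical resolution rule.

    Assumption: a DataFrame-bound reference fails immediately when the column
    is missing, while a name-based reference is left pending for a later pass.
    """
    still_pending = []
    for name, bound_to_dataframe in plan.pending:
        if name in plan.output:
            plan.resolved.append(name)
        elif bound_to_dataframe:
            raise CannotResolveDataFrameColumn(name)
        else:
            still_pending.append((name, bound_to_dataframe))
    plan.pending = still_pending


def analyze(plan, rules, max_iterations=5):
    """Apply the rules in the given order until nothing is pending."""
    for _ in range(max_iterations):
        for rule in rules:
            rule(plan)
        if not plan.pending:
            break
    return plan.resolved
```

In this model a name-based reference resolves under either rule order, whereas
a DataFrame-bound reference resolves only when add_change_feed_columns runs
before resolve_columns. Whether the real failure follows this pattern has not
been confirmed in this issue.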