[
https://issues.apache.org/jira/browse/SPARK-30421?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17022719#comment-17022719
]
Tobias Hermann commented on SPARK-30421:
----------------------------------------
[~dongjoon] No, that's different. To make it equivalent, you'd have to change
your example to the following:
{quote}import pandas as pd
df = pd.DataFrame(data={'foo': [0, 1], 'bar': ["a", "b"]})
df2 = df.drop(columns=["bar"])
df2[df2["bar"] == "a"]
{quote}
And that correctly results in
{quote}KeyError: 'bar'
{quote}
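For completeness, pandas behaves the same way for the select-based variant from the original report, not just the drop-based one. A minimal sketch (variable names chosen for illustration):

```python
import pandas as pd

# Same data as the Spark example: two rows, columns "foo" and "bar".
df = pd.DataFrame({"foo": [0, 1], "bar": ["a", "b"]})

# Variant 1: keep only "foo", then filter on the removed "bar".
df_select = df[["foo"]]
try:
    df_select[df_select["bar"] == "a"]
    select_error = None
except KeyError as e:
    select_error = e  # KeyError: 'bar'

# Variant 2: drop "bar", then filter on it.
df_drop = df.drop(columns=["bar"])
try:
    df_drop[df_drop["bar"] == "a"]
    drop_error = None
except KeyError as e:
    drop_error = e  # KeyError: 'bar'

print(select_error, drop_error)
```

Both variants raise KeyError: 'bar', which is the consistency the Spark code below lacks.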
In Spark, however, the following code works without error:
{quote}val df = Seq((0, "a"), (1, "b")).toDF("foo", "bar")
val df2 = df.drop("bar")
df2.where($"bar" === "a").show
{quote}
> Dropped columns still available for filtering
> ---------------------------------------------
>
> Key: SPARK-30421
> URL: https://issues.apache.org/jira/browse/SPARK-30421
> Project: Spark
> Issue Type: Bug
> Components: Spark Core
> Affects Versions: 2.4.4
> Reporter: Tobias Hermann
> Priority: Minor
>
> The following minimal example:
> {quote}val df = Seq((0, "a"), (1, "b")).toDF("foo", "bar")
> df.select("foo").where($"bar" === "a").show
> df.drop("bar").where($"bar" === "a").show
> {quote}
> should result in an error like the following:
> {quote}org.apache.spark.sql.AnalysisException: cannot resolve '`bar`' given
> input columns: [foo];
> {quote}
> However, it does not; instead it runs without error, as if the column "bar"
> still existed.
--
This message was sent by Atlassian Jira
(v8.3.4#803005)