[ https://issues.apache.org/jira/browse/SPARK-54665?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18045267#comment-18045267 ]
Tian Gao commented on SPARK-54665:
----------------------------------
I think this is by design. You explicitly turned off ANSI mode, and with ANSI
mode off Spark itself applies implicit type conversion, so the string operand
is coerced before the comparison. I don't think this is a bug.
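
To make the difference concrete, here is a minimal pure-Python sketch of the two
comparison semantics. It is only a model, not Spark's or pandas' actual code:
the function names are hypothetical, and the sets of accepted boolean strings
are an assumption about Spark's lenient string-to-boolean cast that should be
checked against the Spark version in use.

```python
from typing import Optional

# Assumption: these sets model the strings that a lenient (non-ANSI)
# string-to-boolean cast accepts. Verify against your Spark version.
_TRUE_STRINGS = {"t", "true", "y", "yes", "1"}
_FALSE_STRINGS = {"f", "false", "n", "no", "0"}


def cast_string_to_boolean(value: str) -> Optional[bool]:
    """Hypothetical model of a lenient cast; None stands in for SQL NULL."""
    normalized = value.strip().lower()
    if normalized in _TRUE_STRINGS:
        return True
    if normalized in _FALSE_STRINGS:
        return False
    return None


def pandas_like_eq(left: bool, right: str) -> bool:
    """Model of pandas: plain Python equality, so a bool never equals a str."""
    return left == right


def spark_non_ansi_like_eq(left: bool, right: str) -> Optional[bool]:
    """Model of non-ANSI Spark: coerce the string to boolean, then compare."""
    coerced = cast_string_to_boolean(right)
    if coerced is None:
        return None  # comparing with NULL yields NULL
    return left == coerced


print("pandas-like:", pandas_like_eq(True, "True"))
print("Spark non-ANSI-like:", spark_non_ansi_like_eq(True, "True"))
```

Under this model the pandas-like comparison prints False and the non-ANSI
Spark-like comparison prints True, which matches the output in the report
quoted below.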
> pandas-on-Spark Boolean vs String comparison yields inconsistent result with
> pandas
> -----------------------------------------------------------------------------------
>
> Key: SPARK-54665
> URL: https://issues.apache.org/jira/browse/SPARK-54665
> Project: Spark
> Issue Type: Bug
> Components: PySpark
> Affects Versions: 4.0.1
> Environment: Platform: Ubuntu 24.04
> Linux-6.14.0-35-generic-x86_64-with-glibc2.39
> Python: 3.10.19 | packaged by conda-forge | (main, Oct 22 2025, 22:29:10)
> [GCC 14.3.0]
> openjdk version "17.0.17-internal" 2025-10-21
> OpenJDK Runtime Environment (build 17.0.17-internal+0-adhoc..src)
> OpenJDK 64-Bit Server VM (build 17.0.17-internal+0-adhoc..src, mixed mode,
> sharing)
> pyspark 4.0.1
> pandas 2.3.3
> pyarrow 22.0.0
> Reporter: asddfl
> Priority: Critical
>
> When using pandas-on-Spark (pyspark.pandas / pandas API on Spark), comparing
> a boolean Series with a string literal produces a result that is inconsistent
> with native pandas.
> This behavior diverges from pandas semantics and may cause silent logic
> differences when running pandas-compatible code on Spark.
> {code:python}
> import pandas as pd
> from pyspark.sql import SparkSession
> import pyspark.pandas as ps
> pd_t1 = pd.DataFrame(
>     {
>         'c1': [True]
>     }
> )
> print("Pandas:")
> print(pd_t1['c1'] == 'True')
> spark = (
>     SparkSession.builder
>     .config("spark.sql.ansi.enabled", "false")
>     .getOrCreate()
> )
> ps_t1 = ps.DataFrame(
>     {
>         'c1': [True]
>     }
> )
> print("PySpark Pandas:")
> print(ps_t1['c1'] == 'True')
> {code}
> {code:bash}
> Pandas:
> 0    False
> Name: c1, dtype: bool
> PySpark Pandas:
> 0    True
> Name: c1, dtype: bool
> {code}