[ https://issues.apache.org/jira/browse/SPARK-12218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15049194#comment-15049194 ]
Irakli Machabeli commented on SPARK-12218:
------------------------------------------
{code}
scala> val df = sqlContext.read.parquet(pathOne).where("c < 6 and not (a = 2 and b in ('1', '2'))")
df: org.apache.spark.sql.DataFrame = [a: int, b: string, c: int]
scala> df.explain(true)
{code}
{noformat}
== Parsed Logical Plan ==
'Filter (('c < 6) && NOT (('a = 2) && 'b IN (1,2)))
Relation[a#30,b#31,c#32] ParquetRelation[file:/D:/tmp/test]
== Analyzed Logical Plan ==
a: int, b: string, c: int
Filter ((c#32 < 6) && NOT ((a#30 = 2) && b#31 IN (1,2)))
Relation[a#30,b#31,c#32] ParquetRelation[file:/D:/tmp/test]
== Optimized Logical Plan ==
Filter ((c#32 < 6) && NOT ((a#30 = 2) && b#31 IN (1,2)))
Relation[a#30,b#31,c#32] ParquetRelation[file:/D:/tmp/test]
== Physical Plan ==
Filter ((c#32 < 6) && NOT ((a#30 = 2) && b#31 IN (1,2)))
Scan ParquetRelation[file:/D:/tmp/test][a#30,b#31,c#32]
Code Generation: true
{noformat}
{code}
scala> val df2 = sqlContext.read.parquet(pathOne).where("c < 6 and (not(a = 2) or not(b in ('1', '2')))")
df2: org.apache.spark.sql.DataFrame = [a: int, b: string, c: int]
scala> df2.explain(true)
{code}
{noformat}
== Parsed Logical Plan ==
'Filter (('c < 6) && (NOT ('a = 2) || NOT 'b IN (1,2)))
Relation[a#34,b#35,c#36] ParquetRelation[file:/D:/tmp/test]
== Analyzed Logical Plan ==
a: int, b: string, c: int
Filter ((c#36 < 6) && (NOT (a#34 = 2) || NOT b#35 IN (1,2)))
Relation[a#34,b#35,c#36] ParquetRelation[file:/D:/tmp/test]
== Optimized Logical Plan ==
Filter ((c#36 < 6) && (NOT (a#34 = 2) || NOT b#35 IN (1,2)))
Relation[a#34,b#35,c#36] ParquetRelation[file:/D:/tmp/test]
== Physical Plan ==
Filter ((c#36 < 6) && (NOT (a#34 = 2) || NOT b#35 IN (1,2)))
Scan ParquetRelation[file:/D:/tmp/test][a#34,b#35,c#36]
Code Generation: true
{noformat}
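For reference, a minimal sketch in plain Python (not Spark code) of why the two filters above are expected to be equivalent. It enumerates every combination of TRUE, FALSE and NULL under SQL's three-valued logic and checks that "not (A and B)" matches "(not A) or (not B)". The helper names (sql_not, sql_and, sql_or, de_morgan_mismatches) are made up for illustration, and None stands in for NULL. If this reasoning is right, NULL values in the columns should not by themselves explain a difference in row counts.

```python
from itertools import product

# SQL truth values; None stands in for NULL / UNKNOWN (an assumption of this sketch).
VALUES = (True, False, None)


def sql_not(x):
    # NOT NULL is NULL; otherwise ordinary negation.
    return None if x is None else (not x)


def sql_and(x, y):
    # FALSE dominates AND; otherwise any NULL makes the result NULL.
    if x is False or y is False:
        return False
    if x is None or y is None:
        return None
    return True


def sql_or(x, y):
    # TRUE dominates OR; otherwise any NULL makes the result NULL.
    if x is True or y is True:
        return True
    if x is None or y is None:
        return None
    return False


def de_morgan_mismatches():
    """Return every (a, b) where NOT (a AND b) differs from (NOT a) OR (NOT b)."""
    return [
        (a, b)
        for a, b in product(VALUES, repeat=2)
        if sql_not(sql_and(a, b)) != sql_or(sql_not(a), sql_not(b))
    ]


print(de_morgan_mismatches())  # prints []
```

The empty list means the two forms agree for all nine input combinations, including the ones involving NULL.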
> Boolean logic in sql does not work "not (A and B)" is not the same as "(not
> A) or (not B)"
> --------------------------------------------------------------------------------------------
>
> Key: SPARK-12218
> URL: https://issues.apache.org/jira/browse/SPARK-12218
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 1.5.2
> Reporter: Irakli Machabeli
> Priority: Blocker
>
> Two logically equivalent queries produce different results:
> In [2]: sqlContext.read.parquet('prp_enh1').where(" LoanID=62231 and not( PaymentsReceived=0 and ExplicitRoll in ('PreviouslyPaidOff', 'PreviouslyChargedOff'))").count()
> Out[2]: 18
> In [3]: sqlContext.read.parquet('prp_enh1').where(" LoanID=62231 and ( not(PaymentsReceived=0) or not (ExplicitRoll in ('PreviouslyPaidOff', 'PreviouslyChargedOff')))").count()
> Out[3]: 28