[ 
https://issues.apache.org/jira/browse/SPARK-23463?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Manan Bakshi updated SPARK-23463:
---------------------------------
    Description: 
I have a simple dataframe with some blank values as below
||dev||val||
|ALL|0.01|
|ALL|0.02|
|ALL|0.004|
|ALL| |
|ALL|2.5|
|ALL|4.5|
|ALL|45|

Running a simple filter on the val column of this dataframe yields 
unexpected results. For example, the following query returns an empty 
dataframe:

df.filter(df["val"] > 0)
||dev||val||

However, the filter works as expected if the 0 in the filter condition is 
replaced by the float 0.0:

df.filter(df["val"] > 0.0)
||dev||val||
|ALL|0.01|
|ALL|0.02|
|ALL|0.004|
|ALL|2.5|
|ALL|4.5|
|ALL|45|

 

Note that this bug only exists in Spark 2.2.0 and later. The previous versions 
filter as expected for both int (0) and float (0.0) values in the filter 
condition.

Also, the filter works as expected in all versions if there are no 
blank values.
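
As a minimal sketch of the behaviour the report expects, the pure-Python snippet below parses each val explicitly, treats the blank cell as missing, and only then compares against the threshold. It does not use Spark and does not reproduce the bug; the names ROWS, parse_val and filter_positive are hypothetical, and storing the blank cell as a whitespace string is an assumption, since the report does not state the column type.

```python
from typing import List, Optional, Tuple

# Rows copied from the report. Assumption: the blank val is a whitespace string.
ROWS: List[Tuple[str, str]] = [
    ("ALL", "0.01"),
    ("ALL", "0.02"),
    ("ALL", "0.004"),
    ("ALL", " "),
    ("ALL", "2.5"),
    ("ALL", "4.5"),
    ("ALL", "45"),
]


def parse_val(raw: str) -> Optional[float]:
    """Hypothetical helper: convert a raw cell to float, mapping blanks to None."""
    text = raw.strip()
    if not text:
        return None
    return float(text)


def filter_positive(rows: List[Tuple[str, str]], threshold: float = 0) -> List[Tuple[str, float]]:
    """Keep rows whose parsed val exceeds threshold; blank cells never match."""
    kept = []
    for dev, raw in rows:
        val = parse_val(raw)
        if val is not None and val > threshold:
            kept.append((dev, val))
    return kept


if __name__ == "__main__":
    # The int literal 0 and the float literal 0.0 select the same rows here.
    for row in filter_positive(ROWS, 0):
        print(row)
```

Because the values are converted to float before the comparison, the int literal 0 and the float literal 0.0 give the same result. In PySpark, the analogous defensive step would be an explicit df["val"].cast("double") in the filter condition, assuming val is stored as a string column; whether that is the cause of the behaviour reported here is not confirmed by this issue.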

  was:
I have a simple dataframe as below
||dev||val||
|ALL|0.01|
|ALL|0.02|
|ALL|0.004|
|ALL| |
|ALL|2.5|
|ALL|4.5|
|ALL|45|

Running a simple filter on the val column of this dataframe yields 
unexpected results. For example, the following query returns an empty 
dataframe:

df.filter(df["val"] > 0)
||dev||val||

However, the filter works as expected if the 0 in the filter condition is 
replaced by the float 0.0:

df.filter(df["val"] > 0.0)
||dev||val||
|ALL|0.01|
|ALL|0.02|
|ALL|0.004|
|ALL|2.5|
|ALL|4.5|
|ALL|45|

 

Note that this bug only exists in Spark 2.2.0 and later. The previous versions 
filter as expected for both int (0) and float (0.0) values in the filter 
condition.


> Filter operation fails to handle blank values and evicts rows that even 
> satisfy the filtering condition
> -------------------------------------------------------------------------------------------------------
>
>                 Key: SPARK-23463
>                 URL: https://issues.apache.org/jira/browse/SPARK-23463
>             Project: Spark
>          Issue Type: Bug
>          Components: PySpark
>    Affects Versions: 2.2.1
>            Reporter: Manan Bakshi
>            Priority: Critical
>
> I have a simple dataframe with some blank values as below
> ||dev||val||
> |ALL|0.01|
> |ALL|0.02|
> |ALL|0.004|
> |ALL| |
> |ALL|2.5|
> |ALL|4.5|
> |ALL|45|
> Running a simple filter on the val column of this dataframe yields 
> unexpected results. For example, the following query returns an empty 
> dataframe:
> df.filter(df["val"] > 0)
> ||dev||val||
> However, the filter works as expected if the 0 in the filter condition is 
> replaced by the float 0.0:
> df.filter(df["val"] > 0.0)
> ||dev||val||
> |ALL|0.01|
> |ALL|0.02|
> |ALL|0.004|
> |ALL|2.5|
> |ALL|4.5|
> |ALL|45|
>  
> Note that this bug only exists in Spark 2.2.0 and later. The previous 
> versions filter as expected for both int (0) and float (0.0) values in the 
> filter condition.
> Also, the filter works as expected in all versions if there are no 
> blank values.


