[
https://issues.apache.org/jira/browse/SPARK-23463?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Manan Bakshi resolved SPARK-23463.
----------------------------------
Resolution: Not A Problem
> Filter operation fails to handle blank values and evicts rows that even
> satisfy the filtering condition
> -------------------------------------------------------------------------------------------------------
>
> Key: SPARK-23463
> URL: https://issues.apache.org/jira/browse/SPARK-23463
> Project: Spark
> Issue Type: Bug
> Components: PySpark
> Affects Versions: 2.2.1
> Reporter: Manan Bakshi
> Priority: Critical
> Attachments: sample
>
>
> Filter operations were updated in Spark 2.2.0. The cost-based optimizer was
> introduced to look at table statistics and estimate filter selectivity.
> Since then, however, filter has behaved unexpectedly for blank values. The
> operation not only drops rows with blank values but also filters out
> rows that actually meet the filter criteria.
> Steps to reproduce:
> Consider a simple dataframe with some blank values, as below:
> ||dev||val||
> |ALL|0.01|
> |ALL|0.02|
> |ALL|0.004|
> |ALL| |
> |ALL|2.5|
> |ALL|4.5|
> |ALL|45|
> Running a simple filter operation over the val column of this dataframe yields
> unexpected results. For example, the following query returned an empty dataframe:
> df.filter(df["val"] > 0)
> ||dev||val||
> However, the filter operation works as expected if the 0 in the filter
> condition is replaced by the float 0.0:
> df.filter(df["val"] > 0.0)
> ||dev||val||
> |ALL|0.01|
> |ALL|0.02|
> |ALL|0.004|
> |ALL|2.5|
> |ALL|4.5|
> |ALL|45|
>
> Note that this bug only exists in Spark 2.2.0 and later. Earlier
> versions filter as expected for both int (0) and float (0.0) values in the
> filter condition.
> Also, if there are no blank values, the filter operation works as expected
> in all versions.