[
https://issues.apache.org/jira/browse/CALCITE-4292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17419243#comment-17419243
]
Shlok Srivastava commented on CALCITE-4292:
-------------------------------------------
[~zabetak]
In-case of Not_Equals condition, the default behaviour of ElasticSearch as well
as JSON path is to select records in which the mentioned field is missing.
Adding additional EXISTS condition in NOT_EQUALS criteria deviates from
ElasticSearch behaviour. As the adapter is for ElasticSearch it should support
ES behaviour. If someone requires exists along with NO_EQUALS condition it can
be explicitly added in where condition but it can't be removed unless the code
is fixed.
So, this defect should be fixed to support default ElasticSearch behavior.
> Wrong results in ElasticSearch when query contains NOT EQUAL
> ------------------------------------------------------------
>
> Key: CALCITE-4292
> URL: https://issues.apache.org/jira/browse/CALCITE-4292
> Project: Calcite
> Issue Type: Bug
> Components: elasticsearch-adapter
> Reporter: Shlok Srivastava
> Assignee: Bill Neil
> Priority: Major
> Labels: ElasticSearch, NotEquals, QueryBuilder, calcite,
> pull-request-available
> Time Spent: 10m
> Remaining Estimate: 0h
>
> Currently elasticsearch adapters ignore records which do not have the field
> specified in NotEqual clause, which is ideally included by elasticsearch.
> *Query:*
> {code:java}
> SELECT* from zips WHERE name <> "NMAX"{code}
> The elasticsearch query formed for above is this :
> {code:java}
> {
> "query": {
> "constant_score": {
> "filter": {
> "bool": {
> "must": {
> "exists": {
> "field": "Name"
> }
> },
> "must_not": {
> "term": {
> "Name": "NMQAX"
> }
> }
> }
> }
> }
> }
> }
> {code}
> *Problem* : The above query ignores document which do not have _Name_ field
> which is ideally included by elasticsearch but ignored due to must exists
> condition.
> *Data:*
> {noformat}
> { "_id" : "01701", "name" : "NMAX", "loc" : [ -71.42548600000001, 42.300665
> ], "pop" : 65046, "state" : "MA" }
> { "_id" : "02154", "name" : "NORTH WALTHAM", "loc" : [ -71.236497, 42.382492
> ], "pop" : 57871, "state" : "MA" }
> { "_id" : "02401 , "loc" : [ -71.03434799999999, 42.081571 ], "pop" : 59498,
> "state" : "MA" }
> {noformat}
>
>
> *Expected result:*
> {noformat}
> { "_id" : "02154", "name" : "NORTH WALTHAM", "loc" : [ -71.236497, 42.382492
> ], "pop" : 57871, "state" : "MA" }
> { "_id" : "02401", "loc" : [ -71.03434799999999, 42.081571 ], "pop" : 59498,
> "state" : "MA" }
> {noformat}
>
> *Current Result:*
> {noformat}
> { "_id" : "02154", "name" : "NORTH WALTHAM", "loc" : [ -71.236497, 42.382492
> ], "pop" : 57871, "state" : "MA" }
> {noformat}
> RelNode for same -
> {code:java}
> relB.not(relB.equals(relb.literal("Name"),relb.literal"NMQAX")){code}
>
> *Solution* : Remove the exists condition from Not equals Query Expression.
> Elasticsearch doesn't put this condition therefore keeping queries in sync.
> [Code|https://github.com/apache/calcite/blob/1050b36cafbb0c487b7a2ade3efd12850609717e/elasticsearch/src/main/java/org/apache/calcite/adapter/elasticsearch/PredicateAnalyzer.java#L782]
--
This message was sent by Atlassian Jira
(v8.3.4#803005)