albertogpz commented on pull request #6200:
URL: https://github.com/apache/geode/pull/6200#issuecomment-814286023
@agingade
Thanks for your comments. Let me answer your questions inline.
> @albertogpz
> Thanks for your contributions to make the query engine robust. Sorry for the delay in responding...
>
> Here is what I believe should be happening.
>
> * The result of a query should be consistent whether or not an index is used.
>
This is what I tried to fix with this PR. The new test cases now show that results are the same with and without indexes, whereas prior to the PR they differed.
> * The query engine returns UNDEFINED when it is unable to find the next-level field.
> E.g.: address.city when address is null (this is documented).
> This is not the same as looking for a nonexistent "key" in the map; UNDEFINED needs to be returned when positions is null and the query is trying to access a field from it.
> E.g.: positions['*'] should be returning UNDEFINED.
So, you mean that if positions is not null but only contains a mapping for the "SUN" key, positions["ERIC"] should return null instead of UNDEFINED?
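To make the question concrete, here is a minimal stand-alone sketch of the two readings. It is not Geode code: the class name, the method names, and the `UNDEFINED` sentinel object are hypothetical stand-ins used only to illustrate the distinction between a null map, a missing key, and a key mapped to null.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model for discussion only; not the Geode query engine.
final class MapFieldLookupSketch {
    // Sentinel standing in for the query engine's UNDEFINED value (assumption).
    static final Object UNDEFINED = new Object() {
        @Override
        public String toString() {
            return "UNDEFINED";
        }
    };

    // Reading A: UNDEFINED only when the map itself is null.
    // A missing key and a key mapped to null both evaluate to null.
    static Object lookupNullForMissingKey(Map<String, Object> positions, String key) {
        if (positions == null) {
            return UNDEFINED;
        }
        return positions.get(key);
    }

    // Reading B: a missing key is also UNDEFINED, so only a key that is
    // present and mapped to null evaluates to null.
    static Object lookupUndefinedForMissingKey(Map<String, Object> positions, String key) {
        if (positions == null || !positions.containsKey(key)) {
            return UNDEFINED;
        }
        return positions.get(key);
    }

    public static void main(String[] args) {
        Map<String, Object> positions = new HashMap<>();
        positions.put("SUN", "1");

        System.out.println(lookupNullForMissingKey(positions, "ERIC"));      // null
        System.out.println(lookupUndefinedForMissingKey(positions, "ERIC")); // UNDEFINED
        System.out.println(lookupNullForMissingKey(null, "ERIC"));           // UNDEFINED
    }
}
```

The two methods differ only in how they treat a key that is absent from a non-null map, which is exactly the case the question is about.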
>
> * The query engine supports heterogeneous objects stored in a region.
> E.g.: Employee or Customer.
> In line with supporting this, it is designed/architected such that if a field is not found in the object, the object will be ignored.
> E.g.: a query with employeeID is not going to return customer objects unless they have that field.
>
ok. That should not have been changed.
> * To be in line with the above design (query expectation), when a map field is not present, the query should not add that entry/object to the result.
> E.g.: for positions['SUN'], if the SUN key is not present, the query should ignore that object.
> This is also different from a null check.
> If there is a SUN key with a null value, it should be returned for queries looking for a null value. And a non-null check will return the entry if the key is there and its value is not null.
Currently, if one entry's positions map is {"SUN" => null} and another entry's positions map is {"ERICSSON" => "3"}, querying for positions["SUN"] = null returns both entries.
With my PR, only the first one would be returned. Which is the right behavior?
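The two outcomes can be modeled with a small stand-alone sketch. This is not how Geode evaluates the query; it only reproduces the results described above for the two sample entries. The class name, method names, and the entry keys `entry1` and `entry2` are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of the two outcomes for positions["SUN"] = null.
final class SunNullQuerySketch {
    // Builds the two sample entries from the comment; the keys are made up.
    static Map<String, Map<String, String>> sampleRegion() {
        Map<String, String> first = new HashMap<>();
        first.put("SUN", null);          // {"SUN" => null}
        Map<String, String> second = new HashMap<>();
        second.put("ERICSSON", "3");     // {"ERICSSON" => "3"}

        Map<String, Map<String, String>> region = new LinkedHashMap<>();
        region.put("entry1", first);
        region.put("entry2", second);
        return region;
    }

    // Outcome described as current: a missing key is indistinguishable
    // from a key mapped to null, so both entries match.
    static List<String> matchTreatingMissingKeyAsNull(Map<String, Map<String, String>> region) {
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, Map<String, String>> e : region.entrySet()) {
            if (e.getValue().get("SUN") == null) {
                result.add(e.getKey());
            }
        }
        return result;
    }

    // Outcome described for the PR: the key must be present and mapped to null.
    static List<String> matchRequiringKeyPresent(Map<String, Map<String, String>> region) {
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, Map<String, String>> e : region.entrySet()) {
            Map<String, String> positions = e.getValue();
            if (positions.containsKey("SUN") && positions.get("SUN") == null) {
                result.add(e.getKey());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(matchTreatingMissingKeyAsNull(sampleRegion())); // [entry1, entry2]
        System.out.println(matchRequiringKeyPresent(sampleRegion()));      // [entry1]
    }
}
```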
> Try the query with a non-map field and the behavior should be the same.
>
>
> Please let me know if you have any questions on the expected behavior.
>
> The overall behavior of queries on maps should be in line with non-map fields.
>
> The use of indexes should not be avoided unless the results are inconsistent with non-index query results. Instead of avoiding/blocking the use of an index, it would be good to address the issues with indexed queries and make the behavior consistent.
>
I tried to support the use of indexes in all queries with my PR, but I found no way to do it for != queries using an index of type positions[*] (index on all keys) or positions["SUN", "ERICSSON"].
For the second type of index, I have an alternative solution in a draft PR that allows the index to be used, although at a higher memory cost (see https://github.com/apache/geode/pull/6238).
> Can you confirm the above requirements are met in this PR?