[ https://issues.apache.org/jira/browse/LUCENE-9154?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17020582#comment-17020582 ]

Adrien Grand commented on LUCENE-9154:
--------------------------------------

I like the fact that, with the current approach, the accuracy loss only occurs
at index time; it makes things easier to reason about. That said, I don't feel
strongly about it, and you may be right that false positives are less confusing
for users than false negatives. I wonder whether [~rcmuir] has opinions on
this, since I think he is the one who introduced this logic.

> Remove encodeCeil() to encode bounding box queries
> ---------------------------------------------------
>
>                 Key: LUCENE-9154
>                 URL: https://issues.apache.org/jira/browse/LUCENE-9154
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Ignacio Vera
>            Priority: Major
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> We currently have the following logic in LatLonPoint#newBoxQuery():
> {code:java}
> // exact double values of lat=90.0D and lon=180.0D must be treated special as they are not represented in the encoding
> // and should not drag in extra bogus junk! TODO: should encodeCeil just throw ArithmeticException to be less trappy here?
> if (minLatitude == 90.0) {
>   // range cannot match as 90.0 can never exist
>   return new MatchNoDocsQuery("LatLonPoint.newBoxQuery with minLatitude=90.0");
> }
> if (minLongitude == 180.0) {
>   if (maxLongitude == 180.0) {
>     // range cannot match as 180.0 can never exist
>     return new MatchNoDocsQuery("LatLonPoint.newBoxQuery with minLongitude=maxLongitude=180.0");
>   } else if (maxLongitude < minLongitude) {
>     // encodeCeil() with dateline wrapping!
>     minLongitude = -180.0;
>   }
> }
> byte[] lower = encodeCeil(minLatitude, minLongitude);
> byte[] upper = encode(maxLatitude, maxLongitude);
> {code}
>  
> IMO this is confusing and can lead to strange results. For example, a query
> with {{minLatitude = maxLatitude = 90}} does not match points with
> {{latitude = 90}}. On the other hand, a query with
> {{minLatitude = maxLatitude = 89.99999996}} will match points at latitude = 90.
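
To make the two cases above concrete, here is a minimal sketch that re-implements latitude quantization on an assumed 32-bit grid (2^32 cells over [-90, 90], each 180/2^32 degrees wide) together with the quoted minLatitude == 90.0 shortcut. It is not Lucene's code: the class and the helpers encode(), encodeCeil() and matches() are hypothetical, and clamping 2^31 into the top cell is our assumption about how the overflow is handled.

```java
class PoleQueryDemo {
    // Assumed 32-bit latitude grid: 2^32 cells over [-90, 90], each 180 / 2^32 degrees wide
    static final double LAT_DECODE = 180.0 / (1L << 32);

    // Hypothetical floor-based encoding (indexed points and upper bounds)
    static int encode(double lat) {
        long cell = (long) Math.floor(lat / LAT_DECODE);
        return (int) Math.min(cell, Integer.MAX_VALUE); // 90.0 scales to 2^31: clamp into the top cell
    }

    // Hypothetical ceil-based encoding (lower bounds), clamped the same way
    static int encodeCeil(double lat) {
        long cell = (long) Math.ceil(lat / LAT_DECODE);
        return (int) Math.min(cell, Integer.MAX_VALUE);
    }

    // Latitude-only version of the quoted logic, including the minLatitude == 90.0 shortcut
    static boolean matches(double pointLat, double minLat, double maxLat) {
        if (minLat == 90.0) {
            return false; // stands in for MatchNoDocsQuery
        }
        int p = encode(pointLat);
        return p >= encodeCeil(minLat) && p <= encode(maxLat);
    }

    public static void main(String[] args) {
        System.out.println(matches(90.0, 90.0, 90.0));               // false
        System.out.println(matches(90.0, 89.99999996, 89.99999996)); // true
    }
}
```

In the second query, 89.99999996 lies above the top grid value, so rounding it up runs past 2^31 and is clamped into the same top cell that a point at latitude 90 is indexed in.
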
> I don't really understand the statement {{90.0 can never exist}}, as this is
> equally true for values > 89.99999995809048, which is the maximum quantized
> value. By that argument, it holds for every value between quantized
> coordinates, since those values do not exist in the index either, so why is
> 90D so special? I guess because it cannot be rounded up (ceiled) without
> overflowing the encoding.
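
The arithmetic behind the top cell can be checked with a small sketch, assuming the same 32-bit latitude grid (cell width 180/2^32 degrees) and a hypothetical decode() helper that returns the latitude at the start of a cell: the top cell decodes to 89.99999995809048, while 90.0 itself scales to exactly 2^31, one past Integer.MAX_VALUE.

```java
class TopCellDemo {
    // Assumed 32-bit latitude grid: 2^32 cells over [-90, 90]
    static final double LAT_DECODE = 180.0 / (1L << 32);

    // Hypothetical helper: the latitude at the start of a cell
    static double decode(int cell) {
        return cell * LAT_DECODE;
    }

    public static void main(String[] args) {
        double maxQuantized = decode(Integer.MAX_VALUE);
        double scaledPole = 90.0 / LAT_DECODE; // exactly 2^31
        System.out.println(maxQuantized);
        System.out.println(maxQuantized == 89.99999995809048);         // true
        System.out.println(Math.ceil(scaledPole) > Integer.MAX_VALUE); // true: does not fit in an int
    }
}
```
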
> Another argument for removing this function is that it opens the door to
> false negatives in query results. If a query has minLat = 89.999999957, it
> won't match points with latitude = 89.999999957, because the bound is rounded
> up to 89.99999995809048.
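
The false negative can be reproduced with the same kind of hypothetical sketch (our own encode(), encodeCeil() and decode() helpers on an assumed 32-bit latitude grid, not Lucene's API): the indexed latitude is rounded down into one cell, while the query's lower bound is rounded up into the next one.

```java
class FalseNegativeDemo {
    // Assumed 32-bit latitude grid, cell width 180 / 2^32 degrees
    static final double LAT_DECODE = 180.0 / (1L << 32);

    // Hypothetical floor-based encoding, as applied to indexed points
    static int encode(double lat) {
        return (int) Math.min((long) Math.floor(lat / LAT_DECODE), Integer.MAX_VALUE);
    }

    // Hypothetical ceil-based encoding, as applied to the query's lower bound
    static int encodeCeil(double lat) {
        return (int) Math.min((long) Math.ceil(lat / LAT_DECODE), Integer.MAX_VALUE);
    }

    // Latitude at the start of a cell
    static double decode(int cell) {
        return cell * LAT_DECODE;
    }

    public static void main(String[] args) {
        double lat = 89.999999957;
        int point = encode(lat);     // Integer.MAX_VALUE - 1: rounded down
        int lower = encodeCeil(lat); // Integer.MAX_VALUE: rounded up to 89.99999995809048
        System.out.println(decode(lower) == 89.99999995809048); // true
        System.out.println(point >= lower); // false: a point exactly at minLat is not matched
    }
}
```
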
> The only merit I can see in the current approach is that if you only index
> points that are already quantized, then all queries would be exact. But does
> it make sense for someone to only index quantized values and then query with
> non-quantized bounding boxes?
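
That merit can be illustrated with another hypothetical sketch on the same assumed 32-bit latitude grid (helpers are our own, not Lucene's): when points sit exactly on grid values, a ceil-based lower bound excludes the point just below a non-quantized minLat, whereas a floor-based bound lets it in.

```java
class QuantizedPointsDemo {
    // Assumed 32-bit latitude grid, cell width 180 / 2^32 degrees
    static final double LAT_DECODE = 180.0 / (1L << 32);

    static int encode(double lat) { // hypothetical floor-based encoding
        return (int) Math.min((long) Math.floor(lat / LAT_DECODE), Integer.MAX_VALUE);
    }

    static int encodeCeil(double lat) { // hypothetical ceil-based encoding
        return (int) Math.min((long) Math.ceil(lat / LAT_DECODE), Integer.MAX_VALUE);
    }

    static double decode(int cell) { // latitude at the start of a cell
        return cell * LAT_DECODE;
    }

    public static void main(String[] args) {
        // Two adjacent grid values: indexing them loses nothing
        double below = decode(1_000_000);
        double above = decode(1_000_001);
        double minLat = (below + above) / 2; // a non-quantized lower bound between them

        // Ceil-based lower bound: exact for quantized points
        System.out.println(encode(below) >= encodeCeil(minLat)); // false
        System.out.println(encode(above) >= encodeCeil(minLat)); // true
        // Floor-based lower bound: 'below' is matched although it is < minLat
        System.out.println(encode(below) >= encode(minLat));     // true
    }
}
```
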
>  
> I hope I am missing something, but my proposal is to remove encodeCeil
> altogether and remove all the special handling at the positive pole and the
> positive dateline.
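
For comparison, the proposal can be sketched the same way: both bounds use the floor-based encoding and the pole shortcut is dropped. Again, encode() and matches() are hypothetical helpers on an assumed 32-bit latitude grid, not a patch for LatLonPoint, and longitude and dateline handling are left out. The pole now matches itself and the false negative disappears, at the cost of a false positive for a point that shares a cell with minLat.

```java
class FloorOnlyBoxDemo {
    // Assumed 32-bit latitude grid, cell width 180 / 2^32 degrees
    static final double LAT_DECODE = 180.0 / (1L << 32);

    // Hypothetical floor-based encoding, used for points and for both query bounds
    static int encode(double lat) {
        return (int) Math.min((long) Math.floor(lat / LAT_DECODE), Integer.MAX_VALUE);
    }

    // Proposed rule (as we read it): no encodeCeil, no special case at 90.0
    static boolean matches(double pointLat, double minLat, double maxLat) {
        int p = encode(pointLat);
        return p >= encode(minLat) && p <= encode(maxLat);
    }

    public static void main(String[] args) {
        System.out.println(matches(90.0, 90.0, 90.0));                 // true: the pole matches itself
        System.out.println(matches(89.999999957, 89.999999957, 90.0)); // true: no false negative
        System.out.println(matches(89.99999994, 89.999999957, 90.0));  // true: false positive, same cell as minLat
    }
}
```
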
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
