[
https://issues.apache.org/jira/browse/SOLR-8082?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Ishan Chattopadhyaya updated SOLR-8082:
---------------------------------------
Attachment: SOLR-8082.patch
Here's a summary of my understanding / observations:
# Floats and doubles need to be converted to longs before writing them to
NumericDocValues.
# We have two options, Double.doubleToLongBits() and
NumericUtils.doubleToSortableLong(). For positive doubles, both these methods
return the same long value, but different ones for negative doubles.
# Currently, we use Double.doubleToLongBits(). Hence, to use term query against
such docValues, we should use the same method with the query value, but current
code uses NumericUtils.doubleToSortableLong() and hence term queries against
negative values fail. Similarly, range queries also fail when min is negative.
# I tried changing initial writing logic to use
NumericUtils.doubleToSortableLong(). With this change, both term queries and
range queries work, but sorting fails (when there are negative values). That is
counterintuitive, since the individual long values themselves are in sorted
order. Since this is an intrusive change that breaks backcompat, I didn't
investigate further to understand why this happens.
# To arrive at a least intrusive fix, I tried changing the range query logic to
split out the queries into two distinct ranges (negatives and positives) using
a boolean query. I had to do this since the Double.doubleToLongBits() values
are not monotonically increasing (they decrease from -Double.MAX_VALUE to -0.0,
but increase from 0 to Double.MAX_VALUE).
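
To make the difference between the two encodings concrete, here is a minimal sketch. It does not call Lucene: toSortableLong() is a hand-written stand-in that I assume matches what NumericUtils.doubleToSortableLong() does (flip the lower 63 bits when the sign bit is set), and the class name is hypothetical.

```java
class DoubleBitsDemo {
    // Assumed equivalent of the sortable encoding: for negative values
    // (sign bit set) flip the lower 63 bits; positive values are unchanged.
    static long toSortableLong(double value) {
        long bits = Double.doubleToLongBits(value);
        return bits ^ ((bits >> 63) & 0x7fffffffffffffffL);
    }

    public static void main(String[] args) {
        double[] samples = {-4.3, -1.0, 1.0, 4.3};
        for (double d : samples) {
            System.out.println(d + " raw=" + Double.doubleToLongBits(d)
                    + " sortable=" + toSortableLong(d));
        }
        // -4.3 < -1.0 as doubles, but the raw bits compare the other way round.
        System.out.println(Double.doubleToLongBits(-4.3) > Double.doubleToLongBits(-1.0));
        // The sortable encoding keeps the numeric order for negatives.
        System.out.println(toSortableLong(-4.3) < toSortableLong(-1.0));
    }
}
```

Both comparisons print true, and for the positive samples the raw and sortable values are identical, which is consistent with the observations above.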
Attached the patch for the last point, which I think is the least intrusive way
to pull things together so that they work. When the range query crosses the 0
boundary, there are two dv range queries which is less efficient, but better
than not working at all (which is the case today). The patch passes the tests,
but it might benefit from some neater refactoring.
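
For illustration only, the splitting idea can be sketched in plain Java as below. This is not the code in the attached patch: the class and method names are hypothetical, the ranges are plain long pairs rather than Lucene queries, and bounds are assumed to be inclusive and not NaN.

```java
import java.util.ArrayList;
import java.util.List;

class RawBitsRangeDemo {
    // Splits the inclusive double range [min, max] into at most two inclusive
    // ranges {lo, hi} over raw Double.doubleToLongBits() values.
    static List<long[]> rawBitRanges(double min, double max) {
        List<long[]> ranges = new ArrayList<>();
        if (!(min <= max)) {
            return ranges; // empty range or NaN bound
        }
        if (min <= 0) {
            // Negative side: raw bits decrease as the double increases,
            // so the numerically highest value supplies the lower long bound.
            double lowest = Math.min(min, -0.0);
            double highest = Math.min(max, -0.0);
            ranges.add(new long[] {
                Double.doubleToLongBits(highest), Double.doubleToLongBits(lowest)});
        }
        if (max >= 0) {
            // Non-negative side: raw bits increase with the double.
            double lowest = Math.max(min, 0.0);
            double highest = Math.max(max, 0.0);
            ranges.add(new long[] {
                Double.doubleToLongBits(lowest), Double.doubleToLongBits(highest)});
        }
        return ranges;
    }

    // True if the raw bits of 'stored' fall inside any of the split ranges.
    static boolean matches(double stored, double min, double max) {
        long bits = Double.doubleToLongBits(stored);
        for (long[] r : rawBitRanges(min, max)) {
            if (r[0] <= bits && bits <= r[1]) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(matches(-4.3, -4.3, -4.3)); // the failing case from the report
        System.out.println(matches(-4.3, -5.0, 5.0));  // range crossing zero
        System.out.println(matches(-6.0, -5.0, 5.0));  // outside the range
    }
}
```

A range that lies entirely on one side of zero yields a single long range, so the cost of the second range is only paid when the query crosses the 0 boundary.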
[~hossman] Can you please review? Do you think there's a cleaner way to do this?
> can't query against negative float or double values when indexed="false"
> docValues="true" multiValued="false"
> -------------------------------------------------------------------------------------------------------------
>
> Key: SOLR-8082
> URL: https://issues.apache.org/jira/browse/SOLR-8082
> Project: Solr
> Issue Type: Bug
> Reporter: Hoss Man
> Attachments: SOLR-8082.patch, SOLR-8082.patch
>
>
> Haven't dug into this yet, but something is evidently wrong in how the
> DocValues based queries get built for single valued float or double fields
> when negative numbers are involved.
> Steps to reproduce...
> {noformat}
> $ bin/solr -e schemaless -noprompt
> ...
> $ curl -X POST -H 'Content-type:application/json' --data-binary '{
>   "add-field":{ "name":"f_dv_multi", "type":"tfloat", "stored":"true",
>     "indexed":"false", "docValues":"true", "multiValued":"true" },
>   "add-field":{ "name":"f_dv_single", "type":"tfloat", "stored":"true",
>     "indexed":"false", "docValues":"true", "multiValued":"false" }
> }' http://localhost:8983/solr/gettingstarted/schema
> {
> "responseHeader":{
> "status":0,
> "QTime":84}}
> $ curl -X POST -H 'Content-type:application/json' --data-binary \
>   '[{"id":"test", "f_dv_multi":-4.3, "f_dv_single":-4.3}]' \
>   'http://localhost:8983/solr/gettingstarted/update/json/docs?commit=true'
> {"responseHeader":{"status":0,"QTime":57}}
> $ curl 'http://localhost:8983/solr/gettingstarted/query?q=f_dv_multi:"-4.3"'
> {
> "responseHeader":{
> "status":0,
> "QTime":5,
> "params":{
> "q":"f_dv_multi:\"-4.3\""}},
> "response":{"numFound":1,"start":0,"docs":[
> {
> "id":"test",
> "f_dv_multi":[-4.3],
> "f_dv_single":-4.3,
> "_version_":1512962117004689408}]
> }}
> $ curl 'http://localhost:8983/solr/gettingstarted/query?q=f_dv_single:"-4.3"'
> {
> "responseHeader":{
> "status":0,
> "QTime":5,
> "params":{
> "q":"f_dv_single:\"-4.3\""}},
> "response":{"numFound":0,"start":0,"docs":[]
> }}
> {noformat}
> Explicit range queries (which is how numeric "field" queries are implemented
> under the covers) are equally problematic...
> {noformat}
> $ curl 'http://localhost:8983/solr/gettingstarted/query?q=f_dv_multi:%5B-4.3+TO+-4.3%5D'
> {
> "responseHeader":{
> "status":0,
> "QTime":0,
> "params":{
> "q":"f_dv_multi:[-4.3 TO -4.3]"}},
> "response":{"numFound":1,"start":0,"docs":[
> {
> "id":"test",
> "f_dv_multi":[-4.3],
> "f_dv_single":-4.3,
> "_version_":1512962117004689408}]
> }}
> $ curl 'http://localhost:8983/solr/gettingstarted/query?q=f_dv_single:%5B-4.3+TO+-4.3%5D'
> {
> "responseHeader":{
> "status":0,
> "QTime":0,
> "params":{
> "q":"f_dv_single:[-4.3 TO -4.3]"}},
> "response":{"numFound":0,"start":0,"docs":[]
> }}
> {noformat}