Re: [FastBit-users] possible bug in query

K. John Wu Fri, 26 Feb 2016 16:05:14 -0800

When you see "0.35135729403", there are trailing digits that are not
shown.  Double values typically have about 16 significant digits; if
you print the value with 16 or more digits, you might see something like


0.351357294031925

or

0.3513572940256878

The value in the string "value1=0.35135729403" might be translated to
0.351357294029999989 internally (just checked with the attached C
program).  The point is that it is very hard to tell whether another
value printed as 0.35135729403 is actually 0.351357294029999989 and
therefore satisfies the condition "value1=0.35135729403".
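
To make the point above concrete, here is a minimal sketch in C.  It is
not part of FastBit; the helper names same_when_printed and
matches_query_text are made up for illustration.  It assumes the query
text is parsed with strtod, which may differ from what FastBit does
internally.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns 1 when two doubles produce the same
   text at `digits` significant digits, 0 otherwise. */
int same_when_printed(double a, double b, int digits) {
    char sa[64], sb[64];
    assert(digits > 0 && digits <= 17);
    snprintf(sa, sizeof sa, "%.*g", digits, a);
    snprintf(sb, sizeof sb, "%.*g", digits, b);
    return strcmp(sa, sb) == 0;
}

/* Hypothetical helper: returns 1 when the stored double is exactly
   the double that the query text parses to, which is what an
   equality condition requires. */
int matches_query_text(double stored, const char *query_text) {
    assert(query_text != NULL);
    return stored == strtod(query_text, NULL);
}
```

With the two example values above, same_when_printed(0.351357294031925,
0.3513572940256878, 11) is true because both print as 0.35135729403,
yet matches_query_text(..., "0.35135729403") is false for each of them.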

I hope your application allows an alternative form of the query that
is more precise than an equality test on floating-point values.  For
example, you could convert the values into some sort of integer
representation.  Or you might consider expressing your query in the
form of a range.  Treating numbers as strings should be a last resort
because searches on strings are typically much slower than searches
on numbers.
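
The two alternatives above could be sketched roughly as follows in C.
This is only an illustration under assumptions: to_fixed and
in_half_open_range are hypothetical helpers, not FastBit functions, and
to_fixed handles only non-negative plain decimal text and does not
check for overflow.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: read non-negative decimal text such as
   "0.35135729403" and return it scaled by 10^frac_digits as an
   integer, without going through a double.  Extra fractional digits
   are truncated; short fractions are padded with zeros. */
int64_t to_fixed(const char *text, int frac_digits) {
    int64_t value = 0;
    int seen_point = 0, used = 0;
    assert(text != NULL && frac_digits >= 0);
    for (const char *p = text; *p != '\0'; ++p) {
        if (*p == '.' && !seen_point) {
            seen_point = 1;
        } else if (*p >= '0' && *p <= '9') {
            if (seen_point && used == frac_digits)
                break;              /* ignore digits beyond the scale */
            value = value * 10 + (*p - '0');
            if (seen_point)
                ++used;
        } else {
            break;                  /* stop at anything unexpected */
        }
    }
    while (used++ < frac_digits)
        value *= 10;                /* pad to the requested scale */
    return value;
}

/* Hypothetical helper: the range form of the condition,
   lo <= v < hi, evaluated on doubles. */
int in_half_open_range(double v, double lo, double hi) {
    return lo <= v && v < hi;
}
```

For instance, to_fixed("0.35135729403", 11) gives the integer
35135729403, which can be compared exactly.  For the range form, the
bounds 0.351357294029 and 0.351357294031 include the double parsed from
0.35135729403 but exclude both longer example values above.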

John



On 2/26/16 3:47 PM, Sean Davey wrote:
> Hi John and Justin,
> 
> thanks for your help on this. I understand that the double version of
> the string 0.35135729403 will most likely not be exactly
> 0.35135729403. What I was hoping is that whatever the double value
> turned out to be would be consistent between indexing and querying. I
> don’t need the query to compare the exact right double value, I just
> need it to match whatever the double value ended up being when the
> data was indexed, i.e. I need to retrieve the rows that had the
> original text values of 0.35135729403 with a query of “where
> value1=0.35135729403”.
> 
> thanks for the suggestions. indexing the column twice, once as a
> double and once as a string, seems like it might be the way to go.
> 
> Sean Davey
> Bio5 Institute, University of Arizona, Tucson
> [email protected] <mailto:[email protected]>
> 
> 
> 
> 
>> On Feb 26, 2016, at 4:37 PM, K. John Wu <[email protected]
>> <mailto:[email protected]>> wrote:
>>
>> Hi, Sean and Justin,
>>
>> The incoming floating-point values to FastBit are processed as
>> doubles; however, when the string value 0.35135729403 is converted
>> to a double in memory, it will not match 0.35135729403 exactly. This
>> is because numbers are represented internally in binary, and decimal
>> values like 0.1 do not have an exact finite binary representation.
>>
>> The output line
>>
>> From csv Where value1 == 0.351357 -->   0
>>
>> is printed with the standard C++ output function, which prints
>> floating-point values with 6 significant digits.  It is not an
>> indication of the internal representation.
>>
>> Double precision values have about 16 significant digits.
>>
>> To ensure the machine representation is exactly what you specify, give
>> a number that is representable in binary, e.g. 1, 0.125, and 0.03125.
>>
>> Another alternative is to ask for values in a range, e.g., "0.351357
>> <= value1 < 0.351358."
>> Keep in mind that the values 0.351357 and 0.351358 are not exactly
>> representable in binary, and therefore their internal
>> representations might not be exactly what you are looking for.
>> Furthermore, because values printed with 6-digit precision are
>> rounded, you might have values that are printed as 0.351357 but are
>> not included in the query results.
>>
>> Hope this helps.
>>
>> John
>>
>>
>>
>>
>> On 2/26/16 3:12 PM, Justin Swanhart wrote:
>>> Hi,
>>>
>>> Double is just the double IEEE representation.  Even though you see
>>> the exact value, and you loaded that value, rounding error occurs in
>>> different places.  From the looks of it, the input is probably
>>> rounding to single precision, but the general case for equality of a
>>> float is that it doesn't work, so it doesn't really make sense (to me)
>>> to change that as it would not fix things generally.  Floats are
>>> imprecise and even when you search for the displayed value the
>>> internal value might not be the same.
>>>
>>> John can comment if he feels it is a bug.  A workaround would be to
>>> use bc or gmp to store a fixed representation as a binary string, then
>>> search on that for equality (or just store the float as string).  You
>>> will still need to store the "raw" float value as well for range
>>> searches using fastbit because fastbit doesn't understand how to use
>>> that fixed width data.  Storing and using bc/gmp for fixed precision
>>> would be a nice extension to fastbit, but I personally don't use it
>>> enough (I just provide a MySQL interface to it) to make taking the
>>> time to write such an extension worthwhile to me.
>>>
>>> --Justin
>>>
>>>
>>>
>>> On Fri, Feb 26, 2016 at 1:37 PM, Sean Davey
>>> <[email protected] <mailto:[email protected]>
>>> <mailto:[email protected]>> wrote:
>>>
>>>    Hi Justin,
>>>
>>>    thanks for the reply. I see what you’re saying but I thought it
>>>    might work in this case because I’m searching for a value I know
>>>    is in the data. The text I get back from "select min(value1)” is
>>>    0.35135729403. That’s the exact same text that is in the original
>>>    file that was indexed. So shouldn’t the internal representation of
>>>    that string be the same for indexing and for querying? If I’m
>>>    querying for a value with the exact same text that was used when
>>>    the data was indexed, shouldn’t it be found?
>>>
>>>    btw, the value1 column is a double, not a float. I don’t know if
>>>    that matters, but I thought of that when I noticed that the output
>>>    from ibis includes the line "From csv Where value1 == 0.351357 -->
>>>    0” which for some reason doesn’t display all of 0.35135729403.
>>>
>>>    cheers,
>>>    Sean Davey
>>>    Bio5 Institute, University of Arizona, Tucson
>>>    [email protected]
>>> <mailto:[email protected]> <mailto:[email protected]>
>>>
>>>
>>>
>>>
>>>>    On Feb 25, 2016, at 4:46 PM, Justin Swanhart
>>>>    <[email protected]
>>>> <mailto:[email protected]> <mailto:[email protected]>> wrote:
>>>>
>>>>    Hi,
>>>>
>>>>    It is not a bug.  Equality on floats/doubles won't work because
>>>>    they are IEEE float values and the displayed value may not be
>>>>    the same (read: is usually not the same) as the internally stored
>>>>    value.  This is a problem common to all databases that use IEEE
>>>>    values to store floating point numbers.  Many databases offer a
>>>>    fixed point data type to work around this (in MySQL it is the
>>>>    DECIMAL type) but FastBit doesn't have such a data type.  
>>>>
>>>>    --Justin
>>>>
>>>>    On Wed, Feb 24, 2016 at 1:23 PM, Sean Davey
>>>>    <[email protected]
>>>> <mailto:[email protected]> <mailto:[email protected]>>
>>>> wrote:
>>>>
>>>>        hi all,
>>>>
>>>>        I’m trying to find multiple min or max values in my index. I
>>>>        do a query like “select min(value1)” which works fine and
>>>>        returns a value such as 0.35135729403, which is correct.
>>>>        However, when I try to find all the lines with that value
>>>>        with a query like “select chr,start,stop,value1 where
>>>>        value1=0.35135729403”, I get zero hits. In the output of the
>>>>        second query I see the line "From csv Where value1 ==
>>>>        0.351357 --> 0” so it appears that the value I’m searching
>>>>        for has been truncated.
>>>>
>>>>        Please let me know if this is a bug and if so, if it can be
>>>>        fixed.
>>>>
>>>>        thanks,
>>>>        Sean Davey
>>>>        Bio5 Institute, University of Arizona, Tucson
>>>>        [email protected]
>>>> <mailto:[email protected]> <mailto:[email protected]>
>>>>
>>>>        _______________________________________________
>>>>        FastBit-users mailing list
>>>>        [email protected]
>>>> <mailto:[email protected]>
>>>>        <mailto:[email protected]>
>>>>        https://hpcrdm.lbl.gov/cgi-bin/mailman/listinfo/fastbit-users
>>>>
>>>>
>>>
>>>
>>>
>>>
>>>
>>>
> 
> 
> 
>

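#include <stdio.h>

/* Print the double nearest to the decimal literal 0.35135729403 with
   18 significant digits to expose the digits hidden by shorter output. */
int main(void) {
    double val1 = 0.35135729403;
    printf("%.18G\n", val1);
    return 0;
}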
