When you see "0.35135729403", there are trailing digits that are not shown.
Double values typically carry about 16 significant decimal digits, so if
you print one with 16 or more digits of precision, you might see something like
0.351357294031925
or
0.3513572940256878
The value in the string "value1=0.35135729403" might be translated to
0.351357294029999989 internally (I just checked with the attached C
program). The point is that it is very hard to tell whether another
value printed as 0.35135729403 is actually 0.351357294029999989 and
therefore satisfies the condition "value1=0.35135729403".
I hope you can rewrite your query in a form that is more precise than
an equality test on floating-point values. For example, you could
convert the values into some sort of integer representation, or you
might express your query as a range. Treating numbers as strings
should be a last resort because string searches are typically much
slower than numeric searches.
John
On 2/26/16 3:47 PM, Sean Davey wrote:
> Hi John and Justin,
>
> thanks for your help on this. I understand that the double version of
> the string 0.35135729403 will most likely not be exactly
> 0.35135729403. What I was hoping is that whatever the double value
> turned out to be would be consistent between indexing and querying. I
> don’t need the query to compare the exact right double value, I just
> need it to match whatever the double value ended up being when the
> data was indexed, i.e. I need to retrieve the rows that had the
> original text values of 0.35135729403 with a query of “where
> value1=0.35135729403”.
>
> thanks for the suggestions. indexing the column twice, once as a
> double and once as a string, seems like it might be the way to go.
>
> Sean Davey
> Bio5 Institute, University of Arizona, Tucson
> [email protected]
>
>
>
>
>> On Feb 26, 2016, at 4:37 PM, K. John Wu <[email protected]> wrote:
>>
>> Hi, Sean and Justin,
>>
>> The incoming floating-point values to FastBit are processed as
>> doubles. However, when the string value 0.35135729403 is converted to
>> a double in memory, it will not match 0.35135729403 exactly. This
>> is because the internal representation of numbers is binary, and
>> decimal values like 0.1 do not have an exact finite binary representation.
>>
>> The output line
>>
>> From csv Where value1 == 0.351357 --> 0
>>
>> is printed with the standard C++ output function, which by default
>> prints floating-point values with 6 significant digits. It is not an
>> indication of the internal representation.
>>
>> Double precision values have about 16 significant digits.
>>
>> To ensure the machine representation is exactly what you specify, give
>> a number that is representable in binary, e.g. 1, 0.125, and 0.03125.
>>
>> Another alternative is to ask for values in a range, e.g., "0.351357
>> <= value1 < 0.351358."
>> Keep in mind that the values 0.351357 and 0.351358 are not exactly
>> representable in binary, and therefore their internal
>> representations might not be exactly what you are looking for.
>> Furthermore, because values printed with 6-digit precision are
>> rounded, you might have values that are printed as 0.351357 but are
>> not included in the query results.
>>
>> Hope this helps.
>>
>> John
>>
>>
>>
>>
>> On 2/26/16 3:12 PM, Justin Swanhart wrote:
>>> Hi,
>>>
>>> Double is just the IEEE double representation. Even though you see
>>> the exact value, and you loaded that value, rounding error occurs in
>>> different places. From the looks of it, the input is probably
>>> rounding to single precision, but the general case for equality of a
>>> float is that it doesn't work, so it doesn't really make sense (to me)
>>> to change that as it would not fix things generally. Floats are
>>> imprecise and even when you search for the displayed value the
>>> internal value might not be the same.
>>>
>>> John can comment if he feels it is a bug. A workaround would be to
>>> use bc or gmp to store a fixed representation as a binary string, then
>>> search on that for equality (or just store the float as string). You
>>> will still need to store the "raw" float value as well for range
>>> searches using fastbit because fastbit doesn't understand how to use
>>> that fixed width data. Storing and using bc/gmp for fixed precision
>>> would be a nice extension to fastbit, but I personally don't use it
>>> enough (I just provide a MySQL interface to it) to make taking the
>>> time to build such an extension worthwhile to me.
>>>
>>> --Justin
>>>
>>>
>>>
>>> On Fri, Feb 26, 2016 at 1:37 PM, Sean Davey
>>> <[email protected]> wrote:
>>>
>>> Hi Justin,
>>>
>>> thanks for the reply. I see what you’re saying but I thought it
>>> might work in this case because I’m searching for a value I know
>>> is in the data. The text I get back from "select min(value1)” is
>>> 0.35135729403. That’s the exact same text that is in the original
>>> file that was indexed. So shouldn’t the internal representation of
>>> that string be the same for indexing and for querying? If I’m
>>> querying for a value with the exact same text that was used when
>>> the data was indexed, shouldn’t it be found?
>>>
>>> btw, the value1 column is a double, not a float. I don’t know if
>>> that matters, but I thought of that when I noticed that the output
>>> from ibis includes the line "From csv Where value1 == 0.351357 -->
>>> 0” which for some reason doesn’t display all of 0.35135729403.
>>>
>>> cheers,
>>> Sean Davey
>>> Bio5 Institute, University of Arizona, Tucson
>>> [email protected]
>>>
>>>
>>>
>>>
>>>> On Feb 25, 2016, at 4:46 PM, Justin Swanhart
>>>> <[email protected]> wrote:
>>>>
>>>> Hi,
>>>>
>>>> It is not a bug. Equality on floats/doubles won't work because
>>>> they are IEEE float values and the displayed value may not be
>>>> the same (read: is usually not the same) as the internally stored
>>>> value. This is a problem common to all databases that use IEEE
>>>> values to store floating point numbers. Many databases offer a
>>>> fixed point data type to work around this (in MySQL it is the
>>>> DECIMAL type) but FastBit doesn't have such a data type.
>>>>
>>>> --Justin
>>>>
>>>> On Wed, Feb 24, 2016 at 1:23 PM, Sean Davey
>>>> <[email protected]> wrote:
>>>>
>>>> hi all,
>>>>
>>>> I’m trying to find multiple min or max values in my index. I
>>>> do a query like “select min(value1)” which works fine and
>>>> returns a value such as 0.35135729403, which is correct.
>>>> However, when I try to find all the lines with that value
>>>> with a query like “select chr,start,stop,value1 where
>>>> value1=0.35135729403”, I get zero hits. In the output of the
>>>> second query I see the line "From csv Where value1 ==
>>>> 0.351357 --> 0” so it appears that the value I’m searching
>>>> for has been truncated.
>>>>
>>>> Please let me know if this is a bug and if so, if it can be
>>>> fixed.
>>>>
>>>> thanks,
>>>> Sean Davey
>>>> Bio5 Institute, University of Arizona, Tucson
>>>> [email protected]
>>>>
>>>> _______________________________________________
>>>> FastBit-users mailing list
>>>> [email protected]
>>>> <mailto:[email protected]>
>>>> <mailto:[email protected]>
>>>> https://hpcrdm.lbl.gov/cgi-bin/mailman/listinfo/fastbit-users
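#include <stdio.h>

/* Print the double stored for the decimal literal with 18 significant
   digits, exposing the digits beyond the 11 given in the query text. */
int main(void) {
    double val1 = 0.35135729403;
    printf("%.18G\n", val1);
    return 0;
}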