Re: [FastBit-users] possible bug in query

K. John Wu Fri, 26 Feb 2016 15:37:48 -0800

Hi, Sean and Justin,

FastBit processes incoming floating-point values as doubles. However,
when the string 0.35135729403 is converted to a double in memory, the
stored value does not match 0.35135729403 exactly. This is because
numbers are represented internally in binary, and decimal values like
0.1 do not have a finite binary representation.
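
As a small illustration (in Python rather than FastBit itself, and
assuming an IEEE 754 double underneath), the exact value stored for the
text 0.35135729403 can be expanded and compared with the decimal text:

```python
from decimal import Decimal

text = "0.35135729403"
stored = float(text)            # nearest IEEE 754 double to the decimal text

# Decimal(float) expands the stored binary value exactly, with no rounding.
exact = Decimal(stored)

print(exact)                    # a long decimal expansion of the stored value
print(exact == Decimal(text))   # False: the double is only the nearest neighbour
```

The stored double is the closest representable value, not the decimal
number that was typed.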


The output line

From csv Where value1 == 0.351357 -->   0

is printed with the standard C++ output function, which by default
prints floating-point values with 6 significant digits.  It is not an
indication of the internal representation.

Double-precision values have about 16 significant decimal digits.
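
A rough sketch of the difference between the printed and the stored
value, again in Python. The ".6g" format is assumed here to mirror the
default 6-digit precision of a C++ output stream; it is not FastBit's
own printing code:

```python
import sys

x = float("0.35135729403")

# 6 significant digits, as in the "value1 == 0.351357" output line.
short = format(x, ".6g")
# 17 significant digits are always enough to identify a double uniquely.
full = format(x, ".17g")

print(short)                 # 0.351357
print(full)
print(float(short) == x)     # False: the 6-digit text names a different double
print(float(full) == x)      # True: 17 digits round-trip
print(sys.float_info.dig)    # decimal digits a double is guaranteed to preserve
```

The 6-digit text is only a display form; parsing it back gives a
different double than the one that was stored.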

To ensure the machine representation is exactly what you specify, give
a number that is exactly representable in binary, e.g., 1, 0.125, or
0.03125.
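
One way to check whether a given decimal text is exactly representable
is to compare it with the stored double as exact fractions. The helper
name is_exact_in_binary is made up for this sketch and is not part of
FastBit:

```python
from fractions import Fraction

def is_exact_in_binary(text: str) -> bool:
    """Return True if the decimal text converts to a double with no rounding."""
    # Fraction(text) is the exact decimal value;
    # Fraction(float(text)) is the exact value of the stored double.
    return Fraction(text) == Fraction(float(text))

for text in ["1", "0.125", "0.03125", "0.1", "0.35135729403"]:
    print(text, is_exact_in_binary(text))
```

Only values whose denominator is a power of two (and that fit in 53
bits of precision) pass this check.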

Another alternative is to ask for values in a range, e.g., "0.351357
<= value1 < 0.351358".
Keep in mind that the values 0.351357 and 0.351358 are not exactly
representable in binary, and therefore their internal representations
might not be exactly the bounds you are looking for.
Furthermore, because values printed with 6-digit precision are
rounded, you might have values that are printed as 0.351357 but are
not included in the query results.
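
The range approach and its caveat can be sketched as follows. A plain
Python comparison stands in for the FastBit query, and the sample
values are invented for illustration:

```python
# Invented sample data; only the first value comes from the thread.
values = [0.35135729403, 0.3513566, 0.3513574, 0.351358]

lo, hi = 0.351357, 0.351358

# Equivalent of "0.351357 <= value1 < 0.351358".
hits = [v for v in values if lo <= v < hi]

# What each value looks like when printed with 6 significant digits.
printed = [format(v, ".6g") for v in values]

for v, p in zip(values, printed):
    print(p, v in hits)
```

Here 0.3513566 prints as 0.351357 but lies below the lower bound, so it
is displayed like a match yet is not returned by the range condition.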

Hope this helps.

John




On 2/26/16 3:12 PM, Justin Swanhart wrote:
> Hi,
> 
> Double is just the double IEEE representation.  Even though you see
> the exact value, and you loaded that value, rounding error occurs in
> different places.  From the looks of it, the input is probably
> rounding to single precision, but the general case for equality of a
> float is that it doesn't work, so it doesn't really make sense (to me)
> to change that as it would not fix things generally.  Floats are
> imprecise and even when you search for the displayed value the
> internal value might not be the same.
> 
> John can comment if he feels it is a bug.  A workaround would be to
> use bc or gmp to store a fixed representation as a binary string, then
> search on that for equality (or just store the float as string).  You
> will still need to store the "raw" float value as well for range
> searches using fastbit because fastbit doesn't understand how to use
> that fixed width data.  Storing and using bc/gmp for fixed precision
> would be a nice extension to fastbit, but I personally don't use it
> enough (I just provide a MySQL interface to it) to make taking the
> time to write such an extension worthwhile to me.
> 
> --Justin
> 
> 
> 
> On Fri, Feb 26, 2016 at 1:37 PM, Sean Davey <[email protected]
> <mailto:[email protected]>> wrote:
> 
>     Hi Justin,
> 
>     thanks for the reply. I see what you’re saying but I thought it
>     might work in this case because I’m searching for a value I know
>     is in the data. The text I get back from "select min(value1)” is
>     0.35135729403. That’s the exact same text that is in the original
>     file that was indexed. So shouldn’t the internal representation of
>     that string be the same for indexing and for querying? If I’m
>     querying for a value with the exact same text that was used when
>     the data was indexed, shouldn’t it be found?
> 
>     btw, the value1 column is a double, not a float. I don’t know if
>     that matters, but I thought of that when I noticed that the output
>     from ibis includes the line "From csv Where value1 == 0.351357 -->
>     0” which for some reason doesn’t display all of 0.35135729403.
> 
>     cheers,
>     Sean Davey
>     Bio5 Institute, University of Arizona, Tucson
>     [email protected] <mailto:[email protected]>
> 
> 
> 
> 
>>     On Feb 25, 2016, at 4:46 PM, Justin Swanhart
>>     <[email protected] <mailto:[email protected]>> wrote:
>>
>>     Hi,
>>
>>     It is not a bug.  Equality on floats/doubles won't work because
>>     they are IEEE float values and the displayed value may not be
>>     the same (read: is usually not the same) as the internally stored
>>     value.  This is a problem common to all databases that use IEEE
>>     values to store floating point numbers.  Many databases offer a
>>     fixed point data type to work around this (in MySQL it is the
>>     DECIMAL type) but FastBit doesn't have such a data type.  
>>
>>     --Justin
>>
>>     On Wed, Feb 24, 2016 at 1:23 PM, Sean Davey
>>     <[email protected] <mailto:[email protected]>> wrote:
>>
>>         hi all,
>>
>>         I’m trying to find multiple min or max values in my index. I
>>         do a query like “select min(value1)” which works fine and
>>         returns a value such as 0.35135729403, which is correct.
>>         However, when I try to find all the lines with that value
>>         with a query like “select chr,start,stop,value1 where
>>         value1=0.35135729403”, I get zero hits. In the output of the
>>         second query I see the line "From csv Where value1 ==
>>         0.351357 --> 0” so it appears that the value I’m searching
>>         for has been truncated.
>>
>>         Please let me know if this is a bug and if so, if it can be
>>         fixed.
>>
>>         thanks,
>>         Sean Davey
>>         Bio5 Institute, University of Arizona, Tucson
>>         [email protected] <mailto:[email protected]>
>>
>>         _______________________________________________
>>         FastBit-users mailing list
>>         [email protected]
>>         <mailto:[email protected]>
>>         https://hpcrdm.lbl.gov/cgi-bin/mailman/listinfo/fastbit-users
>>
>>
