Re: [FastBit-users] possible bug in query

Sean Davey Fri, 26 Feb 2016 15:48:47 -0800

Hi John and Justin,

thanks for your help on this. I understand that the double version of the 
string 0.35135729403 will most likely not be exactly 0.35135729403. What I 
was hoping is that whatever the double value turned out to be would be 
consistent between indexing and querying. I don’t need the query to match the 
exact decimal value; I just need it to match whatever double value the data 
ended up with when it was indexed, i.e., I need to retrieve the rows that 
had the original text value 0.35135729403 with a query of “where 
value1=0.35135729403”.


thanks for the suggestions. indexing the column twice, once as a double and 
once as a string, seems like it might be the way to go.
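A minimal in-memory sketch of the dual-column idea in plain C++. The Row struct and the match_text/match_range helpers are hypothetical and are not FastBit API; they only model keeping the original text for exact matches next to the parsed double for range conditions. How the two columns would actually be declared in FastBit is not shown here.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical row layout: value1 is stored twice, once as the original text
// (for exact matches) and once as a parsed double (for range conditions).
struct Row {
    std::string value1_text;
    double value1;
};

// Indices of rows whose original text equals `text` exactly.
std::vector<std::size_t> match_text(const std::vector<Row>& rows,
                                    const std::string& text) {
    std::vector<std::size_t> hits;
    for (std::size_t i = 0; i < rows.size(); ++i)
        if (rows[i].value1_text == text) hits.push_back(i);
    return hits;
}

// Indices of rows whose numeric copy satisfies lo <= value1 < hi.
std::vector<std::size_t> match_range(const std::vector<Row>& rows,
                                     double lo, double hi) {
    std::vector<std::size_t> hits;
    for (std::size_t i = 0; i < rows.size(); ++i)
        if (lo <= rows[i].value1 && rows[i].value1 < hi) hits.push_back(i);
    return hits;
}
```

Note that the text copy only matches identical spellings: "0.5" and "0.50" are different strings even though they parse to the same double.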

Sean Davey
Bio5 Institute, University of Arizona, Tucson
[email protected]




> On Feb 26, 2016, at 4:37 PM, K. John Wu <[email protected]> wrote:
> 
> Hi, Sean and Justin,
> 
> The incoming floating-point values to FastBit are processed as
> doubles. However, when the string 0.35135729403 is converted to a
> double in memory, the result will not be exactly 0.35135729403. This
> is because numbers are represented internally in binary, and decimal
> values like 0.1 do not have an exact (finite) binary representation.
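To see the effect John describes, one can print a double with 17 significant digits, which is enough to tell any two doubles apart. A minimal C++ sketch; the helper name to_sig_digits is illustrative, not FastBit's, and it assumes a C library whose printf rounds correctly (as glibc's does):

```cpp
#include <cstdio>
#include <string>

// Format x with the requested number of significant digits using printf's %g.
// %g drops trailing zeros, so an exactly representable value such as 0.125
// prints short, while 0.1 reveals its binary approximation at 17 digits.
std::string to_sig_digits(double x, int digits) {
    char buf[64];
    std::snprintf(buf, sizeof buf, "%.*g", digits, x);
    return buf;
}
```

With this, to_sig_digits(0.1, 17) gives "0.10000000000000001", while to_sig_digits(0.125, 17) gives "0.125", because 0.125 = 1/8 is stored exactly.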
> 
> The output line
> 
> From csv Where value1 == 0.351357 -->   0
> 
> is printed with the standard C++ output functions, which by default
> print floating-point values with 6 significant digits.  It is not an
> indication of the internal representation.
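The 6-digit default can be reproduced with any std::ostream. This sketch assumes the log line uses default stream formatting, as John describes; the helper names are illustrative:

```cpp
#include <iomanip>
#include <sstream>
#include <string>

// Format with the stream's defaults: 6 significant digits, %g-style output.
std::string stream_default(double x) {
    std::ostringstream out;
    out << x;
    return out.str();
}

// Format with an explicit number of significant digits.
std::string stream_with_precision(double x, int digits) {
    std::ostringstream out;
    out << std::setprecision(digits) << x;
    return out.str();
}
```

stream_default(0.35135729403) yields "0.351357", and stream_with_precision(0.35135729403, 11) yields "0.35135729403". The stored double is the same in both calls; only the formatting differs.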
> 
> Double-precision values have about 16 significant decimal digits.
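The standard library exposes the precise digit counts behind "about 16": std::numeric_limits<double>::digits10 (15 digits that always survive text to double to text) and max_digits10 (17 digits needed to print a double so that it parses back unchanged). A sketch, assuming IEEE 754 doubles and a correctly rounding strtod; survives_round_trip is an illustrative name:

```cpp
#include <cstdlib>
#include <limits>
#include <sstream>
#include <string>

// Decimal digits that always survive text -> double -> text (15 for IEEE double).
constexpr int kSafeDigits = std::numeric_limits<double>::digits10;
// Decimal digits needed so that double -> text -> double is lossless (17).
constexpr int kRoundTripDigits = std::numeric_limits<double>::max_digits10;

// Print x with `digits` significant digits, parse it back, and report whether
// the parsed value is the same double.
bool survives_round_trip(double x, int digits) {
    std::ostringstream out;
    out.precision(digits);
    out << x;
    const double back = std::strtod(out.str().c_str(), nullptr);
    return back == x;
}
```

Printed with 17 digits, 0.35135729403 comes back unchanged; printed with the default 6 digits it becomes 0.351357, a different double.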
> 
> To ensure the machine representation is exactly what you specify, use
> a number that is exactly representable in binary, e.g., 1, 0.125, or 0.03125.
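A quick check of why binary fractions are safe: 0.125 (1/8) and 0.03125 (1/32) are stored exactly, while 0.1, 0.2, and 0.3 are each rounded on input. This sketch assumes IEEE 754 double arithmetic; the function names are illustrative:

```cpp
// Dyadic fractions (integer / power of two) such as 0.125 and 0.03125 are
// stored exactly, so a sum that stays representable (0.15625 = 5/32) is exact.
bool dyadic_sum_is_exact() { return 0.125 + 0.03125 == 0.15625; }

// 0.1, 0.2, and 0.3 are all rounded on input, and the rounding errors do not
// cancel: the computed sum is not the double nearest to 0.3.
bool decimal_sum_is_exact() { return 0.1 + 0.2 == 0.3; }
```

The first function returns true and the second returns false.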
> 
> Another alternative is to ask for values in a range, e.g., "0.351357
> <= value1 < 0.351358."
> Keep in mind that the values 0.351357 and 0.351358 are not exactly
> representable in binary either, so their internal representations
> might not be exactly what you are looking for.
> Furthermore, because values printed with 6-digit precision are
> rounded, you might have values that are printed as 0.351357 but are
> not included in the query results.
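The last caveat can be made concrete. A value such as 0.3513566 (a made-up example, not from Sean's data) prints as 0.351357 at 6 significant digits but lies below the lower bound of the range query. A sketch with illustrative helper names:

```cpp
#include <sstream>
#include <string>

// Half-open range test, mirroring the condition "lo <= value1 < hi".
bool in_range(double v, double lo, double hi) { return lo <= v && v < hi; }

// How v would appear in 6-significant-digit output such as the log line.
std::string printed_6(double v) {
    std::ostringstream out;
    out << v;  // default stream precision is 6
    return out.str();
}
```

If the goal is every row that prints as 0.351357, one option is to widen the bounds by roughly half a unit in the last printed digit (0.3513565 <= value1 < 0.3513575), keeping in mind that those bounds are themselves rounded to the nearest double.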
> 
> Hope this helps.
> 
> John
> 
> 
> 
> 
> On 2/26/16 3:12 PM, Justin Swanhart wrote:
>> Hi,
>> 
>> Double is just the IEEE double-precision representation.  Even though
>> you see the exact value, and you loaded that value, rounding error can
>> occur in different places.  From the looks of it, the input is probably
>> being rounded to single precision, but in general equality comparison
>> of floating-point values doesn't work, so it doesn't really make sense
>> (to me) to change that, as it would not fix things generally.  Floats
>> are imprecise, and even when you search for the displayed value the
>> internal value might not be the same.
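The usual alternative to exact equality is a tolerance comparison. This sketch follows the rule used by Python's math.isclose (relative tolerance with an optional absolute floor); the tolerance values are illustrative assumptions, not something FastBit provides:

```cpp
#include <algorithm>
#include <cmath>

// Tolerance-based equality in the style of Python's math.isclose:
// |a - b| <= max(rel_tol * max(|a|, |b|), abs_tol).
// The default tolerances are illustrative; pick them to match the data.
bool nearly_equal(double a, double b, double rel_tol = 1e-9,
                  double abs_tol = 0.0) {
    const double diff = std::fabs(a - b);
    const double scale = std::max(std::fabs(a), std::fabs(b));
    return diff <= std::max(rel_tol * scale, abs_tol);
}
```

In a query, the same idea amounts to a narrow range around the target value, as John suggests.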
>> 
>> John can comment if he feels it is a bug.  A workaround would be to
>> use bc or gmp to store a fixed-precision representation as a binary
>> string, then search on that for equality (or just store the float as a
>> string).  You will still need to store the "raw" float value as well
>> for range searches, because FastBit doesn't understand how to use
>> that fixed-width data.  Storing and using bc/gmp for fixed precision
>> would be a nice extension to FastBit, but I personally don't use it
>> enough (I just provide a MySQL interface to it) to make taking the
>> time to build such an extension worthwhile to me.
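Justin's fixed-representation workaround can be sketched without bc or gmp by swapping in a plain 64-bit scaled integer: the decimal text is parsed digit by digit, so no binary rounding ever happens. parse_fixed and the choice of 11 fractional digits are hypothetical, picked to fit values like 0.35135729403; negative numbers and exponents are not handled.

```cpp
#include <algorithm>
#include <cstdint>
#include <limits>
#include <optional>
#include <string>

// Hypothetical helper: parse a non-negative decimal string such as
// "0.35135729403" into a whole number of 10^-scale units, so equality can be
// tested exactly on integers. Returns std::nullopt for malformed input, for
// more than `scale` fractional digits, or on 64-bit overflow.
std::optional<std::int64_t> parse_fixed(const std::string& text, int scale) {
    constexpr std::int64_t kMax = std::numeric_limits<std::int64_t>::max();
    std::int64_t value = 0;
    int frac_digits = -1;  // stays -1 until the decimal point is seen
    bool any_digit = false;
    for (char c : text) {
        if (c == '.') {
            if (frac_digits >= 0) return std::nullopt;  // a second '.'
            frac_digits = 0;
            continue;
        }
        if (c < '0' || c > '9') return std::nullopt;  // signs, exponents, etc.
        if (frac_digits >= 0 && ++frac_digits > scale) return std::nullopt;
        const int d = c - '0';
        if (value > (kMax - d) / 10) return std::nullopt;  // would overflow
        value = value * 10 + d;
        any_digit = true;
    }
    if (!any_digit) return std::nullopt;
    // Append zeros so every result is in the same 10^-scale units.
    for (int i = std::max(frac_digits, 0); i < scale; ++i) {
        if (value > kMax / 10) return std::nullopt;
        value *= 10;
    }
    return value;
}
```

Because scaling by a fixed power of ten preserves order, such integers would support range conditions as well as equality; whether that fits a particular FastBit schema is an assumption to verify.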
>> 
>> --Justin
>> 
>> 
>> 
>> On Fri, Feb 26, 2016 at 1:37 PM, Sean Davey <[email protected]> wrote:
>> 
>>    Hi Justin,
>> 
>>    thanks for the reply. I see what you’re saying but I thought it
>>    might work in this case because I’m searching for a value I know
>>    is in the data. The text I get back from "select min(value1)” is
>>    0.35135729403. That’s the exact same text that is in the original
>>    file that was indexed. So shouldn’t the internal representation of
>>    that string be the same for indexing and for querying? If I’m
>>    querying for a value with the exact same text that was used when
>>    the data was indexed, shouldn’t it be found?
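Sean's intuition holds for the conversion itself: converting the same text with the same routine yields the same double. The sketch below (hypothetical helpers, not FastBit code) shows that, plus two ways a match could still fail: the query constant being shortened before conversion, or one side passing through single precision, as Justin suspects. Which, if either, happens inside FastBit is not established by this thread.

```cpp
#include <cstdlib>

// Converting the same decimal text twice with the same routine gives the
// same double.
bool same_text_same_double(const char* text) {
    return std::strtod(text, nullptr) == std::strtod(text, nullptr);
}

// Failure mode (1): the query constant is truncated to fewer digits before
// it is converted.
bool matches_truncated(const char* full, const char* truncated) {
    return std::strtod(full, nullptr) == std::strtod(truncated, nullptr);
}

// Failure mode (2): one side is rounded through single precision (float).
bool survives_float_rounding(const char* text) {
    const double d = std::strtod(text, nullptr);
    return static_cast<double>(static_cast<float>(d)) == d;
}
```

For "0.35135729403", the first check holds, while the other two fail; "0.125" survives the float round trip because it is exactly representable.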
>> 
>>    btw, the value1 column is a double, not a float. I don’t know if
>>    that matters, but I thought of that when I noticed that the output
>>    from ibis includes the line "From csv Where value1 == 0.351357 -->
>>    0” which for some reason doesn’t display all of 0.35135729403.
>> 
>>    cheers,
>>    Sean Davey
>>    Bio5 Institute, University of Arizona, Tucson
>>    [email protected]
>> 
>> 
>> 
>> 
>>>    On Feb 25, 2016, at 4:46 PM, Justin Swanhart
>>>    <[email protected]> wrote:
>>> 
>>>    Hi,
>>> 
>>>    It is not a bug.  Equality on floats/doubles won't work because
>>>    they are IEEE floating-point values, and the displayed value may
>>>    not be the same as (read: is usually not the same as) the internally
>>>    stored value.  This is a problem common to all databases that use IEEE
>>>    values to store floating point numbers.  Many databases offer a
>>>    fixed point data type to work around this (in MySQL it is the
>>>    DECIMAL type) but FastBit doesn't have such a data type.  
>>> 
>>>    --Justin
>>> 
>>>    On Wed, Feb 24, 2016 at 1:23 PM, Sean Davey
>>>    <[email protected]> wrote:
>>> 
>>>        hi all,
>>> 
>>>        I’m trying to find multiple min or max values in my index. I
>>>        do a query like “select min(value1)” which works fine and
>>>        returns a value such as 0.35135729403, which is correct.
>>>        However, when I try to find all the lines with that value
>>>        with a query like “select chr,start,stop,value1 where
>>>        value1=0.35135729403”, I get zero hits. In the output of the
>>>        second query I see the line "From csv Where value1 ==
>>>        0.351357 --> 0” so it appears that the value I’m searching
>>>        for has been truncated.
>>> 
>>>        Please let me know if this is a bug and if so, if it can be
>>>        fixed.
>>> 
>>>        thanks,
>>>        Sean Davey
>>>        Bio5 Institute, University of Arizona, Tucson
>>>        [email protected]
>>> 
>>>        _______________________________________________
>>>        FastBit-users mailing list
>>>        [email protected]
>>>        https://hpcrdm.lbl.gov/cgi-bin/mailman/listinfo/fastbit-users