Hi John and Justin,

thanks for your help on this. I understand that the double version of the string 0.35135729403 will most likely not be exactly 0.35135729403. What I was hoping is that whatever the double value turned out to be would be consistent between indexing and querying. I don’t need the query to compare against the exact decimal value; I just need it to match whatever double the value ended up as when the data was indexed. In other words, a query of “where value1=0.35135729403” should retrieve the rows whose original text value was 0.35135729403.
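Sean's expectation can be checked outside FastBit with a small sketch. Python is used only because its `float` is an IEEE 754 double, the same type as a C++ `double`; the code does not call FastBit. Whether FastBit's CSV loader and its query parser use the same string-to-double conversion is an assumption this thread does not settle, and the single-precision path below is a hypothetical mismatch modeled on Justin's guess, not confirmed FastBit behavior. The sketch shows that one decimal string always converts to the same double, that the double is not exactly 0.35135729403, and that `0.351357` is just 6-significant-digit printing.

```python
import struct
from decimal import Decimal

text = "0.35135729403"

# Same string, same conversion: both sides get the identical double.
indexed = float(text)
queried = float(text)
print(indexed == queried)  # True

# The double is the nearest binary value, not 0.35135729403 itself;
# Decimal(float) exposes the exact binary value that is stored.
print(Decimal(indexed) == Decimal(text))  # False

# "%g" prints 6 significant digits, like default C++ stream output.
print("%g" % indexed)  # 0.351357

# Hypothetical mismatch (not confirmed for FastBit): if one side were
# rounded through single precision, the two values would no longer be equal.
as_single = struct.unpack("f", struct.pack("f", indexed))[0]
print(as_single == indexed)  # False
print("%g" % as_single)  # 0.351357, so the log line cannot tell them apart

# Binary fractions such as 0.125 are stored exactly.
print(Decimal(0.125) == Decimal("0.125"))  # True
```

Equality on the same text therefore holds only when both paths perform the same conversion, and the 6-digit log line cannot reveal whether they did.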
Thanks for the suggestions. Indexing the column twice, once as a double and once as a string, seems like it might be the way to go.

Sean Davey
Bio5 Institute, University of Arizona, Tucson
[email protected]

> On Feb 26, 2016, at 4:37 PM, K. John Wu <[email protected]> wrote:
>
> Hi, Sean and Justin,
>
> The incoming floating-point values to FastBit are processed as
> doubles; however, when the string 0.35135729403 is converted to a
> double in memory, it will not match 0.35135729403 exactly. This is
> because numbers are represented internally in binary, and decimal
> values like 0.1 do not have an exact (finite) binary representation.
>
> The output line
>
> From csv Where value1 == 0.351357 --> 0
>
> is printed with the standard C++ output function, which prints
> floating-point values with 6 significant digits. It is not an
> indication of the internal representation.
>
> Double-precision values have roughly 16 significant decimal digits.
>
> To ensure the machine representation is exactly what you specify, give
> a number that is representable in binary, e.g., 1, 0.125, or 0.03125.
>
> Another alternative is to ask for values in a range, e.g., "0.351357
> <= value1 < 0.351358".
> Keep in mind that 0.351357 and 0.351358 are not exactly representable
> in binary either, so their internal representations might not be
> exactly what you are looking for. Furthermore, because the values
> printed with 6-digit precision are rounded, you might have values that
> are printed as 0.351357 but are not included in the query results.
>
> Hope this helps.
>
> John
>
> On 2/26/16 3:12 PM, Justin Swanhart wrote:
>> Hi,
>>
>> Double is just the IEEE double representation. Even though you see
>> the exact value, and you loaded that value, rounding error occurs in
>> different places.
>> From the looks of it, the input is probably
>> rounding to single precision, but in the general case equality on
>> floats doesn't work, so it doesn't really make sense (to me) to
>> change that, as it would not fix things generally. Floats are
>> imprecise, and even when you search for the displayed value, the
>> internal value might not be the same.
>>
>> John can comment if he feels it is a bug. A workaround would be to
>> use bc or gmp to store a fixed representation as a binary string, then
>> search on that for equality (or just store the float as a string). You
>> will still need to store the "raw" float value as well for range
>> searches, because FastBit doesn't understand how to use that
>> fixed-width data. Storing and using bc/gmp for fixed precision
>> would be a nice extension to FastBit, but I personally don't use it
>> enough (I just provide a MySQL interface to it) to make building
>> such an extension worth the time to me.
>>
>> --Justin
>>
>> On Fri, Feb 26, 2016 at 1:37 PM, Sean Davey <[email protected]> wrote:
>>
>> Hi Justin,
>>
>> thanks for the reply. I see what you’re saying, but I thought it
>> might work in this case because I’m searching for a value I know
>> is in the data. The text I get back from "select min(value1)" is
>> 0.35135729403. That’s the exact same text that is in the original
>> file that was indexed. So shouldn’t the internal representation of
>> that string be the same for indexing and for querying? If I’m
>> querying for a value with the exact same text that was used when
>> the data was indexed, shouldn’t it be found?
>>
>> btw, the value1 column is a double, not a float. I don’t know if
>> that matters, but I thought of that when I noticed that the output
>> from ibis includes the line "From csv Where value1 == 0.351357 -->
>> 0", which for some reason doesn’t display all of 0.35135729403.
>>
>> cheers,
>> Sean Davey
>> Bio5 Institute, University of Arizona, Tucson
>> [email protected]
>>
>>> On Feb 25, 2016, at 4:46 PM, Justin Swanhart <[email protected]> wrote:
>>>
>>> Hi,
>>>
>>> It is not a bug. Equality on floats/doubles won't work because
>>> they are IEEE float values, and the displayed value may not be
>>> the same as (read: is usually not the same as) the internally
>>> stored value. This is a problem common to all databases that use
>>> IEEE values to store floating-point numbers. Many databases offer a
>>> fixed-point data type to work around this (in MySQL it is the
>>> DECIMAL type), but FastBit doesn't have such a data type.
>>>
>>> --Justin
>>>
>>> On Wed, Feb 24, 2016 at 1:23 PM, Sean Davey <[email protected]> wrote:
>>>
>>> hi all,
>>>
>>> I’m trying to find multiple min or max values in my index. I
>>> do a query like "select min(value1)", which works fine and
>>> returns a value such as 0.35135729403, which is correct.
>>> However, when I try to find all the lines with that value
>>> with a query like "select chr,start,stop,value1 where
>>> value1=0.35135729403", I get zero hits. In the output of the
>>> second query I see the line "From csv Where value1 ==
>>> 0.351357 --> 0", so it appears that the value I’m searching
>>> for has been truncated.
>>>
>>> Please let me know if this is a bug and, if so, whether it can be
>>> fixed.
>>>
>>> thanks,
>>> Sean Davey
>>> Bio5 Institute, University of Arizona, Tucson
>>> [email protected]
>>>
>>> _______________________________________________
>>> FastBit-users mailing list
>>> [email protected]
>>> https://hpcrdm.lbl.gov/cgi-bin/mailman/listinfo/fastbit-users
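The two workarounds in the quoted replies, John's half-open range query and indexing the column a second time as a string (the option Sean settles on), can be sketched with a plain Python list standing in for the indexed rows. The row layout, the sample values, and the helpers `where_text_equals` and `where_in_range` are hypothetical illustrations, not FastBit's API; the sketch only shows the comparison logic, including John's caveat that a value printed as 0.351357 can fall outside the range.

```python
# Hypothetical rows: each keeps the original text next to its double,
# mirroring "index the column twice, once as a double and once as a string".
raw_values = ["0.35135729403", "0.35135695", "0.4", "0.35135729403"]
rows = [{"id": i, "value1_text": t, "value1": float(t)}
        for i, t in enumerate(raw_values)]


def where_text_equals(rows, text):
    """Exact match on the original string column (hypothetical helper)."""
    return [r["id"] for r in rows if r["value1_text"] == text]


def where_in_range(rows, low, high):
    """Half-open range on the double column: low <= value1 < high."""
    return [r["id"] for r in rows if low <= r["value1"] < high]


print(where_text_equals(rows, "0.35135729403"))  # [0, 3]
print(where_in_range(rows, 0.351357, 0.351358))  # [0, 3]

# John's caveat: 0.35135695 prints as 0.351357 at 6 significant digits,
# yet it lies below 0.351357, so the range above excludes it.
print("%g" % rows[1]["value1"])  # 0.351357
```

The string column gives exact matches on the original text, while the double column is still needed for range searches and min/max, which is the split Justin describes.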
