Hi Arun, I wrote up a github issue here:

https://github.com/Rdatatable/data.table/issues/1642

Thanks,

Frederick

On Mon, Apr 11, 2016 at 12:42:08PM +0200, Arunkumar Srinivasan wrote:
> Hi Frederik, the reason this was implemented is to avoid issues like this
> (copied from ?setNumericRounding), which IIRC I pointed to you before:
> DT = data.table(a=seq(0,1,by=0.2),b=1:2, key="a")
> DT
> setNumericRounding(0) # turn off rounding
> DT[.(0.4)] # works
> DT[.(0.6)] # no match, confusing since 0.6 is clearly there in DT
> So while numeric rounding of ‘0’ solves your issue, it still persists on
> other cases (like the one shown above).
> Also you seem to be suggesting to use this *only* for order(). Why? Why not
> ‘setorder()’ or ‘setkey()’?
> FYI, speed is/was never really an issue and is just a (positive) side-effect.
>
> I see two options:
>
> 1. Identify, if possible, clearly and set the rounding appropriately so that
> we run into this issue very rarely. i.e., ad-hoc numeric rounding.
> 2. If it is not possible, then, rounding last two bytes really doesn’t solve
> *most* issues w.r.t. rounding (which was its original purpose), as
> opposed to without any rounding.. in which case, there’s no need for
> setNumericRounding, so that we can attribute the inconsistencies
> to floating point representation inaccuracies.
>
> Having had my share of experiences with floating point issues, my guess would
> be the latter. Perhaps better to continue on the github project
> page (if you could please file an issue there with a minimal example of
> *your* problem).
>
> --
> Arun
>
> On 7 April 2016 at 22:14:08, [email protected] ([email protected]) wrote:
>
> Sorry, I forgot to Cc the list for this.
>
> Arunkumar, do you have an answer? You said:
>
> > If you’ve a better idea, please let us know and we would definitely be
> > willing to implement that.
>
> and I said
>
> > My "better idea" at this point is, if speed is not an issue, then
> > 'order' could use a numeric rounding of zero.
>
> (see below)
>
> Thank you,
>
> Frederick
>
> ----- Forwarded message from [email protected] -----
>
> Date: Wed, 27 Jan 2016 15:52:25 -0800
> From: [email protected]
> To: Arunkumar Srinivasan <[email protected]>
> Subject: Re: [datatable-help] sorting on a floating point column
>
> Thanks Arun for your reply. The '?order' page says:
>
>   Columns of ‘numeric’ types (i.e., ‘double’) have their last two
>   bytes rounded off while computing order, by default, to avoid any
>   unexpected behaviour due to limitations in representing floating
>   point numbers precisely. Have a look at ‘setNumericRounding’ to
>   learn more.
>
> But I'm not sure what unexpected behavior this avoids. It seems like
> it *causes* unexpected behavior (even if I'm the first to comment in
> two years)... And '?setNumericRounding' says:
>
>   Computers cannot represent some floating point numbers (such as
>   0.6) precisely, using base 2. This leads to unexpected behaviour
>   when joining or grouping columns of type 'numeric';
>
> So it sounds like the cases where you benefit from numeric rounding
> are "joining or grouping", not in sorting. My "better idea" at this
> point is, if speed is not an issue, then 'order' could use a numeric
> rounding of zero. Alternatively, I would expand upon the '?order'
> documentation to clarify that the reason for rounding is, for example,
> speed - and not the elimination of "unexpected behavior".
>
> Thank you,
>
> Frederick
>
> On Thu, Jan 28, 2016 at 12:10:37AM +0100, Arunkumar Srinivasan wrote:
> > Why do you want a minimal test case, when setNumericRounding explains
> > that the behavior I reported is intentional?
> > Because you refer to a post that’s quite a few years old, and data.table
> > has moved along from ‘tolerance’ quite some time ago.
> > And therefore it wasn’t clear to me what the exact issue is: whether
> > you’re using an older version, or a newer one but didn’t know that it
> > wasn’t due to a tolerance issue.
> >
> > I now see that this is also documented in the data.table::order page.
> > So I guess it is already "documented visibly".
> > Glad you got to read that.
> >
> > And setNumericRounding explains that it is slightly faster to ignore
> > the last two bytes, requiring fewer radix sort passes.
> > That’s not the reason for the function though, as it’s explained in
> > `?setNumericRounding` with examples at the bottom of that page.
> >
> > I wanted to share my experience that this behavior is confusing.
> > With floating point numbers, there are always limitations. I find the
> > examples under ?setNumericRounding confusing cases as well (which would
> > return wrong results if we did not round). We try to reduce confusion by
> > managing most obvious cases, or so we think. If you’ve a better idea,
> > please let us know and we would definitely be willing to implement that.
> > --
> > Arun
> >
> > On 28 January 2016 at 00:03:19, [email protected] ([email protected]) wrote:
> >
> > data.table 1.9.6
> >
> > What's surprising is that sorting a list of floats wouldn't do the
> > obvious thing, and sort them exactly. Is it surprising that this would
> > be surprising?
> >
> > Why do you want a minimal test case, when setNumericRounding explains
> > that the behavior I reported is intentional?
> >
> > I now see that this is also documented in the data.table::order page.
> > So I guess it is already "documented visibly".
> >
> > And setNumericRounding explains that it is slightly faster to ignore
> > the last two bytes, requiring fewer radix sort passes.
> >
> > I wanted to share my experience that this behavior is confusing. Thank
> > you at least for pointing me to your documentation.
> >
> > Frederick
> >
> > On Wed, Jan 27, 2016 at 10:13:44PM +0100, Arunkumar Srinivasan wrote:
> > > This is following up on a thread from a couple years ago:
> > > http://lists.r-forge.r-project.org/pipermail/datatable-help/2013-May/001689.html
> > >
> > > Things have changed A LOT! I suggest you keep up-to-date by reading the
> > > README about bug fixes and features from the github project page:
> > > https://github.com/Rdatatable/data.table
> > >
> > > I ran into this problem myself, it took a bit of time to debug because it
> > > is so surprising.
> > > What’s surprising? Reproducible example please. data.table package
> > > version, R version as well please.
> > > Without that my best guess is for you to look at `?setNumericRounding`.
> > >
> > > --
> > > Arun
> > >
> > > On 27 January 2016 at 21:40:23, [email protected] ([email protected])
> > > wrote:
> > >
> > > This is following up on a thread from a couple years ago:
> > >
> > > http://lists.r-forge.r-project.org/pipermail/datatable-help/2013-May/001689.html
> > >
> > > I ran into this problem myself, it took a bit of time to debug because
> > > it is so surprising.
> > >
> > > In my case, I was using order() to sort a list of floats.
> > >
> > > I expected the result to be monotonic but it wasn't!
> > >
> > > Then I found out that the problem was due to 'order' being part of the
> > > data.table library. By using base::order, I was able to get correct
> > > behavior.
> > >
> > > I don't understand why improperly ordering floating point data helps
> > > the data.table library accomplish anything, whether it is looking up
> > > keys or what.
> > >
> > > Also, it must be much slower to compare floats with a tolerance, than
> > > to just compare them. I seem to recall that floats were designed so
> > > that normal comparison is quite fast.
> > >
> > > Please fix this bug, or at least document it more visibly.
> > >
> > > Thank you,
> > >
> > > Frederick Eaton
> > > _______________________________________________
> > > datatable-help mailing list
> > > [email protected]
> > > https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help
> > >
> >
> ----- End forwarded message -----
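
For reference, the floating point behaviour discussed in this thread can be reproduced without data.table. The following is a minimal sketch in base R only; the variable names (`x`, `y`, `match_04`, and so on) are made up for illustration and do not come from the thread or from data.table. It mirrors the `seq(0,1,by=0.2)` column from the ?setNumericRounding example and then shows that base::order compares doubles exactly, which is the behaviour Frederick reports relying on.

```r
# Base R only; no data.table functions are called here.
# `x` mirrors column `a` from the ?setNumericRounding example in the thread.
x <- seq(0, 1, by = 0.2)

# Exact comparison: the third element equals the literal 0.4,
# but the fourth element (printed as 0.6) does not equal the literal 0.6.
match_04 <- x[3] == 0.4   # TRUE
match_06 <- x[4] == 0.6   # FALSE

# Showing 17 significant digits makes the difference visible.
print(format(c(x[4], 0.6), digits = 17))

# A comparison with a tolerance treats the two values as equal.
near_06 <- isTRUE(all.equal(x[4], 0.6))   # TRUE

# Sorting: base::order distinguishes doubles that differ only in the last bit.
# `y` is a hypothetical input chosen so that its two values are adjacent doubles.
y <- c(1 + .Machine$double.eps, 1)
exact_order <- base::order(y)   # 2 1
print(exact_order)
```

The first half is the join-style mismatch Arun describes (0.6 "clearly there" but not matched exactly). The second half is the sorting side: values this close are still ordered strictly by base::order, whereas under the last-two-bytes rounding described in the thread they may be treated as equal.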
