Hi Arun,

I wrote up a GitHub issue here:

https://github.com/Rdatatable/data.table/issues/1642

Thanks,

Frederick

On Mon, Apr 11, 2016 at 12:42:08PM +0200, Arunkumar Srinivasan wrote:
> Hi Frederick, the reason this was implemented is to avoid issues like this 
> (copied from ?setNumericRounding), which IIRC I pointed to you before:
> DT = data.table(a=seq(0,1,by=0.2),b=1:2, key="a")
> DT
> setNumericRounding(0)   # turn off rounding
> DT[.(0.4)]   # works
> DT[.(0.6)]   # no match, confusing since 0.6 is clearly there in DT
> So while a numeric rounding of ‘0’ solves your issue, the problem still persists in 
> other cases (like the one shown above). 
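A minimal Python sketch of why `DT[.(0.6)]` misses without rounding. It makes two assumptions. First, `seq(0, 1, by=0.2)` is taken to build its elements as `i * 0.2`, so the fourth element is `0.6000000000000001` rather than the literal `0.6`. Second, data.table's default rounding is approximated as round-half-up on the low 16 bits of the IEEE 754 bit pattern. The helper `round_last_bytes` is hypothetical and is not data.table's own code.

```python
import struct


def round_last_bytes(x: float, nbytes: int = 2) -> float:
    """Round off the lowest `nbytes` bytes of a double's 64-bit pattern.

    Hypothetical helper approximating data.table's default rounding as
    round-half-up on the raw IEEE 754 bits; intended for finite values only.
    """
    if nbytes == 0:
        return x
    bits = struct.unpack("<Q", struct.pack("<d", x))[0]
    shift = 8 * nbytes
    half = 1 << (shift - 1)                            # half of the dropped unit
    mask = ~((1 << shift) - 1) & 0xFFFFFFFFFFFFFFFF    # clears the low bytes
    bits = (bits + half) & mask
    return struct.unpack("<d", struct.pack("<Q", bits))[0]


# Assumed to mirror R's seq(0, 1, by = 0.2), computed as from + (0:n) * by
key = [i * 0.2 for i in range(6)]

exact = [k for k in key if k == 0.6]
rounded = [k for k in key if round_last_bytes(k) == round_last_bytes(0.6)]

print(key[3])   # 0.6000000000000001
print(exact)    # []: no exact match, like DT[.(0.6)] with rounding 0
print(rounded)  # [0.6000000000000001]: matches once 2 bytes are rounded
```

Rounding does not remove the problem. It only merges values that land in the same bucket after rounding. Two doubles one ulp apart that straddle a rounding boundary still compare unequal.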
> Also, you seem to be suggesting using this *only* for order(). Why? Why not 
> ‘setorder()’ or ‘setkey()’?
> FYI, speed is/was never really an issue and is just a (positive) side effect.
> 
> I see two options:
> 
> 1. If possible, clearly identify such cases and set the rounding appropriately so that 
> we run into this issue only very rarely, i.e., ad-hoc numeric rounding.
> 2. If that is not possible, then rounding the last two bytes doesn’t really 
> solve *most* floating point issues (which was its original purpose) any 
> better than no rounding at all. In that case there’s no need for 
> setNumericRounding, and we can attribute the inconsistencies 
> to floating point representation inaccuracies.
> 
> Having had my share of experiences with floating point issues, my guess is 
> that the latter applies. It’s perhaps better to continue on the GitHub project 
> page (please file an issue there with a minimal example of 
> *your* problem).
> 
> -- 
> Arun
> 
> On 7 April 2016 at 22:14:08, [email protected] ([email protected]) wrote:
> 
> Sorry, I forgot to Cc the list for this.
> 
> Arunkumar, do you have an answer? You said:
> 
> > If you’ve a better idea, please let us know and we would definitely be
> > willing to implement that.
> 
> and I said
> 
> > My "better idea" at this point is, if speed is not an issue, then
> > 'order' could use a numeric rounding of zero.
> 
> (see below)
> 
> Thank you,
> 
> Frederick
> 
> 
> 
> ----- Forwarded message from [email protected] -----
> 
> Date: Wed, 27 Jan 2016 15:52:25 -0800
> From: [email protected]
> To: Arunkumar Srinivasan <[email protected]>
> Subject: Re: [datatable-help] sorting on a floating point column
> 
> Thanks Arun for your reply. The '?order' page says:
> 
> Columns of ‘numeric’ types (i.e., ‘double’) have their last two
> bytes rounded off while computing order, by default, to avoid any
> unexpected behaviour due to limitations in representing floating
> point numbers precisely. Have a look at ‘setNumericRounding’ to
> learn more.
> 
> But I'm not sure what unexpected behavior this avoids. It seems like
> it *causes* unexpected behavior (even if I'm the first to comment in
> two years)... And '?setNumericRounding' says:
> 
> Computers cannot represent some floating point numbers (such as
> 0.6) precisely, using base 2. This leads to unexpected behaviour
> when joining or grouping columns of type 'numeric';
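The grouping case can be sketched the same way. The helper `round_last_bytes` is again hypothetical and approximates the default two-byte rounding as round-half-up on the raw bits. The grouping column is made up for illustration. Grouping on exact values puts `0.6` and `3 * 0.2` in separate groups, and grouping on the rounded values merges them.

```python
import struct
from collections import Counter


def round_last_bytes(x: float, nbytes: int = 2) -> float:
    """Hypothetical helper: round-half-up on the low bytes of the bit pattern."""
    if nbytes == 0:
        return x
    bits = struct.unpack("<Q", struct.pack("<d", x))[0]
    shift = 8 * nbytes
    mask = ~((1 << shift) - 1) & 0xFFFFFFFFFFFFFFFF    # clears the low bytes
    bits = (bits + (1 << (shift - 1))) & mask
    return struct.unpack("<d", struct.pack("<Q", bits))[0]


group_col = [0.6, 3 * 0.2, 0.6, 0.4]   # 3 * 0.2 is 0.6000000000000001

exact_groups = Counter(group_col)
rounded_groups = Counter(round_last_bytes(v) for v in group_col)

print(len(exact_groups))    # 3: 0.6 and 0.6000000000000001 are separate groups
print(len(rounded_groups))  # 2: they collapse into one group of size 3
```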
> 
> So it sounds like the cases where you benefit from numeric rounding
> are "joining or grouping", not sorting. My "better idea" at this
> point is, if speed is not an issue, then 'order' could use a numeric
> rounding of zero. Alternatively, I would expand the '?order'
> documentation to clarify that the reason for rounding is, for example,
> speed, and not the elimination of "unexpected behavior".
> 
> Thank you,
> 
> Frederick
> 
> On Thu, Jan 28, 2016 at 12:10:37AM +0100, Arunkumar Srinivasan wrote:
> > Why do you want a minimal test case, when setNumericRounding explains 
> > that the behavior I reported is intentional? 
> > Because you referred to a post that’s quite a few years old, and data.table 
> > moved away from ‘tolerance’ quite some time ago. It therefore 
> > wasn’t clear to me what the exact issue was: whether you were using an older 
> > version, or a newer one without realizing that the problem wasn’t due to 
> > tolerance.
> >  
> > I now see that this is also documented in the data.table::order page. 
> > So I guess it is already "documented visibly". 
> > Glad you got to read that.
> >  
> > And setNumericRounding explains that it is slightly faster to ignore 
> > the last two bytes, requiring fewer radix sort passes. 
> > That’s not the reason for the function, though, as explained in 
> > `?setNumericRounding` with examples at the bottom of that page. 
> >  
> > I wanted to share my experience that this behavior is confusing.
> > With floating point numbers, there are always limitations. I find the 
> > examples under ?setNumericRounding confusing as well (they would 
> > return wrong results if we did not round). We try to reduce confusion by 
> > handling the most obvious cases, or so we think. If you’ve a better idea, 
> > please let us know and we would definitely be willing to implement that.
> > -- 
> > Arun
> >  
> > On 28 January 2016 at 00:03:19, [email protected] ([email protected]) wrote:
> >  
> > data.table 1.9.6  
> >  
> > What's surprising is that sorting a list of floats wouldn't do the  
> > obvious thing, and sort them exactly. Is it surprising that this would  
> > be surprising?  
> >  
> > Why do you want a minimal test case, when setNumericRounding explains  
> > that the behavior I reported is intentional?  
> >  
> > I now see that this is also documented in the data.table::order page.  
> > So I guess it is already "documented visibly".  
> >  
> > And setNumericRounding explains that it is slightly faster to ignore  
> > the last two bytes, requiring fewer radix sort passes.  
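Here is a sketch of why ignoring bytes saves passes. It assumes a plain byte-wise LSD radix sort on the IEEE 754 bit pattern. data.table's actual `forder` implementation differs in detail. The functions `sortable_key` and `radix_order` are hypothetical, and the skipped low bytes are simply truncated here, not rounded. Each byte costs one 256-bucket pass, so 8 passes become 6. Values that agree in their top six bytes then tie, and they keep their input order.

```python
import struct


def sortable_key(x: float) -> int:
    """Map a double to an unsigned 64-bit int whose order matches numeric order."""
    bits = struct.unpack("<Q", struct.pack("<d", x))[0]
    if bits >> 63:                        # negative: flip every bit
        return bits ^ 0xFFFFFFFFFFFFFFFF
    return bits | (1 << 63)               # non-negative: set the sign bit


def radix_order(xs, skip_bytes=0):
    """Stable LSD radix sort, one pass per byte; returns (0-based order, passes)."""
    keys = [sortable_key(x) for x in xs]
    idx = list(range(len(xs)))
    passes = 0
    for byte in range(skip_bytes, 8):     # least significant kept byte first
        buckets = [[] for _ in range(256)]
        for i in idx:
            buckets[(keys[i] >> (8 * byte)) & 0xFF].append(i)
        idx = [i for bucket in buckets for i in bucket]
        passes += 1
    return idx, passes


xs = [3 * 0.2, 0.6, -1.5, 0.4]
print(radix_order(xs))                 # ([2, 3, 1, 0], 8)
print(radix_order(xs, skip_bytes=2))   # ([2, 3, 0, 1], 6)
```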
> >  
> > I wanted to share my experience that this behavior is confusing. Thank  
> > you at least for pointing me to your documentation.  
> >  
> > Frederick  
> >  
> > On Wed, Jan 27, 2016 at 10:13:44PM +0100, Arunkumar Srinivasan wrote:  
> > > This is following up on a thread from a couple years ago:   
> > > http://lists.r-forge.r-project.org/pipermail/datatable-help/2013-May/001689.html
> > >    
> > > Things have changed A LOT! I suggest you keep up to date by reading the 
> > > README about bug fixes and features on the GitHub project page: 
> > > https://github.com/Rdatatable/data.table  
> > >  
> > > I ran into this problem myself, it took a bit of time to debug because it 
> > > is so surprising.   
> > > What’s surprising? A reproducible example, please, along with your 
> > > data.table package version and R version. 
> > > Without that, my best guess is for you to look at `?setNumericRounding`.  
> > >  
> > > --   
> > > Arun  
> > >  
> > > On 27 January 2016 at 21:40:23, [email protected] ([email protected]) 
> > > wrote:  
> > >  
> > > This is following up on a thread from a couple years ago:  
> > >  
> > > http://lists.r-forge.r-project.org/pipermail/datatable-help/2013-May/001689.html
> > >   
> > >  
> > > I ran into this problem myself, it took a bit of time to debug because  
> > > it is so surprising.  
> > >  
> > > In my case, I was using order() to sort a list of floats.  
> > >  
> > > I expected the result to be monotonic but it wasn't!  
> > >  
> > > Then I found out that the problem was due to 'order' being part of the  
> > > data.table library. By using base::order, I was able to get correct  
> > > behavior.  
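The non-monotonic result can be reproduced with a Python sketch. It assumes ties keep their input order, as R's `order()` does for unresolved ties. It also assumes the default rounding behaves like the hypothetical `round_last_bytes` helper (round-half-up on the low two bytes). `order_rounded`, `order_exact` and `is_monotonic` are illustrative names, not data.table functions. The first two return 0-based permutations.

```python
import struct


def round_last_bytes(x: float, nbytes: int = 2) -> float:
    """Hypothetical helper: round-half-up on the low bytes of the bit pattern."""
    if nbytes == 0:
        return x
    bits = struct.unpack("<Q", struct.pack("<d", x))[0]
    shift = 8 * nbytes
    mask = ~((1 << shift) - 1) & 0xFFFFFFFFFFFFFFFF    # clears the low bytes
    bits = (bits + (1 << (shift - 1))) & mask
    return struct.unpack("<d", struct.pack("<Q", bits))[0]


def order_rounded(xs, nbytes=2):
    """Stable sort on rounded keys, so ties keep their input order."""
    return sorted(range(len(xs)), key=lambda i: round_last_bytes(xs[i], nbytes))


def order_exact(xs):
    """Stable sort on the exact values, like base::order."""
    return sorted(range(len(xs)), key=lambda i: xs[i])


def is_monotonic(xs, perm):
    ys = [xs[i] for i in perm]
    return all(a <= b for a, b in zip(ys, ys[1:]))


xs = [3 * 0.2, 0.6]   # 0.6000000000000001, then 0.6
print(order_rounded(xs), is_monotonic(xs, order_rounded(xs)))  # [0, 1] False
print(order_exact(xs), is_monotonic(xs, order_exact(xs)))      # [1, 0] True
```

With `nbytes=0` the rounded sort reduces to the exact one. That is the behaviour suggested further up the thread for `order`.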
> > >  
> > > I don't understand why improperly ordering floating point data helps  
> > > the data.table library accomplish anything, whether it is looking up  
> > > keys or what.  
> > >  
> > > Also, it must be much slower to compare floats with a tolerance than  
> > > to just compare them. I seem to recall that floats were designed so  
> > > that normal comparison is quite fast.  
> > >  
> > > Please fix this bug, or at least document it more visibly.  
> > >  
> > > Thank you,  
> > >  
> > > Frederick Eaton  
> > > _______________________________________________  
> > > datatable-help mailing list  
> > > [email protected]  
> > > https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help
> > >   
> 
> 
> ----- End forwarded message -----