Or, perhaps the tolerance should be a function of the range of the column. [The range would be quick to calculate with a single C for loop.]
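
To make the idea concrete, here is a rough sketch of that loop, not actual data.table code. The names `column_range`, `range_tolerance` and `BASE_TOL` are made up for illustration. Scaling the current fixed tolerance by the range is only one possible choice of "function of the range". The sketch also assumes the column has at least one element and contains no NA/NaN.

```c
#include <assert.h>
#include <stddef.h>

/* sqrt(.Machine$double.eps) for IEEE-754 doubles: 2^-26, about 1.490116e-08 */
#define BASE_TOL 0x1p-26

/* Hypothetical helper: max(x) - min(x) in a single pass.
   Assumes n >= 1 and that x holds no NA/NaN values. */
double column_range(const double *x, size_t n)
{
    assert(n > 0);
    double lo = x[0], hi = x[0];
    for (size_t i = 1; i < n; i++) {
        if (x[i] < lo)
            lo = x[i];
        else if (x[i] > hi)
            hi = x[i];
    }
    return hi - lo;
}

/* Assumed rule (one option among many): the fixed tolerance
   multiplied by the range of the column. */
double range_tolerance(const double *x, size_t n)
{
    return BASE_TOL * column_range(x, n);
}
```

With this rule, a column whose values all lie within about 1e-7 of each other would get a tolerance near 1.5e-15 instead of 1.49e-08, so values such as 9.809810e-09 and 1.391391e-08 would no longer be treated as ties. A constant column has range 0, so that case would need a floor or a fallback to exact comparison.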

On 30.04.2013 15:09, Matthew Dowle wrote:
> Hi,
>
> data.table sorts double within machine tolerance :
>
> > sqrt(.Machine$double.eps)
> [1] 1.490116e-08
>
> i.e. numbers closer than this are considered equal.
>
> Otherwise we wouldn't be able to do things like DT[.(3.14)].
>
> I had a quick look, see arguments of data.table:::ordernumtol which takes "tol" but there is no option provided (yet) to change this. Do we need one?
>
> In the examples section of one of the help pages it has an example which generates a series of numbers very close together using pi. Note that your numbers are both close together, and, very close to 0.
>
> Matthew
>
> On 30.04.2013 14:52, Arunkumar Srinivasan wrote:
>
>> Hi there,
>> I just saw something strange when I was sorting a column of p-values. I checked the data.table bug tracker for words "sort" and "floating point" and there were no hits for this case. There's a bug for "integer 64" sort on a column though.
>> So, here's a reproducible example. I'd be glad to file a bug, if it is and be corrected if it's something I am doing wrong.
>>
>> set.seed(45)
>> dt <- data.table(x=sample(50), y= sample(c(seq(0, 1, length.out=1000), 7000000:7000100), 50)/1e7)
>> head(dt)
>>     x            y
>> 1: 32 5.395395e-08
>> 2: 16 6.956957e-08
>> 3: 12 2.142142e-08
>> 4: 18 5.855856e-08
>> 5: 17 6.216216e-08
>> 6: 14 5.025025e-08
>> setkey(dt, "y") # sort by column y
>> head(dt, 10)
>>      x            y
>>  1: 47 1.401401e-09
>>  2: 12 2.142142e-08
>>  3: 24 1.391391e-08
>>  4: 43 9.809810e-09 <~~~ obviously false
>>  5:  1 2.932933e-08
>>  6: 48 2.562563e-08
>>  7: 49 1.891892e-08
>>  8: 40 2.182182e-08
>>  9:  9 7.307307e-09 <~~~ obviously false
>> 10: 45 2.482482e-08
>>
>> Best,
>> Arun
