Maybe it doesn't actually need to sort within machine tolerance. If the sort were exact, it would certainly be faster. But at the time, I remember thinking that it should preserve the order of rows within a group of values that are equal within machine tolerance (e.g. 3.99999999, 4.00000001, 3.99999999 should be considered 4.0 and the order of those 3 rows maintained). But maybe sorting them to 3.99999999, 3.99999999, 4.00000001 is OK, as it's only the join that needs to be within machine tolerance?
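To make the two options concrete, here is a minimal base-R sketch of the "preserve row order within tolerance" behaviour. `order_tol` is a hypothetical helper written for illustration, not data.table's internal `ordernumtol`: it sorts exactly, starts a new group wherever the gap to the previous sorted value exceeds `tol`, and then orders rows within each group by their original position. The example values sit within `sqrt(.Machine$double.eps)` of each other (3.99999999 and 4.00000001 actually differ by 2e-8, slightly more than that tolerance).

```r
# Hypothetical helper for illustration; NOT data.table's ordernumtol.
# Orders a double vector ascending, but treats a run of values whose
# successive gaps are all <= tol as one tied group, and keeps that
# group's rows in their original order. Assumes no NAs.
order_tol <- function(x, tol = sqrt(.Machine$double.eps)) {
  stopifnot(is.numeric(x), !anyNA(x))
  if (length(x) < 2L) return(order(x))
  o   <- order(x)                          # exact ascending order
  xs  <- x[o]
  grp <- cumsum(c(TRUE, diff(xs) > tol))   # new group when the gap exceeds tol
  o[order(grp, o)]                         # within a group, sort by row position
}

x <- c(3.999999995, 4.000000005, 3.999999995, 1)
order(x)      # exact sort:     4 1 3 2
order_tol(x)  # tolerance sort: 4 1 2 3
```

Chaining on the gap to the previous value means a long run of values, each within `tol` of its neighbour, can form one group whose ends differ by much more than `tol`; that is one argument for simply sorting exactly and applying the tolerance only in the join.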
I'd be interested in how fast order(y) is, though, compared to data.table's sorting of doubles.

Matthew

On 30.04.2013 15:16, Arunkumar Srinivasan wrote:
> Matthew,
> I see. I didn't think about tolerance. Although
> dt[with(dt, order(y)), ]
> seems to do the task right (similar to data.frame). I'm glad that I don't have to convert to data.frame to perform the order. I am not keying by this column. Unless one needs this column for keying, I don't think a tolerance option is essential. Having one would definitely be nicer, though.
>
> Arun
>
> On Tuesday, April 30, 2013 at 4:09 PM, Matthew Dowle wrote:
>
>> Hi,
>>
>> data.table sorts doubles within machine tolerance:
>>
>>> sqrt(.Machine$double.eps)
>> [1] 1.490116e-08
>>>
>>
>> i.e. numbers closer than this are considered equal.
>>
>> Otherwise we wouldn't be able to do things like DT[.(3.14)].
>>
>> I had a quick look: see the arguments of data.table:::ordernumtol, which takes "tol", but there is no option provided (yet) to change this. Do we need one?
>>
>> The examples section of one of the help pages has an example which generates a series of numbers very close together using pi. Note that your numbers are both close together and very close to 0.
>>
>> Matthew
>>
>> On 30.04.2013 14:52, Arunkumar Srinivasan wrote:
>>
>>> Hi there,
>>> I just saw something strange when I was sorting a column of p-values. I checked the data.table bug tracker for the words "sort" and "floating point" and there were no hits for this case. There's a bug for "integer64" sort on a column, though.
>>> So, here's a reproducible example. I'd be glad to file a bug if it is one, and to be corrected if it's something I am doing wrong.
>>>
>>> set.seed(45)
>>> dt <- data.table(x=sample(50), y= sample(c(seq(0, 1, length.out=1000), 7000000:7000100), 50)/1e7)
>>> head(dt)
>>>     x            y
>>> 1: 32 5.395395e-08
>>> 2: 16 6.956957e-08
>>> 3: 12 2.142142e-08
>>> 4: 18 5.855856e-08
>>> 5: 17 6.216216e-08
>>> 6: 14 5.025025e-08
>>> setkey(dt, "y") # sort by column y
>>> head(dt, 10)
>>>      x            y
>>>  1: 47 1.401401e-09
>>>  2: 12 2.142142e-08
>>>  3: 24 1.391391e-08
>>>  4: 43 9.809810e-09   <~~~ obviously false
>>>  5:  1 2.932933e-08
>>>  6: 48 2.562563e-08
>>>  7: 49 1.891892e-08
>>>  8: 40 2.182182e-08
>>>  9:  9 7.307307e-09   <~~~ obviously false
>>> 10: 45 2.482482e-08
>>>
>>> Best,
>>> Arun
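A quick base-R check on the ten keyed `y` values, copied from the printed output above (so rounded to 7 significant digits), is consistent with the tolerance explanation: at every place where a value is smaller than the one before it, the drop is below `sqrt(.Machine$double.eps)`, so each such pair counts as equal under that tolerance. This only inspects the gaps; it does not reproduce data.table's internal sort.

```r
tol <- sqrt(.Machine$double.eps)  # 1.490116e-08, the tolerance quoted in the thread

# y column of the first 10 rows after setkey(dt, "y"), copied from the printed output
y <- c(1.401401e-09, 2.142142e-08, 1.391391e-08, 9.809810e-09, 2.932933e-08,
       2.562563e-08, 1.891892e-08, 2.182182e-08, 7.307307e-09, 2.482482e-08)

d     <- diff(y)
drops <- -d[d < 0]        # size of each out-of-order step
print(drops)
all(drops < tol)          # TRUE: every inversion is smaller than tol

# An exact sort with base R's order() gives a strictly increasing sequence instead
y_exact <- y[order(y)]
!is.unsorted(y_exact)     # TRUE
```

The largest drop (2.182182e-08 to 7.307307e-09, about 1.45e-08) is only just under the tolerance, which fits Matthew's point that values both close together and close to 0 are where this shows up.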
_______________________________________________ datatable-help mailing list [email protected] https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help
