Matthew,

So what's the resolution here? Is it okay to sort in the "proper" order on the key column, but use *machine tolerance* when subsetting on the key column?

Arun

On Tuesday, April 30, 2013 at 4:26 PM, Arunkumar Srinivasan wrote:
> Matthew,
>
> Precisely. That's what I was thinking as well. But I was hesitant to say
> so, as I didn't know how complex it would be to implement or change.
> Since the join requires tolerance, sorting could still be done in the
> "right" order (by disregarding tolerance during the sort).
>
> Arun
>
> On Tuesday, April 30, 2013 at 4:22 PM, Matthew Dowle wrote:
> > Maybe it doesn't actually need to sort within machine tolerance. If it
> > were precise, the sort would be faster, that's for sure. But at the
> > time, I remember thinking that it should preserve the order of rows
> > within a group of values within machine tolerance (e.g. 3.99999999,
> > 4.00000001, 3.99999999 should be considered 4.0 and the order of those
> > 3 rows maintained). But maybe sorting them to 3.99999999, 3.99999999,
> > 4.00000001 is ok, as it's just the join that should be within machine
> > tolerance?
> > Interested in how fast order(y) is, though, compared to data.table
> > sorting of doubles.
> > Matthew
> >
> > On 30.04.2013 15:16, Arunkumar Srinivasan wrote:
> > > Matthew,
> > > I see. I didn't think about tolerance. Although
> > > dt[with(dt, order(y)), ]
> > > seems to do the task right (similar to data.frame). I'm glad that I
> > > don't have to convert to data.frame to perform the order. I am not
> > > keying by this column. Unless one needs this column for keying, I
> > > don't think a tolerance option is essential, although having it
> > > would certainly be nice.
> > > Arun
> > >
> > > On Tuesday, April 30, 2013 at 4:09 PM, Matthew Dowle wrote:
> > > > Hi,
> > > > data.table sorts doubles within machine tolerance:
> > > > > sqrt(.Machine$double.eps)
> > > > [1] 1.490116e-08
> > > > i.e. numbers closer than this are considered equal.
> > > >
> > > > Otherwise we wouldn't be able to do things like DT[.(3.14)].
> > > >
> > > > I had a quick look; see the arguments of
> > > > data.table:::ordernumtol, which takes "tol", but there is no
> > > > option provided (yet) to change this. Do we need one?
> > > >
> > > > In the examples section of one of the help pages there is an
> > > > example which generates a series of numbers very close together
> > > > using pi. Note that your numbers are both close together and
> > > > very close to 0.
> > > >
> > > > Matthew
> > > >
> > > > On 30.04.2013 14:52, Arunkumar Srinivasan wrote:
> > > > > Hi there,
> > > > > I just saw something strange when I was sorting a column of
> > > > > p-values. I checked the data.table bug tracker for the words
> > > > > "sort" and "floating point" and there were no hits for this
> > > > > case. There's a bug for "integer 64" sort on a column, though.
> > > > > So, here's a reproducible example. I'd be glad to file a bug
> > > > > if it is one, and to be corrected if it's something I am
> > > > > doing wrong.
> > > > >
> > > > > set.seed(45)
> > > > > dt <- data.table(x=sample(50), y= sample(c(seq(0, 1,
> > > > >     length.out=1000), 7000000:7000100), 50)/1e7)
> > > > > head(dt)
> > > > >     x            y
> > > > > 1: 32 5.395395e-08
> > > > > 2: 16 6.956957e-08
> > > > > 3: 12 2.142142e-08
> > > > > 4: 18 5.855856e-08
> > > > > 5: 17 6.216216e-08
> > > > > 6: 14 5.025025e-08
> > > > > setkey(dt, "y") # sort by column y
> > > > > head(dt, 10)
> > > > >      x            y
> > > > >  1: 47 1.401401e-09
> > > > >  2: 12 2.142142e-08
> > > > >  3: 24 1.391391e-08
> > > > >  4: 43 9.809810e-09 <~~~ obviously false
> > > > >  5:  1 2.932933e-08
> > > > >  6: 48 2.562563e-08
> > > > >  7: 49 1.891892e-08
> > > > >  8: 40 2.182182e-08
> > > > >  9:  9 7.307307e-09 <~~~ obviously false
> > > > > 10: 45 2.482482e-08
> > > > >
> > > > > Best,
> > > > > Arun
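
The distinction discussed in this thread (sort exactly, but match within tolerance) can be illustrated with a minimal base-R sketch. It assumes the tolerance is sqrt(.Machine$double.eps), as quoted above. The helper `match_within_tol` is hypothetical, written only for illustration, and is not part of data.table. This does not show how data.table implements sorting or joins.

```r
# Tolerance quoted in the thread (about 1.49e-08).
tol <- sqrt(.Machine$double.eps)

# Three values that are equal within tolerance but not exactly equal
# (the example from the thread).
y <- c(3.99999999, 4.00000001, 3.99999999)

# Exact sort: tolerance is ignored, so both 3.99999999 values come first.
exact_order <- order(y)
y_sorted <- y[exact_order]

# Hypothetical helper (NOT a data.table function): positions of the
# elements of `x` that lie within `tol` of `value`.
match_within_tol <- function(x, value, tol) {
  which(abs(x - value) < tol)
}

# Tolerant lookup on the exactly sorted vector: all three values
# are treated as equal to 4.0.
hits <- match_within_tol(y_sorted, 4.0, tol)
```

Here the sorted vector is in strict numeric order, while the lookup for 4.0 still finds all three rows, which is the behaviour proposed above for a key column.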
_______________________________________________
datatable-help mailing list
[email protected]
https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help
