This is because `x %between% y` works by calling `between(x, y[1], y[2])`, so your call becomes:
dt[date %between% c(start, end)] ----> dt[between(date, c(start, end)[1], c(start, end)[2])]

I don't know if there is anything that can be done about it (aside from not using the operator version with vectors).

On Sun, Oct 6, 2013 at 5:29 PM, drclark <[email protected]> wrote:

> Dear data.table experts,
>
> I was inspired by SO topic How to match two data.frames with an inexact
> matching identifier (one identifier has to be in the range of the other)
> for a problem I have to calculate pollutant statistics during various
> episodes from monitoring data. The episodes (like the fiscal quarters in
> the SO topic) are defined for each site in a lookup table with starting
> and ending dates. The start and end dates can be different at different
> sites. The SO answer used >= and <= to check the date was in the range
> from start to end.
> mD[qD][Month>=startMonth & Month<=endMonth]
>
> This approach may suit my problem, but I thought that I could use "between"
> rather than the two logical comparisons. I tried both the between()
> function and its equivalent %between% operator -- and I get two different
> results. The between() version is correct, but %between% gives a wrong
> answer. Am I missing something in the syntax for using between?
>
> My version of the SO data, merge and results below. I changed the variable
> names to suit my work: ID->site, Month->date, MonValue->conc,
> QTRValue->episodeID.
>
> require(data.table) # data.table 1.8.10 on R 3.0.2 under Win7x64
> # the measurement data
> dat <- data.table(site = rep(c("A","B"), each=10),
>                   date = rep(1:10, times = 2), # could be day or hour
>                   conc = sample(30:50,2*10,replace=TRUE), # the pollutant data
>                   key="site,date")
> dat
> #     site date conc
> #  1:    A    1   48
> #  2:    A    2   44
> #  3:    A    3   50
> #  4:    A    4   47
> #  5:    A    5   35
> #  6:    A    6   47
> #  7:    A    7   38
> #  8:    A    8   34
> #  9:    A    9   46
> # 10:    A   10   35
> # 11:    B    1   45
> # 12:    B    2   35
> # 13:    B    3   40
> # 14:    B    4   41
> # 15:    B    5   37
> # 16:    B    6   37
> # 17:    B    7   32
> # 18:    B    8   41
> # 19:    B    9   31
> # 20:    B   10   32
> #
> # definitions for the episodes
> episode <- data.table(
>   site = rep(c("A", "B"), each = 3),
>   start = c(1, 4, 7, 1, 3, 8),
>   end = c(3, 5, 10, 2, 5, 10),
>   episodeID = rep(1:3, 2),
>   key="site")
> episode
> #    site start end episodeID
> # 1:    A     1   3         1
> # 2:    A     4   5         2
> # 3:    A     7  10         3
> # 4:    B     1   2         1
> # 5:    B     3   5         2
> # 6:    B     8  10         3
> #
> # join measurement data and episode list (for later aggregation using mean() etc.)
> # approach from the SO thread -- gives the right result
> dat[episode, allow.cartesian=TRUE][date>=start & date<=end]
> #     site date conc start end episodeID
> #  1:    A    1   48     1   3         1
> #  2:    A    2   44     1   3         1
> #  3:    A    3   50     1   3         1
> #  4:    A    4   47     4   5         2
> #  5:    A    5   35     4   5         2
> #  6:    A    7   38     7  10         3
> #  7:    A    8   34     7  10         3
> #  8:    A    9   46     7  10         3
> #  9:    A   10   35     7  10         3
> # 10:    B    1   45     1   2         1
> # 11:    B    2   35     1   2         1
> # 12:    B    3   40     3   5         2
> # 13:    B    4   41     3   5         2
> # 14:    B    5   37     3   5         2
> # 15:    B    8   41     8  10         3
> # 16:    B    9   31     8  10         3
> # 17:    B   10   32     8  10         3
>
> # using between() -- also gives the desired result
> dat[episode, allow.cartesian=TRUE][between (date,start,end)]
> # (returns same result as above)
>
> # using %between% -- gives different result - not the right answer
> dat[episode, allow.cartesian=TRUE][date %between% c(start,end)]
> #    site date conc start end episodeID
> # 1:    A    1   48     1   3         1
> # 2:    A    1   48     4   5         2
> # 3:    A    1   48     7  10         3
> # 4:    B    1   45     1   2         1
> # 5:    B    1   45     3   5         2
> # 6:    B    1   45     8  10         3
>
> So why does the %between% operator give a different result than between()?
> There must be some detail of syntax I need to learn here. I also tried
> putting the whole %between% expression in parenthesis, but that doesn't
> make any difference:
> dat[episode, allow.cartesian=TRUE][(date %between% c(start,end))]
>
> Best regards.
> Douglas Clark
>
>
>
> --
> View this message in context:
> http://r.789695.n4.nabble.com/between-versus-between-why-different-results-tp4677718.html
> Sent from the datatable-help mailing list archive at Nabble.com.
> _______________________________________________
> datatable-help mailing list
> [email protected]
> https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help
>
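
To make the mechanism described at the top of this reply concrete, here is a minimal sketch in base R. The names between_sketch and %between_sketch% are hypothetical stand-ins written for this illustration, not data.table's own definitions; they only mimic the behaviour described above (the operator passing y[1] and y[2] as the bounds). The small vectors are made up to look like a few rows of the joined table.

```r
# Hypothetical stand-ins for illustration only (NOT data.table's code):
# a vectorised range check, and an operator that forwards y[1] and y[2].
between_sketch <- function(x, lower, upper) x >= lower & x <= upper
`%between_sketch%` <- function(x, y) between_sketch(x, y[1], y[2])

# Made-up rows shaped like the joined table: each date paired with an episode
date  <- c(1, 2, 4, 1, 2, 4)
start <- c(1, 1, 1, 4, 4, 4)
end   <- c(3, 3, 3, 5, 5, 5)

# Three-argument form: each date is compared with its own row's start and end
by_row <- between_sketch(date, start, end)

# Operator form: c(start, end) is one long vector, so [1] and [2] pick
# start[1] and start[2]. Every row is then tested against the range 1..1.
by_operator <- date %between_sketch% c(start, end)

print(by_row)
print(by_operator)
```

With these inputs the operator form keeps only the rows where date is 1, which is the same pattern as the unexpected six-row result in the quoted message. Calling the three-argument form directly, as the original poster did with between(date, start, end), keeps the bounds row by row.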
