Thanks for pointing that out. I didn't know about (or think to search for) that global option. I think I'll leave it as NA since, as you say, it's reasonably useful.
I forgot that people may want to switch to allow.cartesian = TRUE (although I never find myself wanting to use this) after seeing the error. So, a modified (very minor) FR: have the error message suggest switching to nomatch=0 (because this is what I personally find myself switching to after I see the error, though I don't know how common that choice is...).

I still don't understand the mention of "duplicate key values in i" in the message, as the problem seems to be with duplicated values in x (at least in my example above).

--Frank

On Mon, Oct 14, 2013 at 12:42 AM, Michael Nelson <[email protected]> wrote:

> The default argument to nomatch is `getOption("datatable.nomatch")`. The
> default value for this is `NA`.
>
> If you want to change this option, simply set
> `options(datatable.nomatch = 0)`, then the default will be as you want.
>
> I think the current datatable.nomatch = NA is reasonable, as you are
> often interested in non-matches as well as matches.
>
> x[y, nomatch=NA] to give an error in your case, then follow the advice of
> the error message and run
>
> x[y, nomatch=NA, allow.cartesian = TRUE]
>
> ------------------------------
> *From:* [email protected]
> [[email protected]] on behalf of Frank Erickson [[email protected]]
> *Sent:* Monday, 14 October 2013 1:03 PM
> *To:* data.table source forge
> *Subject:* [datatable-help] possible FR: in x[y], switch to nomatch=0
> instead of failing with "Error in vecseq..."
>
> I don't know if this error shows up in other cases, but I always see it
> when I'm about to do
>
> x[y,b:=b]
>
> but first want to check how
>
> x[y]
>
> looks before creating or overwriting x$b. Here's an example:
>
> x <- data.table(a=rep(2:3,2),key='a')
> y <- data.table(a=1:4,b=4:1,key='a')
>
> x[y] # error
> x[y,nomatch=0] # ok
> x[y,b:=b] # ok
>
> I'd prefer to see the first attempt mapped to the second (with a
> suitable message), instead of erroring out. What do you all think? Is that
> reasonable/worthwhile?
>
> Best,
>
> Frank
>
> P.S. One other point, regarding the message itself (reproduced down
> below): I don't understand why repeated values in i are mentioned.
>
> -- For x[y] in my example, the problem seems to be coming from x having
> repeated rows, not i (y in this case);
> -- whereas y[x] works just fine (despite the repeated/duplicated values in
> i...which is x here).
>
> Error in vecseq(f__, len__, if (allow.cartesian) NULL else
> as.integer(max(nrow(x), :
> Join results in 6 rows; more than 4 = max(nrow(x),nrow(i)). Check for
> duplicate key values in i, each of which join to the same group in x over
> and over again. If that's ok, try including `j` and dropping `by`
> (by-without-by) so that j runs for each group to avoid the large
> allocation. If you are sure you wish to proceed, rerun with
> allow.cartesian=TRUE. Otherwise, please search for this error message in
> the FAQ, Wiki, Stack Overflow and datatable-help for advice.
>
> _______________________________________________
> datatable-help mailing list
> [email protected]
> https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help
