Hi there,

Try inserting a `by=id` in

    a <- db[(has_url), getUrls(text, id), by=id]

Also, there is no need for `has_url == T`. Instead, use `(has_url)` if the variable is already logical. (Otherwise, you are just slowing things down ;)

Ricardo Saporta
Graduate Student, Data Analytics
Rutgers University, New Jersey
e: [email protected]

On Thu, Sep 26, 2013 at 11:16 PM, Stian Håklev <[email protected]> wrote:

> I'm trying to run a function on every row fulfilling a certain criterion,
> which returns a data frame - the idea is then to take the list of data
> frames and rbindlist them together for a totally separate data.table. (I'm
> extracting several URL links from each forum post, and tagging them with
> the forum post they came from).
>
> I tried doing this with a data.table
>
> a <- db[has_url == T, getUrls(text, id)]
>
> and get the message
>
> Error in `$<-.data.frame`(`*tmp*`, "id", value = c(1L, 6L, 1L, 2L, 4L, :
> replacement has 11007 rows, data has 29787
>
> Because some rows have several URLs... However, I don't care that these
> rowlengths don't match, I still want these rows :) I thought J would just
> let me execute arbitrary R code in the context of the rows as variable
> names, etc.
>
> Here's the function it's running, but that shouldn't be relevant
>
> getUrls <- function(text, id) {
>   matches <- str_match_all(text, url_pattern)
>   a <- data.frame(urls=unlist(matches))
>   a$id <- id
>   a
> }
>
> Thanks, and thanks for an amazing package - data.table has made my life so
> much easier. It should be part of base, I think.
> Stian Haklev, University of Toronto
>
> --
> http://reganmian.net/blog -- Random Stuff that Matters
>
> _______________________________________________
> datatable-help mailing list
> [email protected]
> https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help
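
To illustrate the `by=id` suggestion, here is a minimal, self-contained sketch on toy data. Everything in it is an assumption made for illustration: the three-row `db`, the simple `url_pattern` (the original pattern was not posted), and a one-argument `getUrls` that uses base R's `regmatches`/`gregexpr` in place of `str_match_all` so the example only needs data.table. Without `by`, `id` arrives in `j` as the whole column, so its length cannot match the number of URLs found. With `by=id`, `j` runs once per post and the single `id` value is recycled across that post's URLs.

```r
library(data.table)

# Hypothetical URL pattern; the original url_pattern was not shown
url_pattern <- "https?://[^[:space:]]+"

# Toy stand-in for the forum-post table (made-up data)
db <- data.table(
  id      = 1:3,
  text    = c("see http://a.org and http://b.org",
              "no links here",
              "try https://c.net"),
  has_url = c(TRUE, FALSE, TRUE)
)

# Return every URL found in the text as a one-column list;
# data.table turns the list into columns of the result
getUrls <- function(text) {
  list(urls = unlist(regmatches(text, gregexpr(url_pattern, text))))
}

# (has_url) filters on the logical column directly;
# by = id evaluates j once per post and adds the id column itself
a <- db[(has_url), getUrls(text), by = id]
print(a)
```

Because `by` already contributes the `id` column to the result, this sketch does not pass `id` into the function at all; the two-argument version from the original post should also work once `by=id` is added, since `id` is then a single value within each group.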
