In fact, you should be able to skip the function altogether and just use:

    db[(has_url), str_match_all(text, url_pattern), by=id]
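For illustration, here is a minimal, self-contained sketch of the by=id approach. The sample data and the simple url_pattern regex are assumptions made up for this example (neither appears in the thread). It also swaps in str_extract_all plus unlist for str_match_all, so that j returns a plain character vector per group. Treat it as a sketch under those assumptions, not as the original poster's code.

```r
library(data.table)
library(stringr)

# Hypothetical pattern, for illustration only; the real url_pattern is not shown in the thread.
url_pattern <- "https?://\\S+"

# Made-up sample posts: one with two links, one with none, one with a single link.
db <- data.table(
  id   = 1:3,
  text = c("see http://a.example and http://b.example",
           "no links here",
           "http://c.example")
)

# Logical flag marking the posts that contain at least one URL.
db[, has_url := str_detect(text, url_pattern)]

# Grouping by id means j sees one post at a time, so the single id value
# is recycled across however many URLs that post yields.
urls <- db[(has_url),
           .(url = unlist(str_extract_all(text, url_pattern))),
           by = id]

print(urls)
```

Because the grouping column is added to the result automatically, there is no need to attach id inside j.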
(and now, my apologies to all for the email clutter)
good night

On Fri, Sep 27, 2013 at 2:41 AM, Ricardo Saporta <[email protected]> wrote:

> sorry, I probably should have elaborated (it's late here, in NJ)
>
> The error you are seeing is most likely coming from your getUrls function,
> in that you are adding several ids to a data.frame of varying rows, and `R`
> cannot recycle them correctly.
>
> If you instead break down by id, then each time you are only assigning one
> id and R will be able to recycle appropriately, without issue.
>
> good luck!
> Rick
>
>
> Ricardo Saporta
> Graduate Student, Data Analytics
> Rutgers University, New Jersey
> e: [email protected]
>
>
>
> On Fri, Sep 27, 2013 at 2:37 AM, Ricardo Saporta <[email protected]> wrote:
>
>> Hi there,
>>
>> Try inserting a `by=id` in
>>
>> a <- db[(has_url), getUrls(text, id), by=id]
>>
>> Also, no need for "has_url == T"
>> instead, use
>> (has_url)
>> if the variable is already logical. (Otherwise, you are just slowing
>> things down ;)
>>
>>
>>
>> Ricardo Saporta
>> Graduate Student, Data Analytics
>> Rutgers University, New Jersey
>> e: [email protected]
>>
>>
>>
>> On Thu, Sep 26, 2013 at 11:16 PM, Stian Håklev <[email protected]> wrote:
>>
>>> I'm trying to run a function on every row fulfilling a certain
>>> criterion, which returns a data frame - the idea is then to take the list
>>> of data frames and rbindlist them together for a totally separate
>>> data.table. (I'm extracting several URL links from each forum post, and
>>> tagging them with the forum post they came from).
>>>
>>> I tried doing this with a data.table
>>>
>>> a <- db[has_url == T, getUrls(text, id)]
>>>
>>> and get the message
>>>
>>> Error in `$<-.data.frame`(`*tmp*`, "id", value = c(1L, 6L, 1L, 2L, 4L, :
>>>   replacement has 11007 rows, data has 29787
>>>
>>> Because some rows have several URLs... However, I don't care that these
>>> row lengths don't match, I still want these rows :) I thought j would just
>>> let me execute arbitrary R code in the context of the rows as variable
>>> names, etc.
>>>
>>> Here's the function it's running, but that shouldn't be relevant
>>>
>>> getUrls <- function(text, id) {
>>>   matches <- str_match_all(text, url_pattern)
>>>   a <- data.frame(urls=unlist(matches))
>>>   a$id <- id
>>>   a
>>> }
>>>
>>>
>>> Thanks, and thanks for an amazing package - data.table has made my life
>>> so much easier. It should be part of base, I think.
>>> Stian Haklev, University of Toronto
>>>
>>> --
>>> http://reganmian.net/blog -- Random Stuff that Matters
>>>
>>> _______________________________________________
>>> datatable-help mailing list
>>> [email protected]
>>> https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help
