Sorry, I probably should have elaborated (it's late here in NJ). The error you are seeing is most likely coming from your `getUrls` function: without a `by`, it receives the ids of all the selected rows at once and tries to assign that vector to a data.frame with a different number of rows, and R cannot recycle it to fit.
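To make the recycling problem concrete, here is a minimal sketch with made-up data. The three-row `db`, the `url_pattern` regex, and the `.example` URLs are assumptions for illustration only, and `getUrls` is re-written with base R's `regmatches()` in place of stringr so the snippet only needs data.table. It shows the ungrouped call failing and the per-id call working.

```r
library(data.table)

# Hypothetical stand-ins for the real data and pattern (not from the thread)
url_pattern <- "https?://[^[:space:]]+"
db <- data.table(
  id      = 1:3,
  text    = c("see http://a.example and http://b.example",
              "no links here",
              "only http://c.example"),
  has_url = c(TRUE, FALSE, TRUE)
)

# Same shape as the original getUrls, using base R instead of stringr
getUrls <- function(text, id) {
  matches <- regmatches(text, gregexpr(url_pattern, text))
  a <- data.frame(urls = unlist(matches), stringsAsFactors = FALSE)
  a$id <- id  # needs length(id) to be 1 or to divide nrow(a) evenly
  a
}

# Without by: id holds 2 values, but 3 URLs were found, so the assignment fails
res_all <- tryCatch(db[(has_url), getUrls(text, id)],
                    error = function(e) conditionMessage(e))

# With by = id: each call sees a single id, which recycles over its URLs.
# Only the urls column is kept here, since by = id already supplies id
# (a choice made in this sketch to avoid a duplicated column name).
res_by <- db[(has_url), .(urls = getUrls(text, id)$urls), by = id]
```

Here `res_all` ends up holding the error message rather than a table, while `res_by` has one row per URL, each tagged with the id of the post it came from.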
If you instead break it down by id, each call assigns only one id, and R can recycle that single value across however many rows the data.frame has.

Good luck!
Rick

Ricardo Saporta
Graduate Student, Data Analytics
Rutgers University, New Jersey
e: [email protected]

On Fri, Sep 27, 2013 at 2:37 AM, Ricardo Saporta <[email protected]> wrote:

> Hi there,
>
> Try inserting a `by=id` in
>
>     a <- db[(has_url), getUrls(text, id), by=id]
>
> Also, no need for "has_url == T";
> instead, use
>     (has_url)
> if the variable is already logical. (Otherwise, you are just slowing
> things down ;)
>
> Ricardo Saporta
> Graduate Student, Data Analytics
> Rutgers University, New Jersey
> e: [email protected]
>
> On Thu, Sep 26, 2013 at 11:16 PM, Stian Håklev <[email protected]> wrote:
>
>> I'm trying to run a function on every row fulfilling a certain criterion,
>> which returns a data frame - the idea is then to take the list of data
>> frames and rbindlist them together for a totally separate data.table. (I'm
>> extracting several URL links from each forum post, and tagging them with
>> the forum post they came from).
>>
>> I tried doing this with a data.table
>>
>>     a <- db[has_url == T, getUrls(text, id)]
>>
>> and get the message
>>
>>     Error in `$<-.data.frame`(`*tmp*`, "id", value = c(1L, 6L, 1L, 2L, 4L, :
>>       replacement has 11007 rows, data has 29787
>>
>> Because some rows have several URLs... However, I don't care that these
>> row lengths don't match, I still want these rows :) I thought j would just
>> let me execute arbitrary R code in the context of the rows as variable
>> names, etc.
>>
>> Here's the function it's running, but that shouldn't be relevant
>>
>>     getUrls <- function(text, id) {
>>       matches <- str_match_all(text, url_pattern)
>>       a <- data.frame(urls=unlist(matches))
>>       a$id <- id
>>       a
>>     }
>>
>> Thanks, and thanks for an amazing package - data.table has made my life
>> so much easier. It should be part of base, I think.
>>
>> Stian Haklev, University of Toronto
>>
>> --
>> http://reganmian.net/blog -- Random Stuff that Matters
>>
>> _______________________________________________
>> datatable-help mailing list
>> [email protected]
>> https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help
