In fact, you should be able to skip the function altogether and just use:

   db[ (has_url), str_match_all(text, url_pattern), by=id]
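
A minimal sketch of both points, using made-up data: the toy table `db`, its contents, and the `url_pattern` regex below are assumptions for illustration only, not the original poster's data or pattern. The first part reproduces the recycling error in plain R; the second groups by id and unlists the matches (as the original `getUrls` does), so each URL gets its own row.

```r
library(data.table)
library(stringr)

# Part 1: why the ungrouped call fails.
# Three extracted URLs but only two ids: 3 is not a multiple of 2,
# so `$<-.data.frame` refuses to recycle the ids.
a <- data.frame(urls = c("u1", "u2", "u3"))
msg <- tryCatch({
  a$id <- c(10L, 20L)
  "no error"
}, error = function(e) conditionMessage(e))

# Part 2: grouping by id.
# Hypothetical stand-ins for the real pattern and forum data.
url_pattern <- "https?://\\S+"
db <- data.table(
  id      = 1:3,
  text    = c("see http://a.example and http://b.example",
              "no links here",
              "only http://c.example"),
  has_url = c(TRUE, FALSE, TRUE)
)

# With by = id, j is evaluated once per post, so the single id of each
# group is recycled across however many URLs that post contains.
urls <- db[(has_url),
           list(url = unlist(str_match_all(text, url_pattern))),
           by = id]
```

With this toy table, `urls` should hold one row per extracted link, each tagged with the id of the post it came from, and `msg` should hold the same kind of "replacement has ... rows" message as in the original report.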


(and now, my apologies to all for the email clutter)
good night

On Fri, Sep 27, 2013 at 2:41 AM, Ricardo Saporta <
[email protected]> wrote:

> Sorry, I probably should have elaborated (it's late here, in NJ).
>
> The error you are seeing is most likely coming from your `getUrls`
> function: it assigns several ids at once to a data.frame with a different
> number of rows, and R cannot recycle them to fit.
>
> If you instead break it down by id, then each call assigns only one id,
> and R will be able to recycle it appropriately, without issue.
>
> good luck!
> Rick
>
>
> Ricardo Saporta
> Graduate Student, Data Analytics
> Rutgers University, New Jersey
> e: [email protected]
>
>
>
> On Fri, Sep 27, 2013 at 2:37 AM, Ricardo Saporta <
> [email protected]> wrote:
>
>> Hi there,
>>
>> Try inserting a `by=id` in
>>
>>    a <- db[(has_url), getUrls(text, id), by=id]
>>
>> Also, there is no need for `has_url == T`; instead, use
>>   (has_url)
>> if the variable is already logical. (Otherwise, you are just slowing
>> things down ;)
>>
>>
>> On Thu, Sep 26, 2013 at 11:16 PM, Stian Håklev <[email protected]> wrote:
>>
>>> I'm trying to run a function on every row fulfilling a certain
>>> criterion. The function returns a data frame, and the idea is then to
>>> take the list of data frames and rbindlist them together into a totally
>>> separate data.table. (I'm extracting several URL links from each forum
>>> post, and tagging them with the forum post they came from.)
>>>
>>> I tried doing this with a data.table
>>>
>>> a <- db[has_url == T, getUrls(text, id)]
>>>
>>> and get the message
>>>
>>> Error in `$<-.data.frame`(`*tmp*`, "id", value = c(1L, 6L, 1L, 2L, 4L,  :
>>>   replacement has 11007 rows, data has 29787
>>>
>>> Because some rows have several URLs... However, I don't care that these
>>> row lengths don't match; I still want these rows :) I thought `j` would
>>> just let me execute arbitrary R code in the context of the selected rows,
>>> with the columns as variable names, etc.
>>>
>>> Here's the function it's running, but that shouldn't be relevant
>>>
>>> getUrls <- function(text, id) {
>>>   matches <- str_match_all(text, url_pattern)
>>>   a <- data.frame(urls=unlist(matches))
>>>   a$id <- id
>>>   a
>>> }
>>>
>>>
>>> Thanks, and thanks for an amazing package - data.table has made my life
>>> so much easier. It should be part of base, I think.
>>> Stian Haklev, University of Toronto
>>>
>>> --
>>> http://reganmian.net/blog -- Random Stuff that Matters
>>>
>>> _______________________________________________
>>> datatable-help mailing list
>>> [email protected]
>>>
>>> https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help
>>>
>>
>>
>