That was my thought too. I don't know what str_match_all is, but given the unlist() in getUrls(), it seems to return a list. Rather than calling unlist(), leave it as a list, and data.table should happily make a `list` column where each cell is itself a vector. In fact each cell can be anything at all: an embedded data.table, a function definition, or any other type of object.
You might need a list(list(str_match_all(...))) in j to do that.
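To make the list column idea concrete, here is a minimal sketch. The toy `db` and the `url_pattern` regex are assumptions made up for illustration (the real ones weren't posted), and it assumes str_match_all comes from the stringr package. Without a `by`, a single list() wrapper in j is enough here; the double list(list(...)) would be for the grouped case.

```r
library(data.table)
library(stringr)

# Hypothetical toy data standing in for the forum posts (not the real db).
db <- data.table(
  id      = 1:3,
  text    = c("see http://a.com and http://b.org", "no links here", "http://c.net"),
  has_url = c(TRUE, FALSE, TRUE)
)
url_pattern <- "https?://\\S+"   # assumed pattern, not the original one

# str_match_all() returns a list with one element per input string,
# so inside list() it becomes a list column: one cell per post.
res <- db[(has_url), list(id, urls = str_match_all(text, url_pattern))]

# If one row per URL is wanted later, flatten the list column by id.
flat <- res[, list(url = unlist(urls)), by = id]
```

Keeping the matches in a list column means nothing has to be recycled at extraction time; the flattening step is separate and optional.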

Or what Rick has suggested here might work first time. It's hard to visualise it without a small reproducible example, so we're having to make educated guesses.
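In lieu of a reproducible example, a rough sketch of why the ungrouped call fails and why grouping by id should help. Everything here (the toy `db`, the `url_pattern` regex, the name `ungrouped`) is assumed for illustration, and j is written inline rather than through getUrls() to keep it short.

```r
library(data.table)
library(stringr)

# Hypothetical toy data; the real db and url_pattern were not posted.
db <- data.table(
  id      = c(10L, 20L, 30L),
  text    = c("http://a.com http://b.org", "nothing", "http://c.net"),
  has_url = c(TRUE, FALSE, TRUE)
)
url_pattern <- "https?://\\S+"   # assumed pattern

# Ungrouped: three URLs but only two ids, so the data.frame assignment
# cannot recycle the ids and raises the same kind of error as in the post.
df <- data.frame(urls = unlist(str_match_all(db[(has_url), text], url_pattern)))
ungrouped <- try(df$id <- db[(has_url), id], silent = TRUE)

# Grouped: j runs once per id, and data.table repeats the single id
# for however many URLs that post contains.
a <- db[(has_url), list(url = unlist(str_match_all(text, url_pattern))), by = id]
```

With `by = id` there is no need to attach the id inside the function at all, since the grouping column is added to the result automatically.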

Many thanks for the kind words about data.table.

Matthew


On 27/09/13 07:44, Ricardo Saporta wrote:
In fact, you should be able to skip the function altogether and just use:

   db[ (has_url), str_match_all(text, url_pattern), by=id]


(and now, my apologies to all for the email clutter)
good night

On Fri, Sep 27, 2013 at 2:41 AM, Ricardo Saporta <[email protected] <mailto:[email protected]>> wrote:

    sorry, I probably should have elaborated  (it's late here, in NJ)

    The error you are seeing is most likely coming from your getUrls
    function: you are assigning several ids to a data.frame with a
    different number of rows, and `R` cannot recycle them correctly.

    If you instead break it down by id, then each time you are only
    assigning one id and R will be able to recycle appropriately,
    without issue.

    good luck!
    Rick


    Ricardo Saporta
    Graduate Student, Data Analytics
    Rutgers University, New Jersey
    e: [email protected] <mailto:[email protected]>



    On Fri, Sep 27, 2013 at 2:37 AM, Ricardo Saporta
    <[email protected]
    <mailto:[email protected]>> wrote:

        Hi there,

        Try inserting a `by=id` in

        a <- db[(has_url), getUrls(text, id), by=id]

        Also, no need for "has_url == T";
        instead, use
        (has_url)
        if the variable is already logical.  (Otherwise, you are just
        slowing things down ;)



        Ricardo Saporta
        Graduate Student, Data Analytics
        Rutgers University, New Jersey
        e: [email protected] <mailto:[email protected]>



        On Thu, Sep 26, 2013 at 11:16 PM, Stian Håklev
        <[email protected] <mailto:[email protected]>> wrote:

            I'm trying to run a function on every row fulfilling a
            certain criterion, which returns a data frame - the idea
            is then to take the list of data frames and rbindlist them
            together into a totally separate data.table. (I'm
            extracting several URL links from each forum post, and
            tagging them with the forum post they came from).

            I tried doing this with a data.table

            a <- db[has_url == T, getUrls(text, id)]

            and get the message

            Error in `$<-.data.frame`(`*tmp*`, "id", value = c(1L, 6L,
            1L, 2L, 4L,  :
              replacement has 11007 rows, data has 29787

            Because some rows have several URLs... However, I don't
            care that these row counts don't match, I still want these
            rows :) I thought j would just let me execute arbitrary R
            code with the columns available as variables, etc.

            Here's the function it's running, but that shouldn't be
            relevant

            getUrls <- function(text, id) {
              matches <- str_match_all(text, url_pattern)
              a <- data.frame(urls=unlist(matches))
              a$id <- id
              a
            }


            Thanks, and thanks for an amazing package - data.table has
            made my life so much easier. It should be part of base, I
            think.
            Stian Haklev, University of Toronto

-- http://reganmian.net/blog -- Random Stuff that Matters

            _______________________________________________
            datatable-help mailing list
            [email protected]
            <mailto:[email protected]>
            
https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help





