Hi,
Basically, adding columns by reference to a data.table that is a member of a list of data.tables is really difficult to handle internally. I had to add a special case internally to get around list() copying, so that the binding can change inside the list on the shallow copy when [[ is used. A for loop is the way to add columns by reference inside a list of data.tables, and that should work fine using [[. But doing it via lapply and mapply is really stretching it. Even catching user expectations in this area is difficult. Ideally we'd catch mapply, yes, but really data.table prefers the list to be rbindlist()-ed so that operations work on a single large data.table. Should we add advice to the warning message not to use mapply or lapply to add columns by reference to a list of data.tables (and to use a for loop instead)?
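
For what it's worth, here is a minimal sketch of the for-loop approach, rebuilding the sample data from the original report quoted below. The `source` and `Col5` names are made up for illustration, and the `idcol` argument assumes a data.table version whose rbindlist() supports it:

```r
library(data.table)

# Sample data, as in the original report
list.DT <- list(
  DT1 = data.table(Col1 = 111:115, Col2 = 121:125),
  DT2 = data.table(Col1 = 211:215, Col2 = 221:225)
)
list.Col3 <- list(131:135, 231:235)
list.Col4 <- list(141:145, 241:245)

# Address each table through [[ inside a for loop, so that := is applied
# to the list element itself rather than to a function argument
for (i in seq_along(list.DT)) {
  list.DT[[i]][, c("Col3", "Col4") := list(list.Col3[[i]], list.Col4[[i]])]
}

# Alternative: stack the tables and work on one large data.table.
# "source" is a hypothetical id column recording which table a row came from.
big <- rbindlist(list.DT, idcol = "source")
big[, Col5 := Col3 + Col4]  # Col5 is a hypothetical derived column
```

The loop keeps the tables separate; the rbindlist() route trades that for a single table, where by-reference updates are straightforward.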
Matthew


On 22/09/13 03:02, Ricardo Saporta wrote:
Matthew,

I did notice the warning, but something doesn't add up:

If the issue is simply that the table is being copied when it is created, then wouldn't we expect the same warning to arise when we try to modify the table using either `mapply` or `lapply`? (The latter does not produce a warning.)

If, on the other hand, the issue pertains specifically to mapply (which I assume it does), then why is it only a problem when we iterate over the list directly, whereas iterating indirectly via an index does not produce any warnings? While this is minor overall if one is aware of the issue, I think it might allow unnoticed bugs to creep into someone's code, specifically if someone uses mapply to modify a list of DTs without realizing that the modifications are not being kept.

That being said, I'm not sure how this could even be addressed if the root is in mapply, but is it worth trying to address?

Rick


On Fri, Sep 20, 2013 at 2:18 PM, Matthew Dowle wrote:

    Does this sentence from the warning help?


    " Also, in R<v3.1.0, list(DT1,DT2) copied the entire DT1 and DT2
    (R's list() used to copy named objects); please upgrade to
    R>=v3.1.0 if that is biting. "

    Matthew


    On 20/09/13 19:01, Ricardo Saporta wrote:
    One warning per DT in the list
      (I added the line breaks)
    -Rick
    =============================================
    Warning messages:

    1: In `[.data.table`(DT, , `:=`(c("Col3", "Col4"), list(C3, C4))) :

      Invalid .internal.selfref detected and fixed by taking a copy
    of the whole table so that := can add this new column by
    reference. At an earlier point, this data.table has been copied
    by R (or been created manually using structure() or similar).
    Avoid key<-, names<- and attr<- which in R currently (and oddly)
    may copy the whole data.table. Use set* syntax instead to avoid
    copying: ?set, ?setnames and ?setattr. Also, in R<v3.1.0,
    list(DT1,DT2) copied the entire DT1 and DT2 (R's list() used to
    copy named objects); please upgrade to R>=v3.1.0 if that is
    biting. If this message doesn't help, please report to
    datatable-help so the root cause can be fixed.

    2: In `[.data.table`(DT, , `:=`(c("Col3", "Col4"), list(C3, C4))) :

      Invalid .internal.selfref detected and fixed by taking a copy
    of the whole table so that := can add this new column by
    reference. At an earlier point, this data.table has been copied
    by R (or been created manually using structure() or similar).
    Avoid key<-, names<- and attr<- which in R currently (and oddly)
    may copy the whole data.table. Use set* syntax instead to avoid
    copying: ?set, ?setnames and ?setattr. Also, in R<v3.1.0,
    list(DT1,DT2) copied the entire DT1 and DT2 (R's list() used to
    copy named objects); please upgrade to R>=v3.1.0 if that is
    biting. If this message doesn't help, please report to
    datatable-help so the root cause can be fixed.
    =============================================




    On Fri, Sep 20, 2013 at 12:49 PM, Matthew Dowle wrote:


        Hi,

        What's the warning?

        Matthew



        On 20/09/13 14:48, Ricardo Saporta wrote:
        I've encountered the following issue when iterating over a list
        of data.tables.
        The issue occurs only with mapply, not with lapply.

        Given a list of data.tables, mapply'ing over the list directly
        cannot modify them in place.

        Also, if attempting to add a new column, we get an "Invalid
        .internal.selfref" warning.
        Modifying an existing column does not issue a warning, but
        still fails to modify in place.

        WORKAROUND:
        ----------
        The workaround is to iterate over an index to the list, then to
          modify each data.table via list.of.DTs[[i]][ .. ]

        **Interestingly, this issue occurs with `mapply`, but not
        `lapply`.**

        EXAMPLE:
        --------
          # Given a list of DTs and two lists of vectors,
          #   we want to add the corresponding vectors as columns to
          #   the DT.

        ## ---------------- ##
        ##   SAMPLE DATA:   ##
        ## ---------------- ##
          library(data.table)

          # list of data.tables
          list.DT <- list(
              DT1=data.table(Col1=111:115, Col2=121:125),
              DT2=data.table(Col1=211:215, Col2=221:225)
            )

          # lists of columns to add
          list.Col3 <- list(131:135, 231:235)
          list.Col4 <- list(141:145, 241:245)


        ## ------------------------------------ ##
        ##   Iterating over the list elements   ##
        ##     adding a new column              ##
        ## ------------------------------------ ##
        ##   Will issue warning and             ##
        ##     will fail to modify in place     ##
        ## ------------------------------------ ##
          mapply (
              function(DT, C3, C4)
                 DT[, c("Col3", "Col4") := list(C3, C4)],
              list.DT,  # iterating over the list
              list.Col3, list.Col4,
              SIMPLIFY=FALSE
            )

          ## Note the lack of change
          list.DT


        ## ------------------------------------ ##
        ##   Iterating over an index            ##
        ## ------------------------------------ ##
          mapply (
              function(i, C3, C4)
                 list.DT[[i]] [, c("Col3", "Col4") := list(C3, C4)],
              seq_along(list.DT),   # iterating over an index to the list
              list.Col3, list.Col4,
              SIMPLIFY=FALSE
            )

          ## Note each DT _has_ been modified
          list.DT

        ## ------------------------------------ ##
        ##   Iterating over the list elements   ##
        ##     modifying existing column        ##
        ## ------------------------------------ ##
        ##   No warning issued, but             ##
        ##     Will fail to modify in place     ##
        ## ------------------------------------ ##
          mapply (
              function(DT, C3, C4)
                 DT[, c("Col3", "Col4") := list(Col3*1e3, Col4*1e4)],
              list.DT,  # iterating over the list
              list.Col3, list.Col4,
              SIMPLIFY=FALSE
            )

          ## Note the lack of change (compare with the value returned by `mapply` above)
          list.DT

        ## ------------------------------------ ##
        ##   `lapply` works as expected.        ##
        ## ------------------------------------ ##
          ## NOW WITH lapply
          lapply(list.DT,
            function(DT)
              DT[, newCol := LETTERS[1:5]]
          )

          ## Note the new column:
          list.DT



        # ========================== #

        ##   NON-WORKAROUNDS ##
        ##
        ## I also tried all of the following alternatives
        ##   in hopes of being able to iterate over the list
        ##   directly, using `mapply`.
        ## None of these worked.

        # (1) Creating the DTs First, then creating the list from them
            DT1 <- data.table(Col1=111:115, Col2=121:125)
            DT2 <- data.table(Col1=211:215, Col2=221:225)

            list.DT <- list(DT1=DT1,DT2=DT2 )


        # (2) Same as 1, and using `copy()` in the call to `list()`
            list.DT <- list(DT1=copy(DT1),
                            DT2=copy(DT2) )

        # (3) lapply'ing `copy` and then iterating over that list
            list.DT <- lapply(list.DT, copy)

        # (4) Not naming the list elements
            list.DT <- list(DT1, DT2)
            # and tried
            list.DT <- list(copy(DT1), copy(DT2))

        ## All of the above still failed to modify in place
        ##   (and also issued the same warning if trying to add a
        ##   column) when iterating using mapply

          mapply(function(DT, C3, C4)
            DT[, c("Col3", "Col4") := list(C3, C4)],
            list.DT, list.Col3, list.Col4,
            SIMPLIFY=FALSE)


        # ========================== #


        Ricardo Saporta
        Rutgers University, New Jersey



        _______________________________________________
        datatable-help mailing list
        https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help





