Yes, it seems the columns themselves carry names attributes, and those names vectors do not match the column lengths.

lapply(a, names) should reveal the "hidden" names.

To remove them:

for (i in seq_len(ncol(a))) setattr(a[[i]], "names", NULL)  # setattr() works by reference, so no assignment is needed

Then lapply(a, names) should return NULL for every column.

Then retry the operations that segfaulted before.

If this fixes it, we'll need to establish how the erroneous names got in there.
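A minimal sketch of the check-and-clear steps on a small hypothetical table `a`. The columns k1 and v1 and the injected names are made up for illustration, and these names have the correct length, unlike the corrupted ones in this thread:

```r
library(data.table)

# Hypothetical table standing in for 'a'
a <- data.table(k1 = c("p", "q", "r"), v1 = 1:3)

# Simulate a stray names attribute on one column (setattr modifies in place)
setattr(a[["k1"]], "names", c("n1", "n2", "n3"))

# Which columns carry a names attribute?
has_names <- vapply(a, function(col) !is.null(names(col)), logical(1))
print(has_names)

# Strip names from every column, by reference
for (i in seq_len(ncol(a))) setattr(a[[i]], "names", NULL)

# Every element should now be NULL
str(lapply(a, names))
```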


On 10/09/13 19:51, Chris Neff wrote:



On Tue, Sep 10, 2013 at 2:02 PM, Matthew Dowle <[email protected]> wrote:


Nothing springs to mind. You're on the latest version, v1.8.10 from CRAN, right? Or v1.8.11 from R-Forge?


Both, and 1.8.8 as well.


    On this bit:

    > So somewhere these key columns think they are different lengths
    > than they really are, and when I try to access it I go into memory
    > I shouldn't so I segfault.  How can I verify this? Is there
    > something about the DT I can check to see what DT thinks these
    > columns are?

    .Internal(inspect(DT)) reveals the internal structure, including
    the length and truelength of the column pointer vector as well as
    of each column.
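The same numbers can also be checked with ordinary R calls. A sketch on a hypothetical table DT; truelength() is exported by data.table, and the exact over-allocation it reports varies between versions, so treat the printed values as illustrative:

```r
library(data.table)

DT <- data.table(k1 = c("a", "b", "c"), v1 = 1:3)

# Full dump: each node shows its type, len= (length) and tl= (truelength)
.Internal(inspect(DT))

# Length of every column; all should equal nrow(DT)
col_lengths <- vapply(DT, length, integer(1))
print(col_lengths)

# Over-allocated slots in the column pointer vector (tl= in the dump)
print(truelength(DT))

# Length of any names attribute sitting on a column (0 means none)
print(vapply(DT, function(col) length(names(col)), integer(1)))
```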

    But it's a really odd way of using data.table. Iterating by row is
    going to kill performance; data.table is designed to work by column.


Trust me, I know this; this isn't my code :) I'm just the data.table guy who helps debug. I'm showing him better ways, but I think we can agree that it should at least not segfault.


I ran inspect on both versions of the data.table: the one that crashes, made with rbindlist(apply(d,1,...)), and the one that doesn't, made with rbindlist(lapply(1:nrow(d),...)). I changed the variable names and censored the values.

First, the one that fails (accessing either a$k1 or a$k2 segfaults):

> .Internal(inspect(a))
@2cc5be0 19 VECSXP g0c7 [OBJ,NAM(2),ATT] (len=13, tl=100)
  @3b643d0 16 STRSXP g0c7 [NAM(2),ATT] (len=326, tl=0)
    @253e488 09 CHARSXP g1c3 [MARK,gp=0x20,ATT] "#########"
    @253e488 09 CHARSXP g1c3 [MARK,gp=0x20,ATT] "#########"
    @253e3f8 09 CHARSXP g1c3 [MARK,gp=0x20] "#########"
    @253e3f8 09 CHARSXP g1c3 [MARK,gp=0x20] "#########"
    @253e3f8 09 CHARSXP g1c3 [MARK,gp=0x20] "#########"
    ...
  ATTRIB:
    @ac6c20 02 LISTSXP g1c0 [MARK]
      TAG: @963418 01 SYMSXP g1c0 [MARK,gp=0x4000] "names"
      @3ba6ad8 16 STRSXP g1c2 [MARK,NAM(2)] (len=2, tl=0)
        @184aed0 09 CHARSXP g1c3 [MARK,gp=0x21,ATT] "k1"
        @184aed0 09 CHARSXP g1c3 [MARK,gp=0x21,ATT] "k1"
  @3b64e30 16 STRSXP g0c7 [NAM(2),ATT] (len=326, tl=0)
    @253e440 09 CHARSXP g1c3 [MARK,gp=0x20] "#########"
    @253e440 09 CHARSXP g1c3 [MARK,gp=0x20] "#########"
    @253e440 09 CHARSXP g1c3 [MARK,gp=0x20] "#########"
    @253e440 09 CHARSXP g1c3 [MARK,gp=0x20] "#########"
    @253e3b0 09 CHARSXP g1c3 [MARK,gp=0x20] "#########"
    ...
  ATTRIB:
    @ac6cc8 02 LISTSXP g1c0 [MARK]
      TAG: @963418 01 SYMSXP g1c0 [MARK,gp=0x4000] "names"
      @3ba6a68 16 STRSXP g1c2 [MARK,NAM(2)] (len=2, tl=0)
        @bf8578 09 CHARSXP g1c2 [MARK,gp=0x21] "k2"
        @bf8578 09 CHARSXP g1c2 [MARK,gp=0x21] "k2"
  @3b65890 16 STRSXP g0c7 [NAM(2)] (len=326, tl=0)
    @24eeb68 09 CHARSXP g1c1 [MARK,gp=0x20] "#########"
    @24eeb08 09 CHARSXP g1c1 [MARK,gp=0x20] "#########"
    @24eeb68 09 CHARSXP g1c1 [MARK,gp=0x20] "#########"
    @24eeb08 09 CHARSXP g1c1 [MARK,gp=0x20] "#########"
    @24eeb68 09 CHARSXP g1c1 [MARK,gp=0x20] "#########"
    ...
  @1ff5850 13 INTSXP g0c7 [NAM(2)] (len=326, tl=0) 3,3,3,3,3,...
  @1fc6600 13 INTSXP g0c7 [NAM(2)] (len=326, tl=0) 2,1,2,1,3,...
  ...
ATTRIB:
  @21f6d48 02 LISTSXP g0c0 []
    TAG: @963418 01 SYMSXP g1c0 [MARK,gp=0x4000] "names"
    @3efc1f0 16 STRSXP g0c7 [NAM(2)] (len=13, tl=100)
      @184aed0 09 CHARSXP g1c3 [MARK,gp=0x21,ATT] "k1"
      @bf8578 09 CHARSXP g1c2 [MARK,gp=0x21] "k2"
      @108be30 09 CHARSXP g1c2 [MARK,gp=0x21] "v1"
      @108be68 09 CHARSXP g1c2 [MARK,gp=0x21] "v2"
      @108bf10 09 CHARSXP g1c2 [MARK,gp=0x21] "v3"
      ...
    TAG: @96d200 01 SYMSXP g1c0 [MARK,gp=0x4000] "row.names"
    @2556908 13 INTSXP g0c1 [] (len=2, tl=0) -2147483648,-326
    TAG: @9638e8 01 SYMSXP g1c0 [MARK,gp=0x4000] "class"
    @2701b38 16 STRSXP g0c2 [NAM(2)] (len=2, tl=0)
      @bf8460 09 CHARSXP g1c2 [MARK,gp=0x21] "data.table"
      @9f2688 09 CHARSXP g1c2 [MARK,gp=0x21,ATT] "data.frame"
    TAG: @1e75218 01 SYMSXP g1c0 [MARK] ".internal.selfref"
    @21f6e28 22 EXTPTRSXP g0c0 []






Second, the one that works (all values can be accessed fine):

> .Internal(inspect(a))
@45b4850 19 VECSXP g0c7 [OBJ,NAM(2),ATT] (len=13, tl=100)
  @33a53a0 16 STRSXP g0c7 [NAM(2)] (len=326, tl=0)
    @253e488 09 CHARSXP g1c3 [MARK,gp=0x20,ATT] "#########"
    @253e488 09 CHARSXP g1c3 [MARK,gp=0x20,ATT] "#########"
    @253e3f8 09 CHARSXP g1c3 [MARK,gp=0x20] "#########"
    @253e3f8 09 CHARSXP g1c3 [MARK,gp=0x20] "#########"
    @253e3f8 09 CHARSXP g1c3 [MARK,gp=0x20] "#########"
    ...
  @33a5e00 16 STRSXP g0c7 [NAM(2)] (len=326, tl=0)
    @253e440 09 CHARSXP g1c3 [MARK,gp=0x20] "#########"
    @253e440 09 CHARSXP g1c3 [MARK,gp=0x20] "#########"
    @253e440 09 CHARSXP g1c3 [MARK,gp=0x20] "#########"
    @253e440 09 CHARSXP g1c3 [MARK,gp=0x20] "#########"
    @253e3b0 09 CHARSXP g1c3 [MARK,gp=0x20] "#########"
    ...
  @33a6860 16 STRSXP g0c7 [NAM(2)] (len=326, tl=0)
    @24eeb68 09 CHARSXP g1c1 [MARK,gp=0x20] "#########"
    @24eeb08 09 CHARSXP g1c1 [MARK,gp=0x20] "#########"
    @24eeb68 09 CHARSXP g1c1 [MARK,gp=0x20] "#########"
    @24eeb08 09 CHARSXP g1c1 [MARK,gp=0x20] "#########"
    @24eeb68 09 CHARSXP g1c1 [MARK,gp=0x20] "#########"
    ...
  @1ff10f0 13 INTSXP g0c7 [NAM(2)] (len=326, tl=0) 3,3,3,3,3,...
  @3a6d0d0 13 INTSXP g0c7 [NAM(2)] (len=326, tl=0) 2,1,2,1,3,...
  ...
ATTRIB:
  @276c360 02 LISTSXP g0c0 []
    TAG: @963418 01 SYMSXP g1c0 [MARK,gp=0x4000] "names"
    @1fe5670 16 STRSXP g0c7 [NAM(2)] (len=13, tl=100)
      @184aed0 09 CHARSXP g1c3 [MARK,gp=0x21,ATT] "k1"
      @bf8578 09 CHARSXP g1c2 [MARK,gp=0x21] "k2"
      @108be30 09 CHARSXP g1c2 [MARK,gp=0x21] "v1"
      @108be68 09 CHARSXP g1c2 [MARK,gp=0x21] "v2"
      @108bf10 09 CHARSXP g1c2 [MARK,gp=0x21] "v3"
      ...
    TAG: @96d200 01 SYMSXP g1c0 [MARK,gp=0x4000] "row.names"
    @29cbf38 13 INTSXP g0c1 [] (len=2, tl=0) -2147483648,-326
    TAG: @9638e8 01 SYMSXP g1c0 [MARK,gp=0x4000] "class"
    @2d539a0 16 STRSXP g0c2 [NAM(2)] (len=2, tl=0)
      @bf8460 09 CHARSXP g1c2 [MARK,gp=0x21] "data.table"
      @9f2688 09 CHARSXP g1c2 [MARK,gp=0x21,ATT] "data.frame"
    TAG: @1e75218 01 SYMSXP g1c0 [MARK] ".internal.selfref"
    @276c440 22 EXTPTRSXP g0c0 []




It looks to me like there are some differences in the ATTRIB entries attached to k1 and k2 in the first case? I can't parse this as well as you can.
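One way to find such differences without reading the dumps line by line is to compare each column's attributes directly. A sketch, where a_bad and a_good are hypothetical names for the tables built via apply() and via lapply(); here they are faked with a tiny table and a simulated stray names attribute:

```r
library(data.table)

a_good <- data.table(k1 = c("p", "q"), k2 = c("x", "y"), v1 = 1:2)
a_bad  <- copy(a_good)
# Simulate a stray names attribute like the one on k1 in the dump
setattr(a_bad[["k1"]], "names", c("k1", "k1"))

# TRUE where a column's attributes are identical in both tables
same_attrs <- mapply(function(x, y) identical(attributes(x), attributes(y)),
                     a_bad, a_good)
print(same_attrs)

# Show the attributes of the columns that differ
str(lapply(a_bad, attributes)[!same_attrs])
```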

    If it really has to be by row, then DT[, fun(.SD, ...),
    by = 1:nrow(DT)] should be better than apply().
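A minimal sketch of that pattern, using a hypothetical per-row function row_fun (standing in for fun) on a made-up table, with a vectorized column-wise version alongside for comparison:

```r
library(data.table)

DT <- data.table(k1 = c("a", "b", "c"), v1 = 1:3, v2 = c(10, 20, 30))

# Hypothetical per-row function: receives a one-row data.table as .SD
row_fun <- function(sd) list(total = sd$v1 + sd$v2)

# Row-wise via grouping: each group is a single row
by_row <- DT[, row_fun(.SD), by = 1:nrow(DT)]

# Column-wise (vectorized) equivalent, usually much faster
by_col <- DT[, list(total = v1 + v2)]

print(by_row)
print(by_col)
```

The grouped form keeps the per-row logic intact while avoiding apply()'s conversion of the whole table to a matrix; the column-wise form is what data.table is optimized for.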

    Matthew


    On 10/09/13 18:47, Chris Neff wrote:
    Narrowing it down further,

    a$x

    segfaults and

    a[,x]

    segfaults but

    a[,"x", with=FALSE]

    doesn't.
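On a healthy table the three forms return the same values in different shapes: a$x and a[, x] give the column vector, while with = FALSE gives a one-column data.table. That difference in what is returned may be why the last form behaves differently here, but that is an assumption, not a diagnosis. A sketch on a hypothetical table:

```r
library(data.table)

# Hypothetical healthy table with a column named x
a <- data.table(x = c("p", "q", "r"), y = 1:3)

v1 <- a$x                      # the column vector itself
v2 <- a[, x]                   # j evaluated as an expression: also a vector
d3 <- a[, "x", with = FALSE]   # j as a column name: a one-column data.table

print(class(v1))
print(class(d3))
```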


    On Tue, Sep 10, 2013 at 1:32 PM, Chris Neff <[email protected]> wrote:

        I'm pretty sure the issue is a column that thinks it is
        bigger than it actually is. I have tried, so far in vain,
        to make a reproducible example that I can share. I have one,
        but I can't share it.

        What happens is this:

        A data.frame is made:

        > d = data.frame(...)

        Then I call apply() over every row, calling another
        function that also takes a DT:

        l = apply(d, 1, function(x) func(x[1], x[2], DT))

        Each call returns a data.frame, so l is a list of them. If I rbindlist this:

        a = rbindlist(l)

        I can print a just fine, and it shows all the data as
        normal. But if I try to just do

        a$x

        where x is one of the columns that was a key in DT, it
        segfaults. If I ask for a column that was made by "func" and
        wasn't a column in DT, it works fine. If I ask for only the
        first 10 rows and then ask for x:

        a[1:10]$x

        it works fine.

        So somewhere these key columns think they are different
        lengths than they really are, and when I try to access it I
        go into memory I shouldn't so I segfault.  How can I verify
        this? Is there something about the DT I can check to see what
        DT thinks these columns are?


        Also, if I build the list with lapply instead of apply:

        l = lapply(1:nrow(d), function(i) func(d[i,1], d[i,2], DT))

        and rbindlist that, it works fine too.
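One plausible route for the stray names, offered as an assumption rather than a diagnosis: apply() converts the data.frame to a matrix, so each row reaches the function as a named vector and x[1] keeps its column name, whereas d[i, 1] on a data.frame is an unnamed value. A sketch with a hypothetical two-column d:

```r
d <- data.frame(k1 = c("a", "b"), k2 = c("x", "y"), stringsAsFactors = FALSE)

# Inside apply(), x is a row of as.matrix(d), named by column
names_via_apply <- apply(d, 1, function(x) names(x[1]))
print(names_via_apply)   # "k1" "k1"

# With lapply() over row indices, d[i, 1] carries no names
names_via_lapply <- lapply(seq_len(nrow(d)), function(i) names(d[i, 1]))
print(names_via_lapply)  # NULL for every row

# If a hypothetical func() copies x[1] into a key column,
# unname() would drop the name before it gets there
print(unname(c(k1 = "a")))
```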




    _______________________________________________
    datatable-help mailing list
    [email protected]
    https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help


