Hi all,

I want to use data.table either as a data element (slot) of an S4 class or as a superclass (i.e. the new class inherits from data.table using "contains"). The reason is that I want to build an S4 class with a data.table as its main data component, but with some rather complex meta-data (specifically, more S4 classes) attached. I then want to operate on this data.table (mostly with ":=") inside some functions.

The second option (using data.table as a superclass) looks perfect: if it worked, I could treat the new S4 class exactly as a data.table. One could pass an object of the derived class into a function, operate on it there using ":=", and (by the reference goodness of data.table) the object would still be modified outside the function. But when data.table is used as a superclass, the normal operations just don't work. See the issue I filed on GitHub (with a simple code snippet to demonstrate) here:

https://github.com/Rdatatable/data.table/issues/1504

Maybe this can work, which would be fantastic, but let's see.
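
For reference, this is the by-reference behaviour I would like to keep, shown on a plain data.table rather than an S4 object. It is only a minimal sketch: the helper name add_row_sum and the object name plain are made up for illustration and are not part of data.table.

```r
library(data.table)

# Hypothetical helper (not part of data.table): adds a row-sum column in place.
add_row_sum <- function(dt, new_col, cols) {
  # (new_col) makes := use the value of new_col as the column name
  dt[, (new_col) := rowSums(.SD), .SDcols = cols]
  invisible(dt)
}

plain <- data.table(a = 1:10, b = 11:20)
add_row_sum(plain, "c1", c("a", "b"))

# No reassignment needed: the caller's object now has the new column
names(plain)
```

The point is that the function never returns a modified copy that has to be assigned back; the caller's table is updated directly.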


Then there is the idea of using data.table as a regular slot. The problem is that accessing the data.table slot in the S4 object and modifying it (either inside a function or using a class method) results in exactly the type of copying that data.table works so hard to avoid! Disaster! For an example, run this code:

library(data.table)

# simple test object: an id plus a data.table slot
setClass("TestObj",
         slots = c(id = "character",
                   dt = "data.table"
         )
)

# define a method that adds a row-sum column to the data.table slot
setGeneric(name = "testMethod",
           def = function(theObject, new.col.name, cols.to.add)
           {
             standardGeneric("testMethod")
           }
)
setMethod(f = "testMethod",
          signature = "TestObj",
          definition = function(theObject, new.col.name, cols.to.add)
          {
            # (new.col.name) uses the value of the variable as the column name
            theObject@dt <- theObject@dt[, (new.col.name) := rowSums(.SD), .SDcols = cols.to.add]
            return(theObject)
          }
)

# create a TestObj
lala <- new("TestObj", id = "test", dt = data.table(a = 1:10, b = 11:20))

# accessing the data.table slot results in a copy :-(
lala@dt <- lala@dt[, c1 := a + b]

# using a method also makes a copy :'-(
testMethod(lala, new.col.name = "c2", cols.to.add = c("a", "b"))
lala <- testMethod(lala, new.col.name = "c2", cols.to.add = c("a", "b"))
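
One rough way to check whether the slot really ends up pointing at a different table is to compare data.table's address() before and after the assignment. This is only a sketch under assumptions: the class name AddrDemo is made up for the check, and I am not asserting what the comparison prints, since that may depend on the R and data.table versions. It also only looks at the address of the table itself, so it cannot tell a shallow copy from a deep one.

```r
library(data.table)

# Hypothetical class used only for this check
setClass("AddrDemo", slots = c(dt = "data.table"))
obj <- new("AddrDemo", dt = data.table(a = 1:10, b = 11:20))

addr_before <- address(obj@dt)
obj@dt <- obj@dt[, c1 := a + b]
addr_after <- address(obj@dt)

# TRUE would suggest the slot still points at the same table;
# FALSE would suggest a copy was made somewhere along the way
identical(addr_before, addr_after)
```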


So you can see the problem. I want to use a data.table as part of an S4 class and process it in keeping with data.table principles, but I can't find a way. I could possibly just absorb the performance cost of the copy, but some of my data.tables are pretty large, so that might not be viable.

Any help greatly appreciated!

Thanks,

Matt




--
Dr Matthew Forrest
Biodiversity and Climate Research Centre (BiK-F)
Visiting address: Georg-Voigt-Straße 14-16, room 3.04, D-60325 Frankfurt am Main
Postal address: Senckenberganlage 25, D-60325 Frankfurt am Main
Tel.: +49-69-7542-1867
Fax: +49-69-7542-7904
E-mail: [email protected]
Homepage: http://www.bik-f.de/root/index.php?page_id=709

