On 20 July 2011 10:42, Matthew Dowle <[email protected]> wrote:

>
> Thanks, makes sense. Yes, as.data.frame.data.table currently removes the
> 'sorted' attribute, which is all a key is. I suppose that line could be
> removed so the key would be left on the data.frame. You would then need
> to change the class back to data.table at the end of the function, though,
> and make sure you didn't change the order of the rows otherwise that key
> would be invalid.
>
> However, packages I use, use other packages I don't use directly and know
> nothing about. I don't see the issue. Disk space? Memory space? The
> banner?
>
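A key is just the 'sorted' attribute, so it stays valid only while the rows remain in key order. Below is a minimal base-R sketch of that check. The helper name `key_order_intact()` is hypothetical and is not a data.table function. The check relies on base `order()` leaving ties in their original order, so an already-sorted table gives back 1, 2, ..., n:

```r
# Hypothetical helper (not part of data.table): TRUE if the rows of `df` are
# still sorted by `key_cols`, i.e. a key on those columns would still be valid.
key_order_intact <- function(df, key_cols) {
  if (length(key_cols) == 0L || nrow(df) < 2L) return(TRUE)
  # unname() stops a column called e.g. "method" from matching order()'s args
  cols <- unname(as.list(as.data.frame(df)[key_cols]))
  # order() is stable, so an already-sorted table yields 1, 2, ..., n
  identical(do.call(order, cols), seq_len(nrow(df)))
}

df <- data.frame(id = c(1L, 2L, 2L, 3L), val = c(10, 5, 7, 1))
key_order_intact(df, "id")                   # TRUE: rows are in id order
key_order_intact(df[order(df$val), ], "id")  # FALSE: sorting by val broke it
```

If the check passes, the caller could convert back and re-apply the key, for example with data.table's `setkeyv()`.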
It's about behaving nicely in a build environment that is more complicated
than a normal R setup.

> There is also this related FR :
>
> https://r-forge.r-project.org/tracker/index.php?func=detail&aid=984&group_id=240&atid=978
>
> Just to check: you know that the result of j in data.table can happily be
> a data.frame? So if your user is using data.table to call your function,
> he won't mind. If he's passing the entire data.table to your function,
> then he's not going to be wanting to retain the key anyway. You're
> returning some statistical result to him (not the original data back) so
> why does the key make sense to retain?
>

Well, I explicitly crafted an example where I return the entire data frame.
Now, in this dumb example I ruined the ordering, so the key is lost anyway.
But I think I have cases where I want to take an entire data.(table|frame),
do some processing, and return the full data.(table|frame) back as it was.

> Noting that, strictly, your MyFunc 'returned' two columns, it might be
> written like this:
>
> MyFunc <- function(numerator, denominator)
> {
>     o = order(numerator)
>     data.frame(numerator[o], cumsum((numerator/denominator)[o]))
> }
>
> Then the user can decide if he wants to cbind it to his data.frame, or
> fast assign it into a data.table, or by group, or whatever. That seems
> to me to be up to your user. Perhaps the job of MyFunc is to return its
> output given the input (and that's all).
>

I think my issues come more from inexperience with, and uneasiness about,
some of the data.table idioms. When you list it all out like that it becomes
crystal clear, though, and I think refactoring my code is the right approach.
I'm just not in the data.table mindset yet, I guess.

> Matthew
>
> >
> > Mainly it is that I am writing some library functions that I and a few
> > others may be using. I don't want those functions to have to depend on
> > data.table because I don't want it to need to be installed for a purpose
> > that has nothing to do with it.
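Matthew's vector-in, data.frame-out version of MyFunc runs in base R alone. Below is a minimal sketch with the typo and the missing parenthesis fixed. The output column names `numerator` and `cum.metric` are illustrative choices that do not come from the thread:

```r
# Refactored MyFunc: takes plain vectors, returns only the computed columns,
# so it has no data.table dependency at all.
MyFunc <- function(numerator, denominator) {
  o <- order(numerator)
  data.frame(
    numerator  = numerator[o],
    cum.metric = cumsum((numerator / denominator)[o])
  )
}

df  <- data.frame(num = c(3, 1, 2), den = c(1, 2, 4))
res <- MyFunc(df$num, df$den)
res   # numerator 1, 2, 3; cum.metric 0.5, 1, 4

# The caller decides what to do with it, e.g. bind it to the input after
# putting the input rows in the same order:
cbind(df[order(df$num), ], cum.metric = res$cum.metric)
```

A data.table user could instead assign the result by reference with `:=`. Only the data.frame route is shown here.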
> > But I use data.tables as input. Here is a pseudo example:
> >
> > MyFunc <- function(data, numerator.var, denominator.var)
> > {
> >     data <- data[order(data[, numerator.var]), ]
> >     data$metric <- data[, numerator.var] / data[, denominator.var]
> >     data$cum.metric <- cumsum(data$metric)
> >
> >     return(data)
> > }
> >
> > I made this example to show that I need to preserve the whole data
> > variable the whole way through and return a modified version. If I do
> >
> >     data <- as.data.frame(data)
> >
> > as the first line of that function, then I lose the keys in a potential
> > data.table that is passed in. If I use
> >
> >     data <- as.data.table(data)
> >
> > and change the subsetting to be data.table compliant, then I am forcing
> > someone to have a whole package loaded for something that base R can do
> > fine. There must be an agnostic way to do this. Apparently subset doesn't
> > do it either if keys get lost.
> >
> > -Chris
> >
> > On 20 July 2011 08:48, Matthew Dowle <[email protected]> wrote:
> >
> >>
> >> Hi Chris,
> >>
> >> If you're writing a package and don't want to worry if someone passes
> >> your package a data.table, then don't worry; just use data.frame syntax
> >> and your non-datatable-aware package will work fine.
> >>
> >> If you're writing your own code you're in control of, just embrace the
> >> data.table ;)
> >>
> >> If you're writing a function in an environment which is data.table
> >> aware, but you want your function to accept either data.frame or
> >> data.table, then at the beginning of your function just do :
> >>
> >> myfunction = function(x) {
> >>     x = as.data.table(x)
> >>     # proceed with data.table syntax
> >> }
> >>
> >> or
> >>
> >> myfunction = function(x) {
> >>     x = as.data.frame(x)
> >>     # proceed with data.frame syntax
> >> }
> >>
> >> Some of the CRAN packages that depend on data.table are doing that, I
> >> think.
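One way to follow the coerce-at-the-top advice and still hand a data.table back to data.table users is to remember the input's class. Below is a sketch of Chris's MyFunc written that way. It assumes data.table is only a suggested, not required, package. `data.table::as.data.table()` is called only when a data.table was passed in, and that already implies the package is installed. Any original key is not restored, because the rows are reordered:

```r
MyFunc <- function(data, numerator.var, denominator.var) {
  # Remember whether the caller passed a data.table before coercing.
  was_dt <- inherits(data, "data.table")
  data <- as.data.frame(data)   # from here on, plain data.frame syntax

  # Reorder rows (note the comma: this indexes rows, not columns).
  data <- data[order(data[[numerator.var]]), , drop = FALSE]
  data$metric <- data[[numerator.var]] / data[[denominator.var]]
  data$cum.metric <- cumsum(data$metric)

  # Hand back the class that came in; the old key is not restored because
  # the row order has changed.
  if (was_dt) data <- data.table::as.data.table(data)
  data
}

df <- data.frame(x = c(3, 1, 2), y = c(1, 2, 4))
MyFunc(df, "x", "y")
```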
> >>
> >> In R itself it is common practice to coerce arguments to a common type
> >> and then proceed with the appropriate syntax for that type. Consider
> >> that matrix syntax is different from data.frame syntax. You often see
> >> as.classiwant() at the beginning of functions, or switches depending on
> >> the type of object.
> >>
> >> Remember that is.data.frame() is TRUE for both data.frame and
> >> data.table, but is.data.table() is TRUE only for data.table.
> >> as.data.table() does nothing if x is already a data.table, and is an
> >> efficient class change if x is a data.frame. Is efficiency the issue?
> >>
> >> Does that help? If not, more info about the problem will be needed
> >> please.
> >>
> >> Matthew
> >>
> >>
> >> > I'm used to seeing the column names at the bottom of the column too,
> >> > but that is only if the data.table is long enough. My example was too
> >> > short for that, so I made the same sort of mistake you did :(
> >> >
> >> > Okay, that is a way, but is it a good way? Not sure...
> >> >
> >> > 2011/7/20 Timothée Carayol <[email protected]>
> >> >
> >> >> Sorry, my mistake -- subset does return a data.table.
> >> >> (I was using a data.table with 100 rows as an example, and stupidly
> >> >> used the fact that it printed the whole thing rather than only the
> >> >> first 10 rows as my criterion for whether it worked or not, forgetting
> >> >> that print.data.table does print up to 100 rows. I feel a bit stupid.)
> >> >>
> >> >> Why doesn't it work for you if that is the case?
> >> >>
> >> >> DF <- data.frame(a=1:200, b=1:10)
> >> >> DT <- as.data.table(DF)
> >> >> subDT <- subset(DT, select=a)
> >> >> class(subDT)
> >> >> subDF <- subset(DF, select=a)
> >> >> class(subDF)
> >> >> identical(as.data.frame(DT), DF)
> >> >>
> >> >>
> >> >>
> >> >> On Wed, Jul 20, 2011 at 12:50 PM, Chris Neff <[email protected]> wrote:
> >> >>
> >> >>> Yeah I realized that myself.
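Timothée's subset() route can also take column names held in a variable, which is what the original question needs. Below is a minimal sketch using base `subset()` with a hypothetical wrapper named `pick_cols()`. Only the data.frame path is exercised here. Whether data.table's own subset() method treats a character vector of names the same way is an assumption worth checking. Because `select` is evaluated against the column names first, the sketch breaks if the table happens to have a column called `cols`:

```r
# Hypothetical wrapper: select columns by name via subset(). For a data.frame
# this dispatches to subset.data.frame; for a data.table, to data.table's
# method (assumed, not verified here, to accept a character vector too).
pick_cols <- function(x, cols) {
  # `select` is evaluated with column names mapped to positions; a variable
  # holding a character vector of names simply evaluates to those names.
  subset(x, select = cols)
}

DF <- data.frame(x = 1:10, y = 2:11, z = 3:12)
var.names <- c("x", "y")
names(pick_cols(DF, var.names))   # "x" "y"
```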
> >> >>>
> >> >>> Another one: the function "with" doesn't seem to do what I want...
> >> >>> but at least it is consistent!
> >> >>>
> >> >>>
> >> >>> 2011/7/20 Timothée Carayol <[email protected]>
> >> >>>
> >> >>>> Sorry --
> >> >>>>
> >> >>>> subset() was a poor idea, as it will return a data.frame even if
> >> >>>> the argument is a data.table..
> >> >>>>
> >> >>>>
> >> >>>>
> >> >>>> 2011/7/20 Timothée Carayol <[email protected]>
> >> >>>>
> >> >>>>> Hi--
> >> >>>>>
> >> >>>>> You can use the subset() command with the select= option; not sure
> >> >>>>> it's the best solution, though.
> >> >>>>>
> >> >>>>> Timothee
> >> >>>>>
> >> >>>>>
> >> >>>>> On Wed, Jul 20, 2011 at 12:26 PM, Chris Neff <[email protected]> wrote:
> >> >>>>>
> >> >>>>>> I have a function where I pass a data frame and some variable
> >> >>>>>> names to calculate statistics on. However, I am at a loss as to
> >> >>>>>> how to write it correctly so that both data.frame and data.table
> >> >>>>>> work with it. If I have:
> >> >>>>>>
> >> >>>>>> DF = data.frame(x=1:10,y=2:11,z=3:12)
> >> >>>>>>
> >> >>>>>> DT = data.table(DF)
> >> >>>>>>
> >> >>>>>> var.names = c("x","y")
> >> >>>>>>
> >> >>>>>>
> >> >>>>>> I can do the following things to subset:
> >> >>>>>>
> >> >>>>>> DT[,var.names,with=FALSE]
> >> >>>>>> DF[,var.names]
> >> >>>>>>
> >> >>>>>>
> >> >>>>>> but of course DT[,var.names] won't give me back what I want, and
> >> >>>>>> DF[,var.names,with=FALSE] returns an error because with doesn't
> >> >>>>>> exist there.
> >> >>>>>> So how do I do this?
> >> >>>>>>
> >> >>>>>> Thanks,
> >> >>>>>> -Chris
> >> >>>>>>
> >> >>>>>> _______________________________________________
> >> >>>>>> datatable-help mailing list
> >> >>>>>> [email protected]
> >> >>>>>> https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help
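For code that is itself data.table-aware, the "switch depending on the type of object" idea answers the original question directly. Below is a minimal sketch using a hypothetical helper named `select_cols()`. Because is.data.frame() is TRUE for both classes, the data.table test has to come first. `with = FALSE` is the data.table argument already used in the question. Per Matthew's reply, a package that does not import data.table can simply use the data.frame branch. Only that branch is exercised below:

```r
# Hypothetical helper: pick the indexing syntax based on the class.
select_cols <- function(x, cols) {
  if (inherits(x, "data.table")) {
    x[, cols, with = FALSE]        # data.table syntax
  } else if (is.data.frame(x)) {
    x[, cols, drop = FALSE]        # data.frame syntax; stays a data.frame
  } else {
    stop("x must be a data.frame or a data.table")
  }
}

DF <- data.frame(x = 1:10, y = 2:11, z = 3:12)
var.names <- c("x", "y")
str(select_cols(DF, var.names))
```

`drop = FALSE` matters for the single-column case: without it, `DF[, "x"]` returns a bare vector rather than a data.frame.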
