Agreed, but that seems easier said than done. How does a global R issue such as this get done? I, for one, do not relish posting to r-devel.
"Joseph Voelkel" <[email protected]> wrote in message news:[email protected]... > This seems to be outside the scope of data.table. It is really a global R > issue, and one that should be addressed at that level (for example, > natural addition of these attributes to data frames (and of course data > tables :) ), with easy usage in functions such as plot. > > -----Original Message----- > From: [email protected] > [mailto:[email protected]] On Behalf Of Bacou, > Melanie > Sent: Friday, July 29, 2011 12:16 AM > To: 'Griffith Rees'; [email protected] > Cc: [email protected] > Subject: Re: [datatable-help] Variable labels suggestion > > Griff, Matt, > > I agree that codebook support or more generally support for maintaining > meta-data is very poor in R. I also use Hmisc and end up maintaining my > codebook in separate files. Often times I need to carry over not just > variable labels, but also units, type, category, etc.. > > I'm forced to use inefficient and wordy procedures, the likes of: > > ## Add variable labels and units from codebook file (usually some dump > from STATA) > i <- 1 > for (x in names(df)) { > label(df[, x]) <- codebook [i, "varName"] > units(df[, x]) <- codebook [i, "varUnit"] > type(df[, x]) <- codebook [i, "varType"] > i <- i + 1 > } > > [...some variable recoding...] > > ## Save codebook to CSV > codebook <- data.frame(names(df), label(df), sapply(df, units), sapply(df, > type)) > names(codebook) <- c("varCode", "varName", "varUnit", "varType") > write.csv(codebook, file="codebook.csv") > > Any optimization for data.table that would facilitate read/write of > meta-data would make a lot sense. > > --Mel. 
>
> -----Original Message-----
> From: [email protected]
> [mailto:[email protected]] On Behalf Of
> Griffith Rees
> Sent: Thursday, July 28, 2011 6:56 PM
> To: [email protected]
> Cc: [email protected]
> Subject: Re: [datatable-help] Variable labels suggestion
>
> Indeed, making such labels useful is highly dependent on their
> ability to be used with functions like toLatex. I think the first step
> would be to provide a way of adding labels and then consider functions
> that could help use them in formatting contexts, but kind of leave the
> last mile up to users for the time being. If it catches on, people
> will start to write wrappers that do the extra work.
>
> For example: the mtable function, which is what I primarily use to
> format tables for LaTeX, can be used with the relabel function (also
> from the memisc package) to replace variable names in tables (see the
> relabel example in:
> http://www.oga-lab.net/RGM2/func.php?rd_id=memisc:mtable). A method
> which returns those labels appropriately could be called directly when
> mtable is used. It's not the prettiest solution, but it's a start.
>
> Obviously there's a mindshare aspect to this: the more people use
> data.table and find variable labels useful, the more likely they are
> to alter other packages to take advantage of those
> labels. The way to accrue that advantage is to make it simple but
> useful initially, and then wrappers can be added to make better use of
> it. Obviously, the prior art in the Hmisc package failed to garner
> enough mindshare for it to be used in other contexts, and data.table
> succeeds here by retaining interoperability with everything else.
>
> I know the first thing I would probably do: write a wrapper around
> read.dta which would read a Stata file and return a data.table with
> the Stata labels.
>
> Just an idea.
> Oh, and an optimized data.table save format as well, but
> that's icing ;)
>
> -griff
>
> On Thu, Jul 28, 2011 at 8:11 PM, Matthew Dowle <[email protected]>
> wrote:
>>
>> The toLatex aspect struck a chord. I sometimes embed the string 'PCT'
>> into the column name and then gsub("PCT","\%") just before output to
>> LaTeX. Maybe a label would be more robust and could allow more complex
>> LaTeX expressions in the column heading. Long column names with spaces
>> are OK, but they may make it cumbersome to follow the advice to use
>> names, not positions, in j expressions. But how would the LaTeX output
>> command know to use the labels rather than the names? And would
>> data.table need to know about column labels to carry them through
>> subsets, joins, etc.?
>>
>> Matthew
>>
>>
>> On Thu, 2011-07-28 at 13:51 -0400, Chris Neff wrote:
>>> I think this is definitely out of the scope of data.table.
>>>
>>> On 28 July 2011 13:43, Tom Short <[email protected]> wrote:
>>>     On Thu, Jul 28, 2011 at 8:26 AM, Griffith Rees
>>>     <[email protected]> wrote:
>>>     > I think this page quite succinctly describes this issue:
>>>     > http://www.statmethods.net/input/variablelables.html
>>>
>>>     It would be easy to add to data.table. You could also add support
>>>     outside of data.table by writing label.data.table and similar
>>>     functions. Actually using the labels for useful things is more
>>>     difficult. I often find it useful just to use more verbose variable
>>>     names that include spaces, as follows:
>>>
>>>     > dt <- data.table(`My first column` = 1:3, `A character column` = letters[1:3], check.names = FALSE)
>>>     > str(dt)
>>>     Classes 'data.table' and 'data.frame': 3 obs. of 2 variables:
>>>      $ My first column   : int 1 2 3
>>>      $ A character column: Factor w/ 3 levels "a","b","c": 1 2 3
>>>
>>>     That way, columns look better with automatic plotting and with
>>>     lattice or ggplot legends.
>>>
>>>     - Tom
>>>
>>>     _______________________________________________
>>>     datatable-help mailing list
>>>     [email protected]
>>>     https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help
>>
>
> --
> Griffith Rees
> Sociology DPhil Candidate
> Oxford University
> CABDyN Complexity Centre
> http://www.cabdyn.ox.ac.uk
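Matthew's 'PCT' trick and the label idea from the thread above can be combined. Here is a minimal base-R sketch, assuming labels are stored as a "label" attribute on each column (the attribute Hmisc uses): the label becomes the LaTeX heading when present, otherwise the column name with 'PCT' replaced by `\%`. As written in the message, `gsub("PCT","\%")` would not run, because R rejects `"\%"` as an unrecognized escape in a string literal; with `fixed = TRUE` the replacement `"\\%"` is inserted literally as `\%`. `latex_headers()` is a hypothetical helper name, not a data.table or memisc function.

```r
# Hypothetical helper (not part of data.table or any package): choose a LaTeX
# heading per column, preferring a "label" attribute and otherwise using the
# column name with the 'PCT' placeholder replaced by a LaTeX percent sign.
latex_headers <- function(df) {
  vapply(names(df), function(nm) {
    lab <- attr(df[[nm]], "label", exact = TRUE)
    if (!is.null(lab)) return(as.character(lab))
    # fixed = TRUE inserts the replacement literally, so "\\%" becomes \%
    gsub("PCT", "\\%", nm, fixed = TRUE)
  }, character(1), USE.NAMES = FALSE)
}

# Illustrative data, not from the thread
df <- data.frame(growthPCT = c(1.5, 2.0), share = c(0.2, 0.8))
attr(df$share, "label") <- "Market share ($\\alpha$)"
hdr <- latex_headers(df)
cat(hdr, sep = "\n")
```

Assigning these headings to a copy only at export time (for example `out <- df; names(out) <- hdr`) keeps the short, syntactic names available in `j` expressions for the rest of the session, which sidesteps the trade-off Matthew raises about long column names with spaces.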
