Indeed, how useful such labels are depends heavily on whether they can be used with functions like toLatex. I think the first step would be to provide a way of adding labels, and then to consider functions that could help use them in formatting contexts, while leaving the last mile up to users for the time being. If it catches on, people will start to write wrappers that do the extra work.
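A minimal sketch of what that first step might look like, using only base R attributes on a plain data.frame (which data.table inherits from). The helper names set_label and get_labels and the attribute name "label" are assumptions made for illustration; nothing here is an existing data.table API, and whether such attributes survive subsets and joins is a separate question.

```r
# Hypothetical helpers, not part of data.table or any other package.

# Attach a label to one column by storing it as an attribute on that column.
set_label <- function(x, column, label) {
  stopifnot(column %in% names(x), is.character(label), length(label) == 1L)
  attr(x[[column]], "label") <- label
  x
}

# Return one label per column, falling back to the column name
# when no label has been set.
get_labels <- function(x) {
  vapply(names(x), function(nm) {
    lab <- attr(x[[nm]], "label", exact = TRUE)
    if (is.null(lab)) nm else lab
  }, character(1))
}

df <- data.frame(growth_pct = c(1.5, 2.0, 0.7), region = c("a", "b", "c"))
df <- set_label(df, "growth_pct", "Growth (\\%)")

# Swap the labels in only at output time, on a copy, so the short
# names stay available for everyday use.
out <- df
names(out) <- get_labels(df)
print(out)
```

The "last mile" is then just the names(out) <- get_labels(df) step before handing the table to whatever formatting function is in use.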
For example: the mtable function, which is what I primarily use to format tables for LaTeX, can be used with the relabel function (also from the memisc package) to replace variable names in tables (see the relabel example in: http://www.oga-lab.net/RGM2/func.php?rd_id=memisc:mtable). A method which returns those labels appropriately could be called directly when mtable is used. It's not the prettiest solution, but it's a start.

Obviously there's a mindshare aspect to this: the more people who use data.table and find variable labels useful, the more likely they are to alter other packages to take advantage of those labels. The way to accrue that advantage is to make it simple but useful initially, and then wrappers can be added to make better use of it. Obviously, the prior art in the Hmisc package failed to garner enough mindshare for it to be used in other contexts, and data.table succeeds here by retaining interoperability with everything else.

I know the first thing I would probably do: write a wrapper around read.dta which would read a Stata file and return a data.table with the Stata labels.

Just an idea. Oh, and an optimized data.table save format as well, but that's icing ;)

-griff

On Thu, Jul 28, 2011 at 8:11 PM, Matthew Dowle <[email protected]> wrote:
>
> The toLatex aspect struck a chord. I sometimes embed the string 'PCT'
> into the column name and then gsub("PCT","\%") just before output to
> latex. Maybe a label would be more robust and could allow more complex
> latex expressions in the column heading. Long column names with spaces
> are ok, but that may make it cumbersome to follow the advice to use
> names not positions in j expressions. But how would the latex output
> command know to use the labels rather than the names? And would
> data.table need to know about column labels to carry them through
> subsets and joins etc?
>
> Matthew
>
>
> On Thu, 2011-07-28 at 13:51 -0400, Chris Neff wrote:
>> I think this is definitely out of the scope of data.table.
>>
>> On 28 July 2011 13:43, Tom Short <[email protected]> wrote:
>>     On Thu, Jul 28, 2011 at 8:26 AM, Griffith Rees
>>     <[email protected]> wrote:
>>     > I think this page quite succinctly describes this issue:
>>     > http://www.statmethods.net/input/variablelables.html
>>
>>     It would be easy to add to data.table. You could also add support
>>     outside of data.table by writing label.data.table and similar
>>     functions. Actually using the labels for useful things is more
>>     difficult. I often find it useful just to use more verbose variable
>>     names that include spaces as follows:
>>
>>     > dt <- data.table(`My first column` = 1:3, `A character column` = letters[1:3], check.names = FALSE)
>>     > str(dt)
>>     Classes 'data.table' and 'data.frame': 3 obs. of 2 variables:
>>      $ My first column   : int 1 2 3
>>      $ A character column: Factor w/ 3 levels "a","b","c": 1 2 3
>>
>>     That way, columns look better with automatic plotting and with
>>     lattice or ggplot legends.
>>
>>     - Tom
>>
>>     _______________________________________________
>>     datatable-help mailing list
>>     [email protected]
>>     https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help
>
>
>

--
Griffith Rees
Sociology DPhil Candidate
Oxford University
CABDyN Complexity Centre
http://www.cabdyn.ox.ac.uk
_______________________________________________
datatable-help mailing list
[email protected]
https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help
