As does Muenchen in RforSASSPSSusers.pdf and in the book that grew out of that effort:

http://rforsasandspssusers.googlepages.com/RforSASSPSSusers.pdf

http://www.amazon.com/SAS-SPSS-Users-Statistics-Computing/dp/0387094172/ref=pd_bbs_sr_1?ie=UTF8&s=books&qid=1217456813&sr=8-1

http://rforsasandspssusers.com/

Also see QuickR
http://www.statmethods.net/input/variablelables.html


On Oct 28, 2009, at 2:14 PM, Ista Zahn wrote:

Alzola and Harrell discuss some of these issues in "An introduction to
S and the Hmisc and Design Libraries".

-ista

On Wed, Oct 28, 2009 at 1:27 PM, Jacob Wegelin <jacobwege...@fastmail.fm> wrote:

Often it is useful to keep a "codebook" to document the contents of a
dataset. (By "dataset" I mean a rectangular structure such as a dataframe.)

The codebook has as many rows as the dataset has columns (variables,
fields). The columns (fields) of the codebook may include:

       •       variable name

       •       type (character, factor, integer, etc.)

       •       variable label (e.g., a variable called "bmi2" might be labeled
               "BMI hand-input by clinic personnel, must be checked")

       •       permissible values

       •       which values indicate missing (and potentially different kinds
               of missing)

Some statistics software (e.g., SPSS and Stata) provides at least a subset
of this kind of information automatically in a convenient form. For
instance, in Stata one can define a "label" for a variable and it is
thenceforth linked to the variable. In output from certain modeling and
graphics functions, Stata by default uses the label rather than the
variable name.

Furthermore: In Stata, if "myvariable" is labeled numeric (in R lingo, a
factor), and I type

codebook myvariable

then Stata tells me, among other things, the "levels" of myvariable.

Does a tool of this sort exist in R?
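
For comparison, part of this can be approximated in base R by attaching the label to the column as an attribute and reading it back together with the factor levels. This is only a sketch, not an equivalent of Stata's codebook: the attribute name "label", the example data, and the helper describe_variable() are all assumptions made for illustration.

```r
# Hypothetical example data; none of these names come from the original post.
dat <- data.frame(
  bmi2 = c(22.5, 31.0, NA),
  sex  = factor(c("f", "m", "f"))
)

# Assumption: each label is stored in an attribute called "label".
attr(dat$bmi2, "label") <- "BMI hand-input by clinic personnel, must be checked"
attr(dat$sex,  "label") <- "Sex of participant"

# Hypothetical helper: report label, class, levels, and missing count
# for one column of a data frame.
describe_variable <- function(df, name) {
  x <- df[[name]]
  lab <- attr(x, "label")
  list(
    name      = name,
    label     = if (is.null(lab)) NA_character_ else lab,
    class     = class(x),
    levels    = if (is.factor(x)) levels(x) else NULL,
    n_missing = sum(is.na(x))
  )
}

info <- describe_variable(dat, "sex")
str(info)
```

The link is fragile: subsetting a vector with `[` drops non-standard attributes such as this label, which is part of why a dedicated class is attractive.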

The prompt() function is related to this, but prompt(someDataFrame) creates
a text file on disk. The text file is associated with, but not unambiguously
linked to, someDataFrame.

The epicalc function codebook() provides a summary of a dataframe similar to
that created by summary() but easier to read. But this is not a way to
define and keep track of labels that are linked to variables.

To link a dataframe to its codebook, one could do the following "by hand":
Create a list, say, "somedata", where somedata$DATA is a dataframe that
contains the data, and somedata$VARIABLE is also a dataframe, but serves as
the codebook. For instance, the following function creates a template that
one could subsequently edit to insert variable labels and then use as
somedata$VARIABLE.

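fnJunk <- function(THESEDATA) {
  # From a dataframe, make the start of a codebook:
  # one row per column of THESEDATA.
  if (!is.data.frame(THESEDATA)) stop("THESEDATA must be a data.frame")
  data.frame(
    Variable = names(THESEDATA),
    # A column can carry several classes (e.g. ordered factors); join them
    # so that each variable still yields exactly one string.
    class = vapply(THESEDATA, function(x) paste(class(x), collapse = "/"), character(1)),
    type = vapply(THESEDATA, typeof, character(1)),
    label = "",
    comment = "",
    stringsAsFactors = FALSE  # keep label and comment editable as plain text
  )
}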
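
A usage sketch of the "by hand" approach follows. So that the block runs on its own, the template builder is restated under the hypothetical name make_codebook(); the example data are likewise made up for illustration.

```r
# Restatement of the template builder so this block is self-contained;
# make_codebook is a hypothetical name, not from the original post.
make_codebook <- function(df) {
  stopifnot(is.data.frame(df))
  data.frame(
    Variable = names(df),
    class    = vapply(df, function(x) paste(class(x), collapse = "/"), character(1)),
    type     = vapply(df, typeof, character(1)),
    label    = "",
    comment  = "",
    stringsAsFactors = FALSE
  )
}

# Hypothetical example data.
dat <- data.frame(bmi2 = c(22.5, 31.0), clinic = factor(c("A", "B")))

# Bundle the data and its codebook in one list.
somedata <- list(DATA = dat, VARIABLE = make_codebook(dat))

# Fill in a label by hand.
somedata$VARIABLE$label[somedata$VARIABLE$Variable == "bmi2"] <-
  "BMI hand-input by clinic personnel, must be checked"

print(somedata$VARIABLE)
```

Nothing here keeps the two pieces in step: editing somedata$DATA leaves somedata$VARIABLE untouched.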


But the following automatic behavior would be nice:

       •       We should be able to treat somedata exactly as we treat a
               dataframe, so that the fact that it possesses a "codebook" is
               merely an added benefit, not an interference with the usual
               tasks.

       •       If we delete a column of somedata$DATA, the associated row of
               somedata$VARIABLE should be automatically deleted.

       •       If we add a column to somedata$DATA, the associated row should
               be inserted in somedata$VARIABLE, with some of the fields, such
               as variable name and type, automatically populated.

It could get fancier. For instance:

       •       If we try to add a value to a field in somedata$DATA which is
               not permitted by the "permissible values" listed for this field
               in somedata$VARIABLE, we get an error.
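
The synchronisation and validation wished for above could be prototyped with a few helper functions that always change the data and the codebook together. This is a sketch of one possible design, not an existing package: new_coded(), add_variable(), drop_variable(), set_value(), and the PERMITTED component are all hypothetical names.

```r
# Build the linked structure: DATA, a codebook with one row per column,
# and a named list of permissible values (absent name = anything allowed).
new_coded <- function(df) {
  stopifnot(is.data.frame(df))
  cb <- data.frame(
    Variable = names(df),
    class    = vapply(df, function(x) class(x)[1], character(1)),
    label    = "",
    stringsAsFactors = FALSE
  )
  rownames(cb) <- NULL
  list(DATA = df, VARIABLE = cb, PERMITTED = list())
}

# Add a column and its codebook row in one step.
add_variable <- function(x, name, values, label = "", permitted = NULL) {
  if (!is.null(permitted) && !all(values %in% c(permitted, NA))) {
    stop("values outside the permissible set for '", name, "'")
  }
  x$DATA[[name]] <- values
  x$VARIABLE <- rbind(
    x$VARIABLE,
    data.frame(Variable = name, class = class(values)[1], label = label,
               stringsAsFactors = FALSE)
  )
  x$PERMITTED[[name]] <- permitted
  x
}

# Delete a column and its codebook row in one step.
drop_variable <- function(x, name) {
  x$DATA[[name]] <- NULL
  x$VARIABLE <- x$VARIABLE[x$VARIABLE$Variable != name, , drop = FALSE]
  x$PERMITTED[[name]] <- NULL
  x
}

# Assign a single value, refusing anything not listed as permissible.
set_value <- function(x, row, name, value) {
  allowed <- x$PERMITTED[[name]]
  if (!is.null(allowed) && !(value %in% c(allowed, NA))) {
    stop("'", value, "' is not a permissible value for '", name, "'")
  }
  x$DATA[row, name] <- value
  x
}

# Hypothetical example data.
dat <- data.frame(id = 1:3, clinic = c("A", "B", "A"), stringsAsFactors = FALSE)
linked <- new_coded(dat)
linked <- add_variable(linked, "arm", c("treat", "control", "treat"),
                       label = "Study arm", permitted = c("treat", "control"))
linked <- drop_variable(linked, "clinic")
linked <- set_value(linked, 2, "arm", "treat")

# An impermissible value raises an error, caught here for demonstration.
res <- tryCatch(set_value(linked, 1, "arm", "placebo"),
                error = function(e) conditionMessage(e))
print(linked$VARIABLE)
print(res)
```

This does not meet the first wish: the result is a plain list, not something that behaves like a dataframe. Getting there would likely mean defining a class with its own methods for `[`, `[<-`, and `$<-`.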

Has anyone already thought this through, maybe defined a class and
associated methods?

Thanks

Jacob A. Wegelin
Assistant Professor
Department of Biostatistics
Virginia Commonwealth University
730 East Broad Street Room 3006
P. O. Box 980032
Richmond VA 23298-0032
U.S.A.
E-mail: jwege...@vcu.edu
URL: http://www.people.vcu.edu/~jwegelin
______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.





--
Ista Zahn
Graduate student
University of Rochester
Department of Clinical and Social Psychology
http://yourpsyche.org


David Winsemius, MD
Heritage Laboratories
West Hartford, CT

