Hi,
I will look into the GUI-scripting more extensively tomorrow. Until
then, I have some comments on 'missing values'.
Missing values are a major problem in statistical analysis. Because this
topic has such great importance, I will discuss it in greater detail.
There is no single way to represent a missing value, and researchers
use many different representations in practice.
Unfortunately, spreadsheet applications do NOT have a dedicated data
type for missing values, which hampers any standardised representation.
The most commonly used representations for missing values are: 'an empty
cell' (or, sometimes, erroneously, a space), 'NA', '-', '--', '0'. This
list is by no means exhaustive.
Therefore, any serious statistical analysis starts with marking such
values as "missing values". Fortunately, R comes with some handy
functions which allow automatic conversion of these values.
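As a minimal sketch of such a conversion in base R (this is not taken
from 'gdata'): read.csv() accepts an 'na.strings' argument listing the
strings to be read as NA. The column names and values below are made up
for illustration; '0' is deliberately left out of the list, since it may
be a valid value.

```r
# Hypothetical spreadsheet export; 'weight' uses several missing-value codes.
csv_text <- "id,weight
1,32
2,
3,NA
4,-
5,--
6,0"

# na.strings lists every string that should be read as NA.
dat <- read.csv(text = csv_text,
                na.strings = c("", "NA", "-", "--"))

print(dat$weight)          # the 0 in the last row is kept as a real value
print(is.na(dat$weight))   # flags rows 2 to 5 as missing
```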
1. R-packages
2. Calc-issues
1. R-PACKAGES
=============
Especially useful is package 'gdata'. The latest R-newsletter (Volume
7/1, April 2007, see
http://cran.r-project.org/doc/Rnews/Rnews_2007-1.pdf) describes this
package (Working with Unknown Values, p 24).
Two of its functions are particularly useful; I will describe them
briefly:
* function isUnknown( data_vector, missing_values_vector )
* function unknownToNA( data_vector, missing_values_vector )
Suppose we have the following vector:
data <- c(0,32,24,35,36,42,37,45,55,39,49,NA,"-")
The values "0", "NA" and "-" are probably missing values. To mark them
as such, we may use the functions 'isUnknown' and 'unknownToNA' from
package 'gdata':
isUnknown(x = data, unknown = c(0, NA, "-")) returns TRUE for the 3
missing values.
data.corrected <- unknownToNA(x = data, unknown = c(0, NA, "-")) sets
the missing values to NA (NOTE: it is not necessary to explicitly
specify 'NA').
[Because the 'data'-vector contained strings, we may wish to convert
data.corrected to a numeric vector using 'data.numeric <-
as.numeric(data.corrected)'.]
> data.numeric
[1] NA 32 24 35 36 42 37 45 55 39 49 NA NA
The previous functions are most useful with lists and data-frames.
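To illustrate the data-frame case without relying on 'gdata', here is a
rough base-R sketch of the same idea, with one set of missing-value
codes per column. The helper 'codes_to_na' and the example data are
hypothetical and are not part of any package.

```r
# Hypothetical data frame in which each column uses its own missing-value code.
df <- data.frame(age    = c(32, 0, 45, 39),
                 status = c("a", "-", "b", "-"),
                 stringsAsFactors = FALSE)

# One set of codes per column, matched by column name.
codes <- list(age = 0, status = "-")

# codes_to_na() is a made-up helper, not a gdata function.
codes_to_na <- function(x, unknown) {
  for (col in names(unknown)) {
    # Replace every value that matches one of this column's codes with NA.
    x[[col]][x[[col]] %in% unknown[[col]]] <- NA
  }
  x
}

df.corrected <- codes_to_na(df, codes)
print(df.corrected)
```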
2. CALC ISSUES
=============
Empty Calc cells are a possible problem. The Calc-R parsing routine may
interpret them as zeroes. However, '0' may also be a valid value, so we
cannot simply replace every '0' with 'NA' using the function
unknownToNA().
Therefore, empty cells should be handled by the parsing routine and
replaced by 'NA' when pasting the data into R.
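A rough sketch of this idea, assuming the parsing routine can tell for
each cell whether it was really empty. The 'values' and 'is_empty'
vectors below are hypothetical stand-ins for whatever the Calc side
would deliver; they are not part of the existing routine.

```r
# Hypothetical output of the Calc side: empty cells arrive as 0,
# together with a flag saying which cells were really empty.
values   <- c(12, 0, 0, 7, 0)
is_empty <- c(FALSE, TRUE, FALSE, FALSE, TRUE)

# Only the flagged cells become NA; genuine zeroes are kept.
values[is_empty] <- NA
print(values)
```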
There are 2 possible problems with this automatic parsing (not directly
related to the parsing itself):
1. empty cells at the end of a column (trailing empty cells): might
   be just empty cells AND NOT values at all (not even missing values)
2. some statistical methods need equal length for the data groups
1. trailing empty cells
   -- we may wish to import a range spanning more than one column
   into R (e.g. a data frame); yet the various columns may have
   different lengths, and therefore a varying number of empty cells
   at the end of each column. Such empty cells might be best dropped
   from the data set and NOT considered 'NA'-values. The
   data-importing mechanism should therefore have an option: 'ignore
   trailing empty cells'.
2. equal data-group length
   -- sometimes we need equal length for two (or more) data vectors.
   Therefore, we may want an option to append missing values to the
   end of the various vectors up to the length of the longest vector.
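The two options above could be sketched in R roughly as follows. The
function names 'drop_trailing_na' and 'pad_to_equal_length' are made up
for illustration, and the sketch assumes that empty cells have already
been imported as NA.

```r
# Drop only the NA values at the end of a vector; NAs inside the data are kept.
drop_trailing_na <- function(x) {
  keep <- which(!is.na(x))
  if (length(keep) == 0) return(x[0])   # nothing but NA: return an empty vector
  x[seq_len(max(keep))]
}

# Append NA to each vector until all have the length of the longest one.
pad_to_equal_length <- function(vectors) {
  n <- max(lengths(vectors))
  lapply(vectors, function(v) c(v, rep(NA, n - length(v))))
}

# Hypothetical two-column range; the second column has two trailing empty cells.
col_a <- c(5, NA, 7, 9)
col_b <- c(1, 2, NA, NA)

trimmed <- list(a = drop_trailing_na(col_a), b = drop_trailing_na(col_b))
padded  <- pad_to_equal_length(trimmed)

print(trimmed)                  # option 'ignore trailing empty cells'
print(as.data.frame(padded))    # option 'pad to equal length'
```

Whether a trailing empty cell is "no value" or "a missing value" cannot
be decided from the data alone, which is why both behaviours are kept as
separate, user-selectable steps here.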
Sincerely,
Leonard
Wojciech Gryc wrote:
Hi everyone,
This is just a general update about the Calc/R integration project.
Over the weekend I implemented a pretty neat feature (though I'm biased)
which allows people to create dialog boxes in Calc through an external
file rather than having it coded. The reason I did this is because I
created a rudimentary system where a person can actually script their
own user interface that will ask for specific inputs and provide custom
outputs from R within a spreadsheet.
I won't go into details here, but check this post for more information:
http://www.11-55.org/ooblog/?p=21
You can also see a sample script file here:
http://www.11-55.org/ooblog/wp-content/uploads/2007/05/correl-may-21.txt
Any comments would be appreciated.
Thanks,
Wojciech