hi,
on Tuesday, 08 February 2011 (17:18), Thomas Friedrichsmeier wrote:
> One corner-case is changing data from factor to character and back.
> Currently levels are preserved, and I think that really is useful. So
> labels would still be kept around, but probably labelled view (see below)
> would only be available for factors.
the crucial point would be where these unused labels are being kept in the
meantime. does RKWard hold them in a separate environment? as long as they're
not part of the data.frame itself that should be safe.
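a minimal sketch of what "kept outside the data.frame" could look like, assuming a plain R environment as the store. the names `label.store` and the key "a" are made up for illustration, and this is not a description of what RKWard actually does internally:

```r
# hypothetical store for remembered label sets (an assumption, not RKWard's mechanism)
label.store <- new.env()

some.data <- data.frame(a=factor(c(1,2,4,5), levels=1:5,
                                 labels=c("a","b","c","d","e")))

# remember the full label set, keyed by column name, before converting
assign("a", levels(some.data$a), envir=label.store)
some.data$a <- as.character(some.data$a)

# the column is now a plain character vector without any attributes,
# while the complete label set (including unused "c") lives in the store
attributes(some.data$a)
get("a", envir=label.store)
```

kept like this, the old labels cannot leak into results unless something explicitly fetches them from the store.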
but i wouldn't force users to re-use those labels if they switch back to
factor. otherwise R and RKWard would give different results for cases like
this one, where a label "c" for value "3" was defined but actually unused:
> some.data <- data.frame(a=factor(c(1,2,4,5), levels=c(1:5),
labels=c("a","b","c","d","e")))
> some.data$a <- as.character(some.data$a)
> some.data$a <- as.factor(some.data$a)
R would give
> unclass(some.data$a)
[1] 1 2 3 4
attr(,"levels")
[1] "a" "b" "d" "e"
whereas, using the recycled label set, RKWard has to decide between either
> unclass(some.data$a)
[1] 1 2 3 4
attr(,"levels")
[1] "a" "b" "c" "d"
(i.e., make integers like R but label them as stored), or
> unclass(some.data$a)
[1] 1 2 4 5
attr(,"levels")
[1] "a" "b" "c" "d" "e"
(i.e., make integers according to the labels as stored, thereby reconstructing
the original factor). both outcomes differ from what R gives and
might lead to hard-to-trace errors, like scripts that run correctly in
RKWard but not in R.
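for reference, the three outcomes can be reproduced in plain R. this is only a sketch: `stored`, `plain.r`, `relabeled` and `restored` are names invented here, and the two "recycling" variants are just one possible way of reading the options above, not RKWard code:

```r
stored <- c("a","b","c","d","e")   # label set remembered from the original factor
chr    <- c("a","b","d","e")       # the column after as.character()

# what R does by itself: levels are rebuilt from the values present
plain.r <- as.factor(chr)

# first variant: keep R's integer codes, but label them from the stored set
relabeled <- factor(as.integer(plain.r), levels=1:4, labels=stored[1:4])

# second variant: match the values against the stored set,
# which reconstructs the original factor
restored <- factor(chr, levels=stored)

unclass(plain.r)
unclass(relabeled)
unclass(restored)
```

note that the first variant silently turns the values "d" and "e" into "c" and "d", which is probably the more dangerous of the two.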
the ability to revert to the original factor is of course a useful feature,
too. i think RKWard should behave like R by default, that is, like it forgot
about the previous labels, but somehow offer the option to re-use those labels
if you really want them, stressing that this might lead to different results
than expected. perhaps RKWard could even calculate and show the differences in
some way (like "which(data.new.labels != data.old.labels, arr.ind=TRUE)" or
something)...
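a rough sketch of such a comparison, under the assumption that both versions of the column are available as factors. as far as i know, `!=` on two factors with different level sets stops with an error, so the sketch compares integer codes and labels separately. `diffs` and `changed` are made-up names:

```r
data.old.labels <- factor(c("a","b","d","e"), levels=c("a","b","c","d","e"))
data.new.labels <- as.factor(as.character(data.old.labels))

# put codes and labels of both versions side by side
diffs <- data.frame(
  old.code  = as.integer(data.old.labels),
  new.code  = as.integer(data.new.labels),
  old.label = as.character(data.old.labels),
  new.label = as.character(data.new.labels),
  stringsAsFactors = FALSE
)

# rows where either the integer code or the label differs
changed <- which(diffs$old.code != diffs$new.code |
                 diffs$old.label != diffs$new.label)
diffs[changed, ]
```

here only the codes of the last two values differ (4 and 5 versus 3 and 4), while the labels stay the same.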
> I wonder how many bugs can _possibly_ be left in the data editor, now?
how many lines of code does it have? ;-)
best regards :: m.eik
--
dipl. psych. meik michalke
institut für experimentelle psychologie
abt. für diagnostik und differentielle psychologie
heinrich-heine-universität 40225 düsseldorf
_______________________________________________ RKWard-devel mailing list [email protected] https://lists.sourceforge.net/lists/listinfo/rkward-devel
