Re: [R] Splicing factors without losing levels
Titus von der Malsburg wrote:
> On Tue, Jun 09, 2009 at 11:23:36AM +0200, ONKELINX, Thierry wrote:
> > For factors, you better convert them first back to character strings.
> >
> >     splice <- function(x, y) {
> >       x <- levels(x)[x]
> >       y <- levels(y)[y]
> >       factor(as.vector(rbind(x, y)))
> >     }
>
> Thank you very much, Thierry! I failed to mention something important in my last mail: x and y have the same levels. (I assume that the integer to level name mapping of a factor defines its class and that it only makes sense to combine factors of the same class.) Say
>
>     x <- factor(c(2,2,4,4), levels=1:4, labels=c("a","b","c","d"))
>
> then
>
>     x
>     [1] b b d d
>     Levels: a b c d
>     as.integer(x)
>     [1] 2 2 4 4
>
> but
>
>     splice(x,x)
>     [1] b b b b d d d d
>     Levels: b d
>     as.integer(splice(x,x))
>     [1] 1 1 1 1 2 2 2 2
>
> I'd like to have a splice function that retains the level to label mapping. One candidate for a solution is:
>
>     splice <- function(x,y) {
>       xy <- as.vector(rbind(x, y))
>       if (is.factor(x) && is.factor(y))
>         xy <- factor(xy, levels=1:length(levels(x)), labels=levels(x))
>       xy
>     }
>
> However, this relies on assumptions about the implementation of factors that are neither mentioned nor guaranteed in the man page:
>
>  - Levels are underlyingly integers starting from one and going to length(levels).
>  - levels(x) gives me the labels of these integers in an order corresponding to 1:length(levels(x)).
>
> Without these assumptions I see no way to recover the integer to level name mapping for levels that are defined in a factor but do not occur. I'd be happy if somebody could clarify this issue!

Hm, well... Some people have been quite insistent that factors should be thought of as isomorphic to vectors over small subsets of character strings and not as isomorphic to small integers with labels. I tend to disagree, as it creates more complications than it solves.

Anyway, I would do it like this (generalizing 8 and the seq() bits is left as an exercise):

    x <- factor(c(2,2,4,4), levels=1:4, labels=c("a","b","c","d"))
    xx <- factor(rep(NA,8), levels=levels(x))
    xx[seq(1,8,2)] <- x
    xx[seq(2,8,2)] <- x
    xx
    [1] b b b b d d d d
    Levels: a b c d
    as.integer(xx)
    [1] 2 2 2 2 4 4 4 4

> Titus

--
   O__  Peter Dalgaard               Øster Farimagsgade 5, Entr.B
  c/ /'_ --- Dept. of Biostatistics  PO Box 2099, 1014 Cph. K
 (*) \(*) -- University of Copenhagen  Denmark  Ph: (+45) 35327918
~~ - (p.dalga...@biostat.ku.dk)                 FAX: (+45) 35327907

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
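One possible way to do the exercise mentioned above is sketched here. It is only an illustration, not code from the thread: the name `interleave_prealloc` and the argument checks are assumptions, and it presumes both factors have identical levels and equal length.

```r
# Hypothetical helper (not from the thread): generalizes the hard-coded
# length 8 and the seq() calls to factors of any common length.
interleave_prealloc <- function(x, y) {
  stopifnot(is.factor(x), is.factor(y),
            identical(levels(x), levels(y)),
            length(x) == length(y))
  n <- length(x)
  # Pre-allocate an all-NA factor that already carries the full level set.
  out <- factor(rep(NA, 2 * n), levels = levels(x))
  odd <- 2 * seq_len(n) - 1   # positions 1, 3, 5, ...
  out[odd] <- x
  out[odd + 1] <- y           # positions 2, 4, 6, ...
  out
}

x <- factor(c(2, 2, 4, 4), levels = 1:4, labels = c("a", "b", "c", "d"))
xx <- interleave_prealloc(x, x)
levels(xx)      # the unused levels "a" and "c" are kept
as.integer(xx)  # codes still refer to the original level order
```

Because the result is created with `levels = levels(x)` before anything is assigned, the level set never has to be reconstructed from the data.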
[R] Splicing factors without losing levels
Hi list!

An operation that I often need is splicing two vectors:

    splice(1:3, 4:6)
    [1] 1 4 2 5 3 6

For numeric vectors I use this hack:

    splice <- function(x, y) {
      xy <- cbind(x, y)
      xy <- t(xy)
      dim(xy) <- length(x) * 2
      return(xy)
    }

So far, so good (?). But I also need splicing for factors and I tried this:

    splice <- function(x, y) {
      xy <- cbind(x, y)
      xy <- t(xy)
      dim(xy) <- length(x) * 2
      if (is.factor(x) && is.factor(y)) {
        xy <- as.factor(xy)
        levels(xy) <- levels(x)
      }
      return(xy)
    }

This, however, doesn't work because the level name to integer mapping gets mixed up when copying the levels from x to xy.

My questions:

1.) How can this be fixed?
2.) What's the best way to do splicing of vectors and factors in R? (I couldn't find a predefined function for this, although it seems to be such a basic and useful operation.)

Thanks!!

Titus
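To see where the mapping goes wrong in the second attempt, the steps can be traced by hand. This is a minimal sketch with an example factor chosen for illustration; it uses `as.vector(rbind(x, x))` in place of the cbind/t/dim sequence, which should yield the same interleaved codes.

```r
x <- factor(c("b", "b", "d", "d"), levels = c("a", "b", "c", "d"))

# Binding factors discards the class and keeps only the integer codes.
codes <- as.vector(rbind(x, x))
codes                 # 2 2 2 2 4 4 4 4

# as.factor() only sees the codes that occur, so its levels are "2" and "4".
xy <- as.factor(codes)
levels(xy)

# Assigning the four labels renames the existing levels by position, so the
# first label is attached to code 2 and the second label to code 4.
levels(xy) <- levels(x)
as.character(xy)      # no longer b b b b d d d d
```

The unused levels shift the positions, which is why copying `levels(x)` onto the re-created factor attaches the wrong labels.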
Re: [R] Splicing factors without losing levels
Dear Titus,

Your first function can be simplified to

    splice <- function(x, y) {
      as.vector(rbind(x, y))
    }

For factors, you had better convert them back to character strings first.

    splice <- function(x, y) {
      x <- levels(x)[x]
      y <- levels(y)[y]
      factor(as.vector(rbind(x, y)))
    }

HTH,

Thierry

ir. Thierry Onkelinx
Instituut voor natuur- en bosonderzoek / Research Institute for Nature and Forest
Cel biometrie, methodologie en kwaliteitszorg / Section biometrics, methodology and quality assurance
Gaverstraat 4
9500 Geraardsbergen
Belgium
tel. + 32 54/436 185
thierry.onkel...@inbo.be
www.inbo.be

To call in the statistician after the experiment is done may be no more than asking him to perform a post-mortem examination: he may be able to say what the experiment died of. ~ Sir Ronald Aylmer Fisher

The plural of anecdote is not data. ~ Roger Brinner

The combination of some data and an aching desire for an answer does not ensure that a reasonable answer can be extracted from a given body of data. ~ John Tukey

-----Original Message-----
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Titus von der Malsburg
Sent: Tuesday, 9 June 2009 11:12
To: r-help@r-project.org
Subject: [R] Splicing factors without losing levels

Hi list!

An operation that I often need is splicing two vectors:

    splice(1:3, 4:6)
    [1] 1 4 2 5 3 6

For numeric vectors I use this hack:

    splice <- function(x, y) {
      xy <- cbind(x, y)
      xy <- t(xy)
      dim(xy) <- length(x) * 2
      return(xy)
    }

So far, so good (?). But I also need splicing for factors and I tried this:

    splice <- function(x, y) {
      xy <- cbind(x, y)
      xy <- t(xy)
      dim(xy) <- length(x) * 2
      if (is.factor(x) && is.factor(y)) {
        xy <- as.factor(xy)
        levels(xy) <- levels(x)
      }
      return(xy)
    }

This, however, doesn't work because the level name to integer mapping gets mixed up when copying the levels from x to xy.

My questions:

1.) How can this be fixed?
2.) What's the best way to do splicing of vectors and factors in R? (I couldn't find a predefined function for this, although it seems to be such a basic and useful operation.)

Thanks!!

Titus

The views expressed in this message and any annex are purely those of the writer and may not be regarded as stating an official position of INBO, as long as the message is not confirmed by a duly signed document.
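The reason `as.vector(rbind(x, y))` interleaves is that R stores matrices column by column. A minimal sketch follows; the wrapper name `interleave` and its length check are my own additions, since rbind() would otherwise recycle the shorter argument.

```r
x <- 1:3
y <- 4:6

m <- rbind(x, y)   # 2 x 3 matrix: row 1 holds x, row 2 holds y
dim(m)

# as.vector() reads the matrix column by column: x[1], y[1], x[2], y[2], ...
as.vector(m)       # 1 4 2 5 3 6

# Hypothetical wrapper with a guard against silent recycling.
interleave <- function(x, y) {
  stopifnot(length(x) == length(y))
  as.vector(rbind(x, y))
}
interleave(c("a", "b"), c("A", "B"))
```

This works for atomic vectors in general. For factors, rbind() keeps only the integer codes, which is why the thread treats them separately.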
Re: [R] Splicing factors without losing levels
Titus von der Malsburg <malsburg at gmail.com> writes:
> An operation that I often need is splicing two vectors:
>
>     splice(1:3, 4:6)
>     [1] 1 4 2 5 3 6
>
> For numeric vectors I use this hack:
>
>     splice <- function(x, y) {
>       xy <- cbind(x, y)
>       xy <- t(xy)
>       dim(xy) <- length(x) * 2
>       return(xy)
>     }
>
> So far, so good (?). But I also need splicing for factors and I tried this:
>
>     splice <- function(x, y) {
>       xy <- cbind(x, y)
>       xy <- t(xy)
>       dim(xy) <- length(x) * 2
>       if (is.factor(x) && is.factor(y)) {
>         xy <- as.factor(xy)
>         levels(xy) <- levels(x)
>       }
>       return(xy)
>     }
>
> This, however, doesn't work because the level name to integer mapping gets mixed up when copying the levels from x to xy.
>
> Thanks!!
>
> Titus

How about something like:

    splice.factor <- function(x, y) {
      if (!(is.factor(x) && is.factor(y)))
        stop("Both x and y must be factors")
      if (length(x) != length(y))
        stop("Both x and y must have same length")
      lx <- levels(x)
      ly <- levels(y)
      lxy <- union(lx, ly)
      xy <- cbind(levels(x)[x], levels(y)[y])
      xy <- t(xy)
      dim(xy) <- NULL
      xy <- factor(xy, levels = lxy)
      xy
    }

    splice.factor(factor(1:3), factor(4:6))
    [1] 1 4 2 5 3 6
    Levels: 1 2 3 4 5 6

--
Ken Knoblauch
Inserm U846
Stem-cell and Brain Research Institute
Department of Integrative Neurosciences
18 avenue du Doyen Lépine
69500 Bron
France
tel: +33 (0)4 72 91 34 77
fax: +33 (0)4 72 91 34 61
portable: +33 (0)6 84 10 64 10
http://www.sbri.fr/members/kenneth-knoblauch.html
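As a quick check of the union-of-levels idea, here is a compact sketch that is not Ken's code: the name `interleave_union` is hypothetical, and it swaps the cbind/t/dim steps for `as.vector(rbind(...))`. It is meant to show that levels which do not occur in the data survive, and that factors with different level sets can be combined.

```r
# Hypothetical compact variant of the union-of-levels approach.
interleave_union <- function(x, y) {
  stopifnot(is.factor(x), is.factor(y), length(x) == length(y))
  lxy <- union(levels(x), levels(y))   # levels of x first, then new ones from y
  chars <- as.vector(rbind(as.character(x), as.character(y)))
  factor(chars, levels = lxy)
}

x <- factor(c("b", "d"), levels = c("a", "b", "c", "d"))
y <- factor(c("e", "a"), levels = c("a", "e"))
z <- interleave_union(x, y)
levels(z)        # "a" "b" "c" "d" "e"
as.character(z)  # "b" "e" "d" "a"
```

Note that the order of the combined levels depends on argument order, so `interleave_union(x, y)` and `interleave_union(y, x)` can carry different integer codes for the same labels.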
Re: [R] Splicing factors without losing levels
On Tue, Jun 09, 2009 at 11:23:36AM +0200, ONKELINX, Thierry wrote:
> For factors, you better convert them first back to character strings.
>
>     splice <- function(x, y) {
>       x <- levels(x)[x]
>       y <- levels(y)[y]
>       factor(as.vector(rbind(x, y)))
>     }

Thank you very much, Thierry! I failed to mention something important in my last mail: x and y have the same levels. (I assume that the integer to level name mapping of a factor defines its class and that it only makes sense to combine factors of the same class.) Say

    x <- factor(c(2,2,4,4), levels=1:4, labels=c("a","b","c","d"))

then

    x
    [1] b b d d
    Levels: a b c d
    as.integer(x)
    [1] 2 2 4 4

but

    splice(x,x)
    [1] b b b b d d d d
    Levels: b d
    as.integer(splice(x,x))
    [1] 1 1 1 1 2 2 2 2

I'd like to have a splice function that retains the level to label mapping. One candidate for a solution is:

    splice <- function(x,y) {
      xy <- as.vector(rbind(x, y))
      if (is.factor(x) && is.factor(y))
        xy <- factor(xy, levels=1:length(levels(x)), labels=levels(x))
      xy
    }

However, this relies on assumptions about the implementation of factors that are neither mentioned nor guaranteed in the man page:

 - Levels are underlyingly integers starting from one and going to length(levels).
 - levels(x) gives me the labels of these integers in an order corresponding to 1:length(levels(x)).

Without these assumptions I see no way to recover the integer to level name mapping for levels that are defined in a factor but do not occur. I'd be happy if somebody could clarify this issue!

Titus
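The two assumptions can at least be checked empirically, and the question can be sidestepped by never touching the integer codes. The sketch below is an illustration, assuming both factors share the same levels; it uses only levels(), nlevels(), as.integer() and as.character(), and the variable names are made up.

```r
x <- factor(c(2, 2, 4, 4), levels = 1:4, labels = c("a", "b", "c", "d"))
codes <- as.integer(x)

# Indexing the levels by the codes recovers the labels, which is the
# behaviour the candidate solution depends on.
stopifnot(identical(levels(x)[codes], as.character(x)))

# Route 1: interleave the codes, then re-attach the labels by position.
by_codes <- factor(as.vector(rbind(codes, codes)),
                   levels = seq_len(nlevels(x)), labels = levels(x))

# Route 2: interleave the labels and pass the full level set explicitly.
# This route does not depend on how the codes are numbered.
by_labels <- factor(as.vector(rbind(as.character(x), as.character(x))),
                    levels = levels(x))

identical(levels(by_codes), levels(by_labels))
identical(as.integer(by_codes), as.integer(by_labels))
```

Route 2 keeps unused levels because `levels = levels(x)` is supplied rather than inferred from the data.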
Re: [R] Splicing factors without losing levels
Various people have provided technical solutions to your problem. May I suggest, though, that 'splice' isn't quite the right word for this operation?

Splicing two pieces of rope / movie film / audio tape / wires / etc. means connecting them at their ends, either at an extremity or in the middle, e.g.

    X:
    Y:
    Extremity splice: xx or yyxx
    Middle splice: xxxyyyx or yyyxxx

The splice itself is the point of connection (xy or yx) between two things. In normal English, splicing never refers to interspersing alternate members of X and Y.

This may seem like a minor point, but I think it is worthwhile using descriptive names for functions.

-s

On Tue, Jun 9, 2009 at 5:12 AM, Titus von der Malsburg <malsb...@gmail.com> wrote:
> An operation that I often need is splicing two vectors:
>
>     splice(1:3, 4:6)
>     [1] 1 4 2 5 3 6
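In R terms, the distinction might look like this. It is a small sketch with made-up vectors: c() joins at an extremity, append() with its `after` argument inserts in the middle, and the rbind() idiom from earlier in the thread alternates elements.

```r
x <- c("x1", "x2", "x3")
y <- c("y1", "y2", "y3")

end_splice    <- c(x, y)                  # join y onto the end of x
middle_splice <- append(x, y, after = 2)  # insert y after the 2nd element of x
alternated    <- as.vector(rbind(x, y))   # x1 y1 x2 y2 x3 y3

end_splice
middle_splice
alternated
```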
Re: [R] Splicing factors without losing levels
On Tue, Jun 09, 2009 at 11:04:03AM -0400, Stavros Macrakis wrote:
> This may seem like a minor point, but I think it is worthwhile using descriptive names for functions.

Makes sense. I thought I had seen this usage somewhere else (probably in Lisp?). What better name do you suggest for this operation?

Titus
Re: [R] Splicing factors without losing levels
On Tue, Jun 9, 2009 at 11:16 AM, Titus von der Malsburg <malsb...@gmail.com> wrote:
> On Tue, Jun 09, 2009 at 11:04:03AM -0400, Stavros Macrakis wrote:
> > This may seem like a minor point, but I think it is worthwhile using descriptive names for functions.
>
> Makes sense. I thought I've seen this use somewhere else (probably in Lisp?). What better name do you suggest for this operation?

The two meanings I can think of in Lisp for splicing are:

1) The backquote operator ,@X, which means to insert the value of X as part of the surrounding list rather than as an element of the list, e.g.

       `(a b ,@'(c d) e f) == (append '(a b) '(c d) '(e f)) => (a b c d e f)

   as opposed to

       `(a b ,'(c d) e f) == (append '(a b) (list '(c d)) '(e f)) => (a b (c d) e f)

2) The notion of inserting (typically destructively) one list in the middle of another.

I would suggest a name like 'intersperse' or 'alternate'.

-s