Re: [R] Splicing factors without losing levels

2009-06-11 Thread Peter Dalgaard

Titus von der Malsburg wrote:

On Tue, Jun 09, 2009 at 11:23:36AM +0200, ONKELINX, Thierry wrote:

For factors, you'd better convert them back to character strings first.

  splice <- function(x, y) {
    x <- levels(x)[x]
    y <- levels(y)[y]
    factor(as.vector(rbind(x, y)))
  }


Thank you very much, Thierry!

I failed to mention something important in my last mail: x and y have
the same levels.  (I assume that the integer to level name mapping of
a factor defines its class and that it only makes sense to combine
factors of the same class.)

Say

> x <- factor(c(2,2,4,4), levels=1:4, labels=c("a","b","c","d"))

then

> x
[1] b b d d
Levels: a b c d

> as.integer(x)
[1] 2 2 4 4

but

> splice(x,x)
[1] b b b b d d d d
Levels: b d

> as.integer(splice(x,x))
[1] 1 1 1 1 2 2 2 2

I'd like to have a splice function that retains the level to label
mapping.  One candidate for a solution is:

splice <- function(x,y) {
  xy <- as.vector(rbind(x, y))
  if (is.factor(x) && is.factor(y))
    xy <- factor(xy, levels=1:length(levels(x)), labels=levels(x))
  xy
}

However, this relies on assumptions about the implementation of
factors that are neither mentioned nor guaranteed in the man page:
Levels are underlyingly integers starting from one and going to
length(levels).  levels(x) gives me the labels of these integers in an
order corresponding to 1:length(levels(x)).

Without these assumptions I see no way to recover the integer to level
name mapping for levels that are defined in a factor but do not occur.

I'd be happy if somebody could clarify this issue!


Hm, well,... Some people have been quite insistent that factors should
be thought of as isomorphic to vectors over small subsets of character
strings and not as isomorphic to small integers with labels. I tend to
disagree as it creates more complications than it solves.
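Both views can be seen on a single factor. The sketch below uses the `x` from Titus's example and shows the integer codes and `levels` attribute that `unclass()` exposes next to the character view, then checks the relationship Titus relies on: code `j` stands for `levels(x)[j]`. The ?factor help page describes the encoding in these terms (if `x[i]` equals `levels[j]`, the i-th element of the result is `j`), so this is more than an implementation accident.

```r
# Titus's example factor: values 2 and 4, labelled via levels 1:4
x <- factor(c(2, 2, 4, 4), levels = 1:4, labels = c("a", "b", "c", "d"))

# "Small integers with labels": integer codes plus a levels attribute
codes <- as.integer(x)      # 2 2 4 4
labs  <- levels(x)          # "a" "b" "c" "d" (unused levels are kept)
print(unclass(x))

# "Strings from a small set": the character view
chr <- as.character(x)      # "b" "b" "d" "d"

# The two views agree: code j stands for levels(x)[j]
stopifnot(identical(labs[codes], chr))
```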


Anyway, I would do it like this (generalizing the 8 and the seq() bits is
left as an exercise):


> x <- factor(c(2,2,4,4), levels=1:4, labels=c("a","b","c","d"))
> xx <- factor(rep(NA,8),levels=levels(x))
> xx[seq(1,8,2)]<-x
> xx[seq(2,8,2)]<-x
> xx
[1] b b b b d d d d
Levels: a b c d
> as.integer(xx)
[1] 2 2 2 2 4 4 4 4
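One possible way to do the exercise, as a sketch: the name `interleave_factors` is hypothetical, and it assumes, as Titus does, that x and y have the same length and identical levels. It preallocates an all-NA factor carrying the full level set and fills the odd and even positions. `seq_len()` is used in place of `seq(1, 8, 2)` because `seq(1, 0, by = 2)` would fail on zero-length input.

```r
# Hypothetical helper generalizing the transcript above.
# Assumes x and y are factors of equal length with identical levels.
interleave_factors <- function(x, y) {
  stopifnot(is.factor(x), is.factor(y),
            length(x) == length(y),
            identical(levels(x), levels(y)))
  n <- length(x)
  # All-NA factor that already carries every level, used or not
  xy <- factor(rep(NA, 2 * n), levels = levels(x))
  xy[2 * seq_len(n) - 1] <- x   # odd positions: 1, 3, 5, ...
  xy[2 * seq_len(n)]     <- y   # even positions: 2, 4, 6, ...
  xy
}

x <- factor(c(2, 2, 4, 4), levels = 1:4, labels = c("a", "b", "c", "d"))
xx <- interleave_factors(x, x)
print(xx)              # b b b b d d d d, Levels: a b c d
print(as.integer(xx))  # 2 2 2 2 4 4 4 4
```

Assigning a factor into a factor converts the value to its labels and matches them against the target's levels, which is why identical levels are required here; a label missing from the target would become NA with a warning.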


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



--
   O__   Peter Dalgaard Øster Farimagsgade 5, Entr.B
  c/ /'_ --- Dept. of Biostatistics PO Box 2099, 1014 Cph. K
 (*) \(*) -- University of Copenhagen   Denmark  Ph:  (+45) 35327918
~~ - (p.dalga...@biostat.ku.dk)  FAX: (+45) 35327907



[R] Splicing factors without losing levels

2009-06-09 Thread Titus von der Malsburg

Hi list!

An operation that I often need is splicing two vectors:

  > splice(1:3, 4:6)
  [1] 1 4 2 5 3 6

For numeric vectors I use this hack:

  splice <- function(x, y) {
    xy <- cbind(x, y)
    xy <- t(xy)
    dim(xy) <- length(x) * 2
    return(xy)
  }

So far, so good (?).  But I also need splicing for factors and I tried
this:

  splice <- function(x, y) {
    xy <- cbind(x, y)
    xy <- t(xy)
    dim(xy) <- length(x) * 2
    if (is.factor(x) && is.factor(y)) {
      xy <- as.factor(xy)
      levels(xy) <- levels(x)
    }
    return(xy)
  }

This, however, doesn't work because the level name to integer mapping
gets mixed up when copying the levels from x to xy.
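Where the mapping gets mixed up can be traced step by step. According to ?cbind, cbind() and rbind() discard the classes of their inputs, so factors are replaced by their internal integer codes; as.factor() then builds levels only from the codes that actually occur, and `levels<-` renames those levels by position. The sketch below reuses the example factor that appears later in this thread.

```r
x <- factor(c(2, 2, 4, 4), levels = 1:4, labels = c("a", "b", "c", "d"))

# cbind()/rbind() drop the factor class: only the integer codes survive
codes <- as.vector(t(cbind(x, x)))
print(codes)                 # 2 2 2 2 4 4 4 4

# as.factor() on the codes sees just two distinct values
f <- as.factor(codes)
print(levels(f))             # "2" "4"

# levels<- renames by position: "2" becomes "a", "4" becomes "b",
# so the result reads a a a a b b b b instead of b b b b d d d d
levels(f) <- levels(x)
print(as.character(f))
```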

My questions:

 1.) How can this be fixed?
 2.) What's the best way to do splicing of vectors and factors in R?
 (I couldn't find a predefined function for this although it seems to be
 such a basic and useful operation.)

Thanks!!

 Titus



Re: [R] Splicing factors without losing levels

2009-06-09 Thread ONKELINX, Thierry
Dear Titus,

Your first function can be simplified to

  splice <- function(x, y) {
    as.vector(rbind(x, y))
  }
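Why this interleaves: rbind() makes x the first row and y the second, and R stores matrices in column-major order, so dropping the dimensions with as.vector() reads the elements out column by column, that is x[1], y[1], x[2], y[2], and so on. A quick check:

```r
x <- 1:3
y <- 4:6
m <- rbind(x, y)     # 2 x 3 matrix; row "x" is 1 2 3, row "y" is 4 5 6
print(m)
v <- as.vector(m)    # column-major read-out, dim and dimnames dropped
print(v)             # 1 4 2 5 3 6
```

Note that rbind() recycles a shorter argument to the length of the longer one, warning only when the lengths are not multiples of each other, so an explicit length check is worth keeping if unequal inputs are possible.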

For factors, you'd better convert them back to character strings first.

  splice <- function(x, y) {
    x <- levels(x)[x]
    y <- levels(y)[y]
    factor(as.vector(rbind(x, y)))
  }

HTH,

Thierry




ir. Thierry Onkelinx
Instituut voor natuur- en bosonderzoek / Research Institute for Nature
and Forest
Cel biometrie, methodologie en kwaliteitszorg / Section biometrics,
methodology and quality assurance
Gaverstraat 4
9500 Geraardsbergen
Belgium
tel. + 32 54/436 185
thierry.onkel...@inbo.be
www.inbo.be

To call in the statistician after the experiment is done may be no more
than asking him to perform a post-mortem examination: he may be able to
say what the experiment died of.
~ Sir Ronald Aylmer Fisher

The plural of anecdote is not data.
~ Roger Brinner

The combination of some data and an aching desire for an answer does not
ensure that a reasonable answer can be extracted from a given body of
data.
~ John Tukey

The views expressed in this message and any annex are purely those of the
writer and may not be regarded as stating an official position of INBO, as
long as the message is not confirmed by a duly signed document.



Re: [R] Splicing factors without losing levels

2009-06-09 Thread Ken Knoblauch
How about something like:

splice.factor <- function(x, y){
  if (!(is.factor(x) && is.factor(y)))
    stop("Both x and y must be factors")
  if (length(x) != length(y))
    stop("Both x and y must have same length")
  lx <- levels(x)
  ly <- levels(y)
  lxy <- union(lx, ly)
  xy <- cbind(levels(x)[x], levels(y)[y])
  xy <- t(xy)
  dim(xy) <- NULL
  xy <- factor(xy, levels = lxy)
  xy
}

> splice.factor(factor(1:3), factor(4:6))
[1] 1 4 2 5 3 6
Levels: 1 2 3 4 5 6

-- 
Ken Knoblauch
Inserm U846
Stem-cell and Brain Research Institute
Department of Integrative Neurosciences
18 avenue du Doyen Lépine
69500 Bron
France
tel: +33 (0)4 72 91 34 77
fax: +33 (0)4 72 91 34 61
portable: +33 (0)6 84 10 64 10
http://www.sbri.fr/members/kenneth-knoblauch.html



Re: [R] Splicing factors without losing levels

2009-06-09 Thread Titus von der Malsburg
On Tue, Jun 09, 2009 at 11:23:36AM +0200, ONKELINX, Thierry wrote:
 For factors, you'd better convert them back to character strings first.
 
   splice <- function(x, y) {
     x <- levels(x)[x]
     y <- levels(y)[y]
     factor(as.vector(rbind(x, y)))
   }

Thank you very much, Thierry!

I failed to mention something important in my last mail: x and y have
the same levels.  (I assume that the integer to level name mapping of
a factor defines its class and that it only makes sense to combine
factors of the same class.)

Say

> x <- factor(c(2,2,4,4), levels=1:4, labels=c("a","b","c","d"))

then

> x
[1] b b d d
Levels: a b c d

> as.integer(x)
[1] 2 2 4 4

but

> splice(x,x)
[1] b b b b d d d d
Levels: b d

> as.integer(splice(x,x))
[1] 1 1 1 1 2 2 2 2

I'd like to have a splice function that retains the level to label
mapping.  One candidate for a solution is:

splice <- function(x,y) {
  xy <- as.vector(rbind(x, y))
  if (is.factor(x) && is.factor(y))
    xy <- factor(xy, levels=1:length(levels(x)), labels=levels(x))
  xy
}

However, this relies on assumptions about the implementation of
factors that are neither mentioned nor guaranteed in the man page:
Levels are underlyingly integers starting from one and going to
length(levels).  levels(x) gives me the labels of these integers in an
order corresponding to 1:length(levels(x)).

Without these assumptions I see no way to recover the integer to level
name mapping for levels that are defined in a factor but do not occur.

I'd be happy if somebody could clarify this issue!

  Titus



Re: [R] Splicing factors without losing levels

2009-06-09 Thread Stavros Macrakis
Various people have provided technical solutions to your problem.

May I suggest, though, that 'splice' isn't quite the right word for this
operation?  Splicing two pieces of rope / movie film / audio tape / wires /
etc. means connecting them at their ends, either at an extremity or in the
middle, e.g.

X:  xxxxxx
Y:  yyyyyy
Extremity splice: xxxxxxyyyyyy  or  yyyyyyxxxxxx
Middle splice:    xxxyyyyyyxxx  or  yyyxxxxxxyyy

The splice itself is the point of connection (xy or yx) between two things.

In normal English, splicing never refers to interspersing alternate members
of X and Y.

This may seem like a minor point, but I think it is worthwhile using
descriptive names for functions.

 -s


On Tue, Jun 9, 2009 at 5:12 AM, Titus von der Malsburg
malsb...@gmail.com wrote:

 An operation that I often need is splicing two vectors:

  > splice(1:3, 4:6)
  [1] 1 4 2 5 3 6




Re: [R] Splicing factors without losing levels

2009-06-09 Thread Titus von der Malsburg
On Tue, Jun 09, 2009 at 11:04:03AM -0400, Stavros Macrakis wrote:
 This may seem like a minor point, but I think it is worthwhile using
 descriptive names for functions.

Makes sense.  I thought I'd seen this usage somewhere else (probably in
Lisp?).  What better name do you suggest for this operation?

  Titus



Re: [R] Splicing factors without losing levels

2009-06-09 Thread Stavros Macrakis
On Tue, Jun 9, 2009 at 11:16 AM, Titus von der Malsburg
malsb...@gmail.com wrote:

 On Tue, Jun 09, 2009 at 11:04:03AM -0400, Stavros Macrakis wrote:
  This may seem like a minor point, but I think it is worthwhile using
  descriptive names for functions.

 Makes sense.  I thought I've seen this use somewhere else (probably in
 Lisp?).  What better name do you suggest for this operation?


The two meanings I can think of in Lisp for splicing are

1) The backquote operator ,@X, which means to insert the value of X as part
of the surrounding list rather than as an element of the list, e.g.
`(a b ,@'(c d) e f) == (append '(a b) '(c d) '(e f)) => (a b c d e f), as
opposed to `(a b ,'(c d) e f) == (append '(a b) (list '(c d)) '(e f)) =>
(a b (c d) e f).

2) The notion of inserting (typically destructively) one list in the middle
of another.
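In R terms, the second sense (inserting one vector into the middle of another) corresponds to base R's append() with its `after` argument, while joining at the ends is plain c(). A short illustration with made-up vectors; unlike the destructive Lisp operations, both calls return a new vector and leave x unchanged:

```r
x <- c("a", "b", "e", "f")
y <- c("c", "d")

end_splice    <- c(x, y)                  # joined at the ends
middle_splice <- append(x, y, after = 2)  # y inserted after x[2]
print(end_splice)     # "a" "b" "e" "f" "c" "d"
print(middle_splice)  # "a" "b" "c" "d" "e" "f"
print(x)              # still "a" "b" "e" "f"
```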

I would suggest a name like 'intersperse' or 'alternate'.
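A sketch of what an `alternate()` along these lines might look like, generalized to any number of equal-length vectors via do.call(rbind, ...). The function is hypothetical, not part of base R. Factors are handled by converting them to character and re-declaring the union of all levels (the same idea as Ken's splice.factor), so unused levels survive; mixing factors with other vectors is rejected rather than silently falling back to integer codes.

```r
# Hypothetical helper, not part of base R.
# alternate(a, b, c) returns a[1], b[1], c[1], a[2], b[2], c[2], ...
alternate <- function(...) {
  args <- list(...)
  stopifnot(length(args) >= 1)
  lens <- lengths(args)
  if (any(lens != lens[1])) stop("all arguments must have the same length")
  is_fac <- vapply(args, is.factor, logical(1))
  if (all(is_fac)) {
    # Work on the labels, then restore every level (used or not)
    lev <- unique(unlist(lapply(args, levels)))
    chr <- lapply(args, as.character)
    return(factor(as.vector(do.call(rbind, chr)), levels = lev))
  }
  if (any(is_fac)) stop("mixing factors and non-factors is not supported")
  # One argument per row; the column-major read-out interleaves them
  as.vector(do.call(rbind, args))
}

print(alternate(1:3, 4:6))        # 1 4 2 5 3 6
print(alternate(1:2, 3:4, 5:6))   # 1 3 5 2 4 6
```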

-s
