On Oct 4, 2013, at 2:35 PM, peter dalgaard <pda...@gmail.com> wrote:

> 
> On Oct 4, 2013, at 21:16 , Mary Kindall wrote:
> 
>> Y[Y < mean(Y)] = 0   #My edit
>> Y[Y >= mean(Y)] = 1  #My edit
> 
> I have no clue about gbm, but I don't think the above does what I think you 
> think it does. 
> 
> Y <- as.integer(Y >= mean(Y)) 
> 
> might be closer to the mark.


Good catch, Peter! I didn't pay attention to that initially.

Here is an example:

set.seed(1)
Y <- rnorm(10)

> Y
 [1] -0.6264538  0.1836433 -0.8356286  1.5952808  0.3295078 -0.8204684
 [7]  0.4874291  0.7383247  0.5757814 -0.3053884

> mean(Y)
[1] 0.1322028

Before changing Y:

> Y[Y < mean(Y)]
[1] -0.6264538 -0.8356286 -0.8204684 -0.3053884

> Y[Y >= mean(Y)]
[1] 0.1836433 1.5952808 0.3295078 0.4874291 0.7383247 0.5757814


However, the incantation that Mary is using, which calculates mean(Y) 
separately in each call, results in:

Y[Y < mean(Y)]  = 0

> Y
 [1] 0.0000000 0.1836433 0.0000000 1.5952808 0.3295078 0.0000000
 [7] 0.4874291 0.7383247 0.5757814 0.0000000


# mean(Y) is no longer the original value from above
> mean(Y)
[1] 0.3909967


Thus:

Y[Y >= mean(Y)]  = 1

> Y
 [1] 0.0000000 0.1836433 0.0000000 1.0000000 0.3295078 0.0000000
 [7] 1.0000000 1.0000000 1.0000000 0.0000000


Some of the values in Y are left unchanged because the threshold moved: 
after the first assignment sets the below-mean values to 0, mean(Y) is 
recalculated as 0.3909967 rather than the original 0.1322028, so 0.1836433 
and 0.3295078 fall below the new threshold and are never set to 1. As Peter 
noted, you don't end up with a dichotomous vector.
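
If you want to keep the two-step assignment, one way around the moving 
threshold is to do the comparison once, before anything in Y is modified. 
This is only a minimal sketch using the same seed as above; the names 
'cutoff' and 'high' are mine, purely for illustration:

```r
set.seed(1)
Y <- rnorm(10)

# Compute the threshold once, before any element of Y is modified
cutoff <- mean(Y)

# Build the logical index against the original values, so that neither
# assignment below can affect which elements are selected
high <- Y >= cutoff

Y[!high] <- 0
Y[high]  <- 1
Y
```

Storing the logical index, rather than only the mean, also avoids 
re-testing the newly assigned 0s against the cutoff, which would matter 
if the mean happened to be negative.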

Using Peter's method on the original Y (before the assignments above):

Y <- as.integer(Y >= mean(Y)) 
> Y
 [1] 0 1 0 1 1 0 1 1 1 0
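
The one-liner works because the whole right-hand side is evaluated, using 
a single value of mean(Y), before Y is overwritten. A short sketch that 
splits it into its two steps; the names 'above' and 'Y01' are 
hypothetical, used here only to show the intermediate objects:

```r
set.seed(1)
Y <- rnorm(10)

# Step 1: one comparison against one mean gives a logical vector
above <- Y >= mean(Y)
class(above)

# Step 2: as.integer() maps FALSE to 0L and TRUE to 1L
Y01 <- as.integer(above)

# Tabulate the two resulting groups
table(Y01)
```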


That being said, the original point stands: do not dichotomize Y in the 
first place, because doing so discards information.

Regards,

Marc Schwartz

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.