Re: [R] algorithm that iteratively drops columns of a data-frame

R. Michael Weylandt Wed, 09 Nov 2011 07:49:45 -0800

Perhaps attach names to your columns and track those rather than
indices? Names stay valid after subsetting, whereas positions shift.
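
A minimal sketch of that idea, reusing the example from the original
post. The names `keep`, `dropped`, `worst`, `result` and `drop_idx` are
my own, and `set.seed` is only there to make the run reproducible; treat
this as one possible approach rather than the definitive one.

```r
set.seed(1)  # assumption: fixed seed, not part of the original example

data <- data.frame(
  x1 = rnorm(200, 5, 2),
  x2 = rnorm(200, 6, 2),
  x3 = rnorm(200, 1, 2),
  x4 = rnorm(200, 12, 2),
  x5 = rnorm(200, 8, 2),
  x6 = rnorm(200, 9, 2)
)

keep    <- names(data)   # columns still in play
dropped <- character(0)  # names removed so far, in order of removal

for (i in 1:5) {
  # drop = FALSE keeps a data.frame even when one column is left
  col_means <- colMeans(data[, keep, drop = FALSE])
  worst     <- names(which.min(col_means))  # name of the smallest mean
  dropped   <- c(dropped, worst)
  keep      <- setdiff(keep, worst)
}

# Names refer to the original data.frame, so this is always valid
result <- data[, keep, drop = FALSE]

# If numeric positions in the original data.frame are still needed
drop_idx <- match(dropped, names(data))
```

Because `keep` and `dropped` hold names, they can be applied to the
original `data` at any point in the loop; `match()` converts them back
to original column numbers only if those are really required.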


Michael

On Wed, Nov 9, 2011 at 10:36 AM, Martin Batholdy
<batho...@googlemail.com> wrote:
> Dear R-Users,
>
>
> I have a problem with an algorithm that iterates over a data.frame
> and excludes n columns at each step based on a statistical criterion,
> so that the 'column space' gets smaller with each iteration (as in
> stepwise regression).
>
> The problem is that in every round I use a new subset of my data.frame.
>
> However, as soon as I generate this subset by indexing the data.frame, the
> column numbers of course differ from those of my original data.frame.
>
> How can I solve this?
>
>
>
> I prepared a small example to make my problem easier to understand:
>
>
> Here I generate a data.frame containing 6 vectors with different means.
>
> The loop should then exclude the vector with the smallest mean in each round.
>
> At the end I want a vector ('drop') containing the column numbers that I
> can apply to the original data.frame to get the subset with the highest
> means.
>
> The problem is that this does not work: every time I generate a subset
> ('data[,-drop]'), its column numbers differ from the column numbers of
> the original data.frame.
>
> So in the end I can't use my drop vector on my original data.frame, since
> the dimension of the working data.frame changes in every loop round.
>
>
> How can I deal with this kind of problem?
>
> Any suggestions are highly appreciated!
> (Of course, for the example code there are much easier ways to find the
> columns with the smallest means; it is just a generic example.)
>
>
> Here is the sample code:
>
>
> x1 <- rnorm(200, 5, 2)
> x2 <- rnorm(200, 6, 2)
> x3 <- rnorm(200, 1, 2)
> x4 <- rnorm(200, 12, 2)
> x5 <- rnorm(200, 8, 2)
> x6 <- rnorm(200, 9, 2)
>
>
> data <- data.frame(x1, x2, x3, x4, x5, x6)
>
> col_means <- colMeans(data)
> drop <- match(min(col_means), col_means)
>
>
> for(i in 1:4) {
>
>        col_means <- colMeans(data[,-drop])
>        drop <- c(drop, match(min(col_means), col_means))
>
> }
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

