Here is the solution using pmin/pmax for 10,000 rows.

> min_pctle_cut <- 0.01
> max_pctle_cut <- 0.99
> library(outliers)
>
> n <- 10000
> x1 <- runif(n)
> x2 <- runif(n)
> x3 <- x1 + x2 + runif(n)/10
> x4 <- x1 + x2 + x3 + runif(n)/10
> x5 <- factor(sample(c('a','b','c'),n,replace=TRUE))
> x6 <- factor(1*(x5=='a' | x5=='c'))
> data1 <- cbind(x1,x2,x3,x4,x5,x6)
> x <- data.frame(data1)
>
> z <- x[,sapply(x,is.numeric)]
> zNew <- z  # save for 2nd test
>
> qs <- sapply(z, function(z) quantile(z,
+        c(min_pctle_cut, max_pctle_cut), na.rm = TRUE))
>
>
> #Loop below taking time for execution
>
> system.time(for (i in 1:ncol(z))
+ {
+        for (j in 1:nrow(z))
+ {
+ if (z[j,i] < qs[1,i]) z[j,i]=qs[1,i]
+ if (z[j,i] > qs[2,i]) z[j,i]=qs[2,i]
+ }
+ })
   user  system elapsed
   6.64    0.00    7.76
>
> system.time({
+     for (i in 1:ncol(z)) zNew[[i]] <- pmax(qs[1,i], pmin(qs[2,i], z[[i]]))
+ })
   user  system elapsed
   0.02    0.00    0.00
>
> all(z == zNew)  # are they the same?
[1] TRUE
>


On Tue, Nov 22, 2011 at 6:24 AM, Jim Holtman <jholt...@gmail.com> wrote:
> You can easily vectorize this code using pmin/pmax.
>
> Sent from my iPad
>
> On Nov 22, 2011, at 1:06, Aher <ajit.a...@cedar-consulting.com> wrote:
>
>> Hi Experts,
>>
>> I am new to R, using following sample code for capping outliers using
>> percentile information.  Working on large data (30000 observations and 150
>> variables), loop I am using in the below mentioned code for detecting
>> outliers and capping to upper /lower percentile value is taking much time
>> for the execution.
>> Is there anything wrong with code, can anyone suggest improvement in the
>> script to enhance performance!
>> min_pctle_cut <- 0.01
>> max_pctle_cut <- 0.99
>> library(outliers)
>>
>> n <- 100
>> x1 <- runif(n)
>> x2 <- runif(n)
>> x3 <- x1 + x2 + runif(n)/10
>> x4 <- x1 + x2 + x3 + runif(n)/10
>> x5 <- factor(sample(c('a','b','c'),n,replace=TRUE))
>> x6 <- factor(1*(x5=='a' | x5=='c'))
>> data1 <- cbind(x1,x2,x3,x4,x5,x6)
>> x <- data.frame(data1)
>>
>> z <- x[,sapply(x,is.numeric)]
>>
>> qs <- sapply(z, function(z) quantile(z,
>>    c(min_pctle_cut, max_pctle_cut), na.rm = TRUE))
>>
>>
>> #Loop below taking time for execution
>>
>> system.time(for (i in 1:ncol(z))
>> {
>>    for (j in 1:nrow(z))
>> {
>> if (z[j,i] < qs[1,i]) z[j,i]=qs[1,i]
>> if (z[j,i] > qs[2,i]) z[j,i]=qs[2,i]
>> }
>> })
>>
>>
>>
>> --
>> View this message in context: 
>> http://r.789695.n4.nabble.com/Capping-outliers-tp4094647p4094647.html
>> Sent from the R help mailing list archive at Nabble.com.
>>
>> ______________________________________________
>> R-help@r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>



-- 
Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?
Tell me what you want to do, not how you want to do it.

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Reply via email to