Re: [R] Help with replace()

2018-07-14 Thread Uwe Ligges




On 12.07.2018 18:09, Bill Poling wrote:

> Yes, that's got it! (20 years from now I'll have it all figured out UGH!), lol!


Having used R for 20 years myself now, I can only say that it takes much longer.

Best,
Uwe Ligges




Re: [R] Help with replace()

2018-07-12 Thread Bill Poling
Yes, that's got it! (20 years from now I'll have it all figured out UGH!), lol!

Thank you David

        Min.      1st Qu.       Median         Mean      3rd Qu.         Max.
"1977-07-16" "1984-03-13" "1990-08-16" "1990-12-28" "1997-07-29" "2002-12-31"

WHP





Re: [R] Help with replace()

2018-07-12 Thread David Winsemius


> On Jul 12, 2018, at 8:17 AM, Bill Poling  wrote:
> 
> 
> R version 3.5.1 (2018-07-02) -- "Feather Spray"
> Copyright (C) 2018 The R Foundation for Statistical Computing
> Platform: x86_64-w64-mingw32/x64 (64-bit)
> 
> Hi.
> 
> I have data set with day month year integers. I am creating a date column 
> from those using lubridate.
> 
> a hundred or so rows failed to parse.
> 
> The problem is April and September have day = 31.
> 
> paste(df1$year, df1$month, df1$day, sep = "-")
> 
> ymd(paste(df1$year, df1$month, df1$day, sep = "-"))#Warning message: 129 failed to parse. As expected in tutorial
> 
> #The resulting Date vector can be added to df1 as a new column called date:
> df1$date <- ymd(paste(df1$year, df1$month, df1$day, sep = "-"))#Same warning
> 
> 
> head(df1)
> sapply(df1$date,class) #"date"
> summary(df1$date)
> #         Min.      1st Qu.       Median         Mean      3rd Qu.         Max.  NA's
> # "1977-07-16" "1984-03-12" "1990-07-22" "1990-12-15" "1997-07-29" "2002-12-31" "129"
> 
> is_missing_date <- is.na(df1$date)
> View(is_missing_date)
> 
> date_columns <- c("year", "month", "day")
> missing_dates <- df1[is_missing_date,  date_columns]
> 
> head(missing_dates)
> #  year month day
> # 3144 2000 9  31
> # 3817 2000 4  31
> # 3818 2000 4  31
> # 3819 2000 4  31
> # 3820 2000 4  31
> # 3856 2000 9  31
> 
> I am trying to replace those with 30.

Seems like a fairly straightforward application of "[<-" with a conditional 
argument. (No need for tidyverse.)

 missing_dates$day[ missing_dates$day==31 & ( missing_dates$month %in% c(4,9) )] <- 30


> missing_dates
 year month day
3144 2000 9  30
3817 2000 4  30
3818 2000 4  30
3819 2000 4  30
3820 2000 4  30
3856 2000 9  30

Best;
David.
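
Note that the assignment above changes only the missing_dates subset. A minimal sketch of applying the same indexed assignment to the full data frame and then rebuilding the date column follows. The toy data frame and the use of base as.Date() in place of lubridate's ymd() are assumptions made here to keep the example self-contained; they are not taken from the thread.

```r
# Toy stand-in for df1 (hypothetical values, not the poster's data)
toy <- data.frame(year  = c(2000, 2000, 2000, 2001),
                  month = c(4, 9, 5, 4),
                  day   = c(31, 31, 31, 15))

# Same "[<-" idiom as above, applied to the whole data frame
bad <- toy$day == 31 & toy$month %in% c(4, 9)
toy$day[bad] <- 30

# Rebuild the date column after the repair; base R as.Date() is used here
# instead of lubridate::ymd() so that the sketch has no dependencies
toy$date <- as.Date(paste(toy$year, toy$month, toy$day, sep = "-"),
                    format = "%Y-%m-%d")

toy
```

June and November also have 30 days, so c(4, 6, 9, 11) may be the safer set if such rows can occur in the data.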

> 

[R] Help with replace()

2018-07-12 Thread Bill Poling


R version 3.5.1 (2018-07-02) -- "Feather Spray"
Copyright (C) 2018 The R Foundation for Statistical Computing
Platform: x86_64-w64-mingw32/x64 (64-bit)

Hi.

I have a data set with day, month, and year integers. I am creating a date column from 
those using lubridate.

A hundred or so rows failed to parse.

The problem is that April and September have day = 31.

paste(df1$year, df1$month, df1$day, sep = "-")

ymd(paste(df1$year, df1$month, df1$day, sep = "-"))#Warning message: 129 failed to parse. As expected in tutorial

#The resulting Date vector can be added to df1 as a new column called date:
df1$date <- ymd(paste(df1$year, df1$month, df1$day, sep = "-"))#Same warning


head(df1)
sapply(df1$date,class) #"date"
summary(df1$date)
#         Min.      1st Qu.       Median         Mean      3rd Qu.         Max.  NA's
# "1977-07-16" "1984-03-12" "1990-07-22" "1990-12-15" "1997-07-29" "2002-12-31" "129"

is_missing_date <- is.na(df1$date)
View(is_missing_date)

date_columns <- c("year", "month", "day")
missing_dates <- df1[is_missing_date,  date_columns]

head(missing_dates)
#  year month day
# 3144 2000 9  31
# 3817 2000 4  31
# 3818 2000 4  31
# 3819 2000 4  31
# 3820 2000 4  31
# 3856 2000 9  31

I am trying to replace those with 30.

I am all over the map in Google looking for a fix, but haven't found one. I am 
sure I have overcomplicated my attempts with ideas (below) from these and other 
sites.

https://stackoverflow.com/questions/14737773/replacing-occurrences-of-a-number-in-multiple-columns-of-data-frame-with-another?noredirect=1=1
https://www.rdocumentation.org/packages/base/versions/3.5.1/topics/replace
https://stackoverflow.com/questions/48714625/error-in-data-frame-unused-argument
The following are screwy attempts at this simple repair,

??mutate_if

??replace

is_missing_date <- is.na(df1$date)
View(is_missing_date)

date_columns <- c("year", "month", "day")
missing_dates <- df1[is_missing_date,  date_columns]

head(missing_dates)
#year month day
# 3144 2000 9  31
# 3817 2000 4  31
# 3818 2000 4  31
# 3819 2000 4  31
# 3820 2000 4  31
# 3856 2000 9  31

#So need those months with 30 days that are 31 to be 30
View(missing_dates)

install.packages("dplyr")
library(dplyr)


View(missing_dates)
# ..those were the values you're going to replace

I thought this function from Stack Overflow would work, but I get an error when I try to 
add filter

#https://stackoverflow.com/questions/14737773/replacing-occurrences-of-a-number-in-multiple-columns-of-data-frame-with-another?noredirect=1=1
df.Rep <- function(.data_Frame, .search_Columns, .search_Value, .sub_Value){
  .data_Frame[, .search_Columns] <- ifelse(.data_Frame[, 
.search_Columns]==.search_Value,.sub_Value/.search_Value,1) * .data_Frame[, 
.search_Columns]
  return(.data_Frame)
}

df.Rep(missing_dates, 3, 31, 30)

#--So I should be able to apply this to the complete df1 data somehow?
head(df1)
df.Rep(df1, filter(month == c(4,9)), 31, 30)
#Error in month == c(4, 9) : comparison (1) is possible only for atomic and list types


Other screwy attempts:


select(df1, month, day, year)
str(df1)
#'data.frame':   34786 obs. of  14 variables:
#To choose rows, use filter():

#mutate_if(df1, month =4,9), day = 30)


filter(df1, month == c(4,9), day == 31)

df1 %>%
  group_by(month == c(4,9), day == 31) %>%
  tally()
# 1 FALSE  FALSE  31161
# 2 FALSE  TRUE     576
# 3 TRUE   FALSE   2981
# 4 TRUE   TRUE      68
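
A possible reason the TRUE/TRUE group above is smaller than the number of failed dates: month == c(4,9) recycles the right-hand side and compares element by element, whereas %in% tests membership in the set. A small sketch with made-up values (the vector below is an assumption for illustration, not taken from df1):

```r
# Hypothetical month values; length 6 so that c(4, 9) recycles without a warning
month <- c(4, 9, 9, 4, 5, 4)

# "==" compares against 4, 9, 4, 9, 4, 9 position by position
by_equal <- month == c(4, 9)
by_equal
# TRUE TRUE FALSE FALSE FALSE FALSE

# "%in%" asks, for each element, whether it is 4 or 9
by_in <- month %in% c(4, 9)
by_in
# TRUE TRUE TRUE TRUE FALSE TRUE
```

With ==, an April or September row is matched only when it happens to line up with the recycled 4 or 9, so roughly half of the intended rows are missed.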

  df1 %>%
  mutate(day=replace(day, month == c(4,9), 30)) %>%
  as.data.frame()
  View(as.list(df1, month == 4))
  View(df1, month == c(4,9), day == 31)


df1 %>%
  group_by(month == c(4,9), day == 31) %>%
  tally()
View(df1, month == c(4,9))

# df1 %>%
#   group_by(month == c(4,9), day == 30) %>%


I know there is a simple solution  and it is driving me mad that it eludes me, 
despite being new to R.

Thank you for any advice.

WHP

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Help me replace a for loop with an apply function

2009-10-01 Thread jim holtman
Will this work:

x <- read.table(textConnection("   day         user_id
2008/11/01    2001
2008/11/01    2002
2008/11/01    2003
2008/11/01    2004
2008/11/01    2005
2008/11/02    2001
2008/11/02    2005
2008/11/03    2001
2008/11/03    2003
2008/11/03    2004
2008/11/03    2005
2008/11/04    2001
2008/11/04    2003
2008/11/04    2004
2008/11/04    2005"), header=TRUE)
closeAllConnections()
# convert to Date
x$day <- as.Date(x$day, format="%Y/%m/%d")
# split by user and then look for contiguous days
contig <- sapply(split(x$day, x$user_id), function(.days){
    .diff <- cumsum(c(TRUE, diff(.days) != 1))
    max(table(.diff))
})
contig
# 2001 2002 2003 2004 2005
#    4    1    2    2    4




On Thu, Oct 1, 2009 at 11:29 AM, gd047 gd...@mineknowledge.com wrote:

 ...if that is possible

 My task is to find the longest streak of consecutive days on which a user
 participated in a game.

 Instead of writing an SQL function, I chose to use R's rle function to
 get the longest streaks and then update my db table with the results.

 The (attached) dataframe is something like this:

    day         user_id
 2008/11/01    2001
 2008/11/01    2002
 2008/11/01    2003
 2008/11/01    2004
 2008/11/01    2005
 2008/11/02    2001
 2008/11/02    2005
 2008/11/03    2001
 2008/11/03    2003
 2008/11/03    2004
 2008/11/03    2005
 2008/11/04    2001
 2008/11/04    2003
 2008/11/04    2004
 2008/11/04    2005



 --- R code follows ---


 # turn it into a contingency table
 my_table <- table(user_id, day)

 # get the streaks
 rle_table <- apply(my_table, 1, rle)

 # verify the longest streak of 1s for user 2001
 # as.vector(tapply(rle_table$'2001'$lengths, rle_table$'2001'$values, max)[1])

 # loop to get the results
 # initiate results matrix
 res <- matrix(nrow=dim(my_table)[1], ncol=2)

 for (i in 1:dim(my_table)[1]) {
   string <- paste("as.vector(tapply(rle_table$'", rownames(my_table)[i],
                   "'$lengths, rle_table$'", rownames(my_table)[i], "'$values, max)['1'])",
                   sep="")
   res[i,] <- c(as.integer(rownames(my_table)[i]), eval(parse(text=string)))
 }


 --- end of R code ---

 Unfortunately this for loop takes too long and I'm wondering if there is a
 way to produce the res matrix using a function from the apply family.
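
 For reference, a sketch of how the rle-based loop could be collapsed into a single apply() call without eval(parse()). It assumes, as the code in the question does, that the columns of my_table cover every calendar day in order, so that adjacent columns are consecutive days. The data frame x is a hand-built copy of the sample data, and the names longest and played are hypothetical.

```r
# Hand-built copy of the sample data from the question
x <- data.frame(
  day = as.Date(c(rep("2008-11-01", 5), rep("2008-11-02", 2),
                  rep("2008-11-03", 4), rep("2008-11-04", 4))),
  user_id = c(2001:2005, 2001, 2005, 2001, 2003:2005, 2001, 2003:2005)
)

# One row per user, one column per day, cells are 0/1 participation counts
my_table <- table(x$user_id, x$day)

# Longest run of 1s in each row; the trailing 0 covers users with no 1s at all
longest <- apply(my_table, 1, function(played) {
  runs <- rle(as.vector(played))
  max(runs$lengths[runs$values == 1], 0)
})

# Same two-column layout as the res matrix in the question
res <- cbind(user_id = as.integer(names(longest)), streak = longest)
res
```

 If some calendar days are missing from the data entirely, the table has no column for them and this approach would overstate streaks; working from date differences avoids that.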

 Thank you in advance




-- 
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem that you are trying to solve?



Re: [R] Help me replace a for loop with an apply function

2009-10-01 Thread gd047

Congratulations!

Could you explain to me the reason you add an initial TRUE value in the
cumulative sum?



 
 



Re: [R] Help me replace a for loop with an apply function

2009-10-01 Thread jim holtman
What I am doing is trying to determine where the dates are not
sequential (the difference is not one day).  Every time this occurs,
the expression 'diff(.days) != 1' is TRUE, and this is where a new
sequence starts.  'diff' will return a vector one shorter than its
input; I am assuming that the first date starts a sequence, so that is
why TRUE is the initial entry.  Using 'cumsum' will generate a
vector that has the same value for dates that are consecutive.  By
using 'table', you can determine what the maximum number of consecutive
days is.
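
The steps described above can be traced on a single user's dates. This is only an illustrative sketch: the three dates are those of user 2003 in the sample data, and the intermediate names (gaps, starts, run_id) are hypothetical.

```r
# Days on which user 2003 played (from the sample data)
days <- as.Date(c("2008-11-01", "2008-11-03", "2008-11-04"))

# TRUE wherever the step from the previous date is not exactly one day
gaps <- as.numeric(diff(days)) != 1
gaps
# TRUE FALSE

# diff() is one element short, so prepend TRUE: the first date starts a run
starts <- c(TRUE, gaps)

# Each TRUE bumps the counter, so consecutive dates share the same run id
run_id <- cumsum(starts)
run_id
# 1 2 2

# The size of the largest group is the longest streak
longest <- max(table(run_id))
longest
# 2
```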

HTH





-- 
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem that you are trying to solve?
