Re: [R] Help with replace()
On 12.07.2018 18:09, Bill Poling wrote:
> Yes, that's got it! (20 years from now I'll have it all figured out UGH!), lol!

Having used R for 20 years myself now, I can only tell you that it takes much longer.

Best,
Uwe Ligges
Re: [R] Help with replace()
Yes, that's got it! (20 years from now I'll have it all figured out UGH!), lol!

Thank you David

        Min.      1st Qu.       Median         Mean      3rd Qu.         Max.
"1977-07-16" "1984-03-13" "1990-08-16" "1990-12-28" "1997-07-29" "2002-12-31"

WHP

From: David Winsemius [mailto:dwinsem...@comcast.net]
Sent: Thursday, July 12, 2018 11:29 AM
To: Bill Poling
Cc: r-help (r-help@r-project.org)
Subject: Re: [R] Help with replace()

> Seems like a fairly straightforward application of "[<-" with a conditional argument. (No need for tidyverse.)
>
> missing_dates$day[ missing_dates$day==31 & ( missing_dates$month %in% c(4,9) )] <- 30
Re: [R] Help with replace()
> On Jul 12, 2018, at 8:17 AM, Bill Poling wrote:
>
> I have data set with day month year integers. I am creating a date column
> from those using lubridate.
>
> a hundred or so rows failed to parse.
>
> The problem is April and September have day = 31.
>
> head(missing_dates)
> #      year month day
> # 3144 2000     9  31
> # 3817 2000     4  31
> # 3818 2000     4  31
> # 3819 2000     4  31
> # 3820 2000     4  31
> # 3856 2000     9  31
>
> I am trying to replace those with 30.

Seems like a fairly straightforward application of "[<-" with a conditional argument. (No need for tidyverse.)

    missing_dates$day[ missing_dates$day==31 & ( missing_dates$month %in% c(4,9) )] <- 30

    > missing_dates
         year month day
    3144 2000     9  30
    3817 2000     4  30
    3818 2000     4  30
    3819 2000     4  30
    3820 2000     4  30
    3856 2000     9  30

Best;
David.
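The indexed assignment in David's reply works on the extracted missing_dates subset; the same pattern can presumably be applied to the full data frame before the date column is built. A minimal sketch, assuming a small made-up stand-in for df1 (the thread's data is not available) and using base R's as.Date() in place of lubridate's ymd() so the example has no dependencies:

```r
# Toy stand-in for df1 (assumption); the data set in the thread has 34786 rows.
df1 <- data.frame(
  year  = c(2000, 2000, 2000, 2001),
  month = c(9, 4, 7, 4),
  day   = c(31, 31, 31, 30)
)

# Rows where a 30-day month carries day 31
# (the thread only reports April and September).
bad <- df1$day == 31 & df1$month %in% c(4, 9)

# Same "[<-" pattern as in the reply, applied to df1 itself.
df1$day[bad] <- 30

# Rebuild the date column; as.Date() stands in for ymd() here.
df1$date <- as.Date(paste(df1$year, df1$month, df1$day, sep = "-"),
                    format = "%Y-%m-%d")

# Number of dates that still fail to parse.
sum(is.na(df1$date))
```

If June or November rows with day 31 turned up in other data, 6 and 11 could be added to the %in% vector; February would need separate handling.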
[R] Help with replace()
R version 3.5.1 (2018-07-02) -- "Feather Spray"
Copyright (C) 2018 The R Foundation for Statistical Computing
Platform: x86_64-w64-mingw32/x64 (64-bit)

Hi.

I have a data set with day, month, and year integers. I am creating a date column from those using lubridate.

A hundred or so rows failed to parse.

The problem is that April and September have day = 31.

    paste(df1$year, df1$month, df1$day, sep = "-")

    ymd(paste(df1$year, df1$month, df1$day, sep = "-")) #Warning message: 129 failed to parse. As expected in tutorial

    #The resulting Date vector can be added to df1 as a new column called date:
    df1$date <- ymd(paste(df1$year, df1$month, df1$day, sep = "-")) #Same warning

    head(df1)
    sapply(df1$date,class) #"date"
    summary(df1$date)
    #         Min.      1st Qu.       Median         Mean      3rd Qu.         Max.   NA's
    # "1977-07-16" "1984-03-12" "1990-07-22" "1990-12-15" "1997-07-29" "2002-12-31"  "129"

    is_missing_date <- is.na(df1$date)
    View(is_missing_date)

    date_columns <- c("year", "month", "day")
    missing_dates <- df1[is_missing_date, date_columns]

    head(missing_dates)
    #      year month day
    # 3144 2000     9  31
    # 3817 2000     4  31
    # 3818 2000     4  31
    # 3819 2000     4  31
    # 3820 2000     4  31
    # 3856 2000     9  31

I am trying to replace those with 30.

I am all over the map in Google looking for a fix, but haven't found one. I am sure I have overcomplicated my attempts with ideas (below) from these and other sites.

https://stackoverflow.com/questions/14737773/replacing-occurrences-of-a-number-in-multiple-columns-of-data-frame-with-another?noredirect=1=1
https://www.rdocumentation.org/packages/base/versions/3.5.1/topics/replace
https://stackoverflow.com/questions/48714625/error-in-data-frame-unused-argument

The following are screwy attempts at this simple repair:

    ??mutate_if

    ??replace

    is_missing_date <- is.na(df1$date)
    View(is_missing_date)

    date_columns <- c("year", "month", "day")
    missing_dates <- df1[is_missing_date, date_columns]

    head(missing_dates)
    #      year month day
    # 3144 2000     9  31
    # 3817 2000     4  31
    # 3818 2000     4  31
    # 3819 2000     4  31
    # 3820 2000     4  31
    # 3856 2000     9  31

    #So need those months with 30 days that are 31 to be 30
    View(missing_dates)

    install.packages("dplyr")
    library(dplyr)

    View(missing_dates)
    # ..those were the values you're going to replace

I thought this function from Stack Overflow would work, but I get an error when I try to add filter:

    #https://stackoverflow.com/questions/14737773/replacing-occurrences-of-a-number-in-multiple-columns-of-data-frame-with-another?noredirect=1=1
    df.Rep <- function(.data_Frame, .search_Columns, .search_Value, .sub_Value){
      .data_Frame[, .search_Columns] <- ifelse(.data_Frame[, .search_Columns]==.search_Value,.sub_Value/.search_Value,1) * .data_Frame[, .search_Columns]
      return(.data_Frame)
    }

    df.Rep(missing_dates, 3, 31, 30)

    #--So I should be able to apply this to the complete df1 data somehow?
    head(df1)
    df.Rep(df1, filter(month == c(4,9)), 31, 30)
    #Error in month == c(4, 9) : comparison (1) is possible only for atomic and list types

Other screwy attempts:

    select(df1, month, day, year)
    str(df1)
    #'data.frame': 34786 obs. of 14 variables:
    #To choose rows, use filter():

    #mutate_if(df1, month =4,9), day = 30)

    filter(df1, month == c(4,9), day == 31)

    df1 %>%
      group_by(month == c(4,9), day == 31) %>%
      tally()
    # 1 FALSE FALSE 31161
    # 2 FALSE TRUE    576
    # 3 TRUE  FALSE  2981
    # 4 TRUE  TRUE     68

    df1 %>%
      mutate(day=replace(day, month == c(4,9), 30)) %>%
      as.data.frame()
    View(as.list(df1, month == 4))
    View(df1, month == c(4,9), day == 31)

    df1 %>%
      group_by(month == c(4,9), day == 31) %>%
      tally()
    View(df1, month == c(4,9))

    # df1 %>%
    #   group_by(month == c(4,9), day == 30) %>%

I know there is a simple solution and it is driving me mad that it eludes me, despite being new to R.

Thank you for any advice.

WHP

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
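Several of the attempts above test month == c(4,9). In R, == recycles the shorter vector element by element, so that expression compares odd-positioned rows with 4 and even-positioned rows with 9 instead of asking whether each month is 4 or 9. That would be consistent with tally() reporting 68 matching rows rather than 129, although the thread does not confirm it. A small sketch on made-up values shows the difference from %in%:

```r
# Made-up month values (assumption), not taken from df1.
month <- c(4, 4, 9, 9, 7, 4)

# Element-wise comparison with recycling:
# month[1] == 4, month[2] == 9, month[3] == 4, month[4] == 9, ...
recycled <- month == c(4, 9)

# Set membership: TRUE wherever month is 4 or 9.
member <- month %in% c(4, 9)

recycled
member
```

With dplyr, the membership test drops into filter() or mutate() in the same way, for example filter(df1, month %in% c(4, 9), day == 31).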
Re: [R] Help me replace a for loop with an apply function
Will this work:

    x <- read.table(textConnection("day user_id
    2008/11/01 2001
    2008/11/01 2002
    2008/11/01 2003
    2008/11/01 2004
    2008/11/01 2005
    2008/11/02 2001
    2008/11/02 2005
    2008/11/03 2001
    2008/11/03 2003
    2008/11/03 2004
    2008/11/03 2005
    2008/11/04 2001
    2008/11/04 2003
    2008/11/04 2004
    2008/11/04 2005"), header=TRUE)
    closeAllConnections()
    # convert to Date
    x$day <- as.Date(x$day, format="%Y/%m/%d")
    # split by user and then look for contiguous days
    contig <- sapply(split(x$day, x$user_id), function(.days){
        .diff <- cumsum(c(TRUE, diff(.days) != 1))
        max(table(.diff))
    })
    contig
    # 2001 2002 2003 2004 2005
    #    4    1    2    2    4

On Thu, Oct 1, 2009 at 11:29 AM, gd047 <gd...@mineknowledge.com> wrote:
> ...if that is possible
>
> My task is to find the longest streak of continuous days a user participated
> in a game.
>
> Instead of writing an sql function, I chose to use the R's rle function, to
> get the longest streaks and then update my db table with the results.
>
> The (attached) dataframe is something like this:
>
>        day user_id
> 2008/11/01    2001
> 2008/11/01    2002
> 2008/11/01    2003
> 2008/11/01    2004
> 2008/11/01    2005
> 2008/11/02    2001
> 2008/11/02    2005
> 2008/11/03    2001
> 2008/11/03    2003
> 2008/11/03    2004
> 2008/11/03    2005
> 2008/11/04    2001
> 2008/11/04    2003
> 2008/11/04    2004
> 2008/11/04    2005
>
> --- R code follows --
>
> # turn it to a contingency table
> my_table <- table(user_id, day)
>
> # get the streaks
> rle_table <- apply(my_table,1,rle)
>
> # verify the longest streak of "1"s for user 2001
> # as.vector(tapply(rle_table$'2001'$lengths, rle_table$'2001'$values, max)["1"])
>
> # loop to get the results
> # initiate results matrix
> res<-matrix(nrow=dim(my_table)[1], ncol=2)
>
> for (i in 1:dim(my_table)[1]) {
> string <- paste("as.vector(tapply(rle_table$'", rownames(my_table)[i],
>     "'$lengths, rle_table$'", rownames(my_table)[i], "'$values, max)['1'])",
>     sep="")
> res[i,]<-c(as.integer(rownames(my_table)[i]) , eval(parse(text=string)))
> }
> --- end of R code
>
> Unfortunately this for loop takes too long and I'm wondering if there is a way
> to produce the res matrix using a function from the "apply" family.
>
> Thank you in advance
> --
> View this message in context:
> http://www.nabble.com/Help-me-replace-a-for-loop-with-an-%22apply%22-function-tp25696937p25696937.html
> Sent from the R help mailing list archive at Nabble.com.

--
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem that you are trying to solve?
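For comparison, the rle() idea from the question can also be kept while dropping both the loop and the eval(parse()) construction. This is a sketch that does not come from the thread: it uses a small made-up data frame, longest_run is a hypothetical helper name, and it assumes every calendar day in the range appears as a column of the table (a day on which nobody played would be missing and could wrongly join two streaks).

```r
# Made-up sample in the same shape as the question's data (assumption).
x <- data.frame(
  day = as.Date(c("2008-11-01", "2008-11-01", "2008-11-02",
                  "2008-11-03", "2008-11-03", "2008-11-04")),
  user_id = c(2001, 2002, 2001, 2001, 2002, 2002)
)

# Rows are users, columns are days, cells count participation.
my_table <- table(x$user_id, x$day)

# Hypothetical helper: length of the longest run of non-zero cells in a row.
longest_run <- function(row) {
  r <- rle(row > 0)
  if (any(r$values)) max(r$lengths[r$values]) else 0L
}

# One call to apply() over the rows replaces the for loop.
res <- apply(my_table, 1, longest_run)
res
```

The result is a vector named by user id; cbind(as.integer(names(res)), res) would give the two-column matrix the question asks for.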
Re: [R] Help me replace a for loop with an apply function
Congratulations!

Could you explain to me the reason you add an initial TRUE value in the cumulative sum?

jholtman wrote:
> .diff <- cumsum(c(TRUE, diff(.days) != 1))

--
View this message in context: http://www.nabble.com/Help-me-replace-a-for-loop-with-an-%22apply%22-function-tp25696937p25704683.html
Sent from the R help mailing list archive at Nabble.com.
Re: [R] Help me replace a for loop with an apply function
What I am doing is trying to determine where the dates are not sequential (the difference is not one day). Every time this occurs, the expression 'diff(.days) != 1' is TRUE, and this is where a new sequence starts. 'diff' returns a vector one shorter than its input; I am assuming that the first date starts a sequence, which is why TRUE is the initial entry. 'cumsum' then generates a vector that has the same value for dates that are consecutive. By using 'table', you can determine the maximum number of consecutive days.

HTH

On Thu, Oct 1, 2009 at 2:57 PM, gd047 <gd...@mineknowledge.com> wrote:
> Congratulations!
>
> Could you explain to me the reason you add an initial TRUE value in the
> cumulative sum?

--
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem that you are trying to solve?
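The explanation above can be followed step by step on a short vector. A minimal sketch, assuming made-up dates for a single user that are already sorted and contain no duplicates; the intermediate variable names are illustrative, not from the thread:

```r
# One user's sorted participation dates (made-up example).
days <- as.Date(c("2008-11-01", "2008-11-03", "2008-11-04", "2008-11-05"))

gaps    <- diff(days) != 1      # one element shorter than days
starts  <- c(TRUE, gaps)        # the first date always starts a run
run_id  <- cumsum(starts)       # consecutive dates share the same id
longest <- max(table(run_id))   # size of the largest group of ids

gaps
starts
run_id
longest
```

Here the first date stands alone and the remaining three are consecutive, so run_id separates them into two groups and longest picks the larger one.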