Re: [R] Failure to aggregate
On Wed, Sep 8, 2010 at 4:48 AM, Dimitri Shvorob dimitri.shvo...@gmail.com wrote: I was able to aggregate (with sqldf, at least), after saving and re-loading the dataframe. My first guess was that h (and/or price?) now being a factor - stringsAsFactors = T by default - made the difference, and I tried to convert x$h to factor, but received an error. Please provide enough of x to reproduce your problem. e.g. x - head(x) dput(x) # repeat code and ensure it still shows problem -- Statistics Software Consulting GKX Group, GKX Associates Inc. tel: 1-877-GKX-GROUP email: ggrothendieck at gmail.com __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Failure to aggregate
Hi you has to provide some more info about x e.g. str(x) x-data.frame(price=1, h=Sys.time()) r-help-boun...@r-project.org napsal dne 08.09.2010 10:18:52: Mnay thanks fr suggestions. I am afraid this is one tough daatframe... t = sqldf(select h, count(*) from x group by h) Error in sqliteExecStatement(con, statement, bind.data) : RS-DBI driver: (error in statement: no such table: x) In addition: Warning message: In value[[3L]](cond) : RAW() can only be applied to a 'raw', not a 'double' did not test t = aggregate(x[price], by = x[h], FUN = NROW) Error in sort.list(y) : 'x' must be atomic for 'sort.list' Have you called 'sort' on a list? works aggregate(x[price], by = x[h], FUN = NROW) h price 1 2010-09-09 16:58:04 1 t = aggregate(x[price], by = x[h], FUN = length) Error in sort.list(y) : 'x' must be atomic for 'sort.list' Have you called 'sort' on a list? works aggregate(x[price], by = x[h], FUN = length) h price 1 2010-09-09 16:58:04 1 t = tapply(x$price, by = x$h, FUN = length) Error in is.list(INDEX) : 'INDEX' is missing works use INDEX instead of by tapply(x$price, by = list(x$h), FUN = length) Error in is.list(INDEX) : 'INDEX' is missing tapply(x$price, x$h, FUN = length) 2010-09-09 16:58:04 1 Regards Petr class(x) [1] data.frame class(x$h) [1] POSIXt POSIXlt class(x$price) [1] integer -- View this message in context: http://r.789695.n4.nabble.com/Failure-to- aggregate-tp2528613p2530963.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Failure to aggregate
g = head(x) dput(g) structure(list(price = c(500L, 500L, 501L, 501L, 500L, 501L), size = c(221000L, 2000L, 1000L, 13000L, 3000L, 3000L), src = c(R, R, R, R, R, R), t = structure(list(sec = c(24.133, 47.096, 12.139, 18.142, 10.721, 28.713), min = c(0L, 0L, 1L, 1L, 2L, 2L), hour = c(9L, 9L, 9L, 9L, 9L, 9L), mday = c(4L, 4L, 4L, 4L, 4L, 4L), mon = c(0L, 0L, 0L, 0L, 0L, 0L), year = c(105L, 105L, 105L, 105L, 105L, 105L), wday = c(2L, 2L, 2L, 2L, 2L, 2L), yday = c(3L, 3L, 3L, 3L, 3L, 3L), isdst = c(0L, 0L, 0L, 0L, 0L, 0L)), .Names = c(sec, min, hour, mday, mon, year, wday, yday, isdst), class = c(POSIXt, POSIXlt)), d = structure(list(sec = c(0, 0, 0, 0, 0, 0), min = c(0L, 0L, 0L, 0L, 0L, 0L), hour = c(0L, 0L, 0L, 0L, 0L, 0L), mday = c(4L, 4L, 4L, 4L, 4L, 4L), mon = c(0L, 0L, 0L, 0L, 0L, 0L), year = c(105L, 105L, 105L, 105L, 105L, 105L), wday = c(2L, 2L, 2L, 2L, 2L, 2L), yday = c(3L, 3L, 3L, 3L, 3L, 3L), isdst = c(0L, 0L, 0L, 0L, 0L, 0L )), .Names = c(sec, min, hour, mday, mon, year, wday, yday, isdst), class = c(POSIXt, POSIXlt)), h = structure(list(sec = c(0, 0, 0, 0, 0, 0), min = c(0L, 0L, 0L, 0L, 0L, 0L), hour = c(9L, 9L, 9L, 9L, 9L, 9L), mday = c(4L, 4L, 4L, 4L, 4L, 4L), mon = c(0L, 0L, 0L, 0L, 0L, 0L), year = c(105L, 105L, 105L, 105L, 105L, 105L), wday = c(2L, 2L, 2L, 2L, 2L, 2L), yday = c(3L, 3L, 3L, 3L, 3L, 3L), isdst = c(0L, 0L, 0L, 0L, 0L, 0L)), .Names = c(sec, min, hour, mday, mon, year, wday, yday, isdst), class = c(POSIXt, POSIXlt)), m = structure(list(sec = c(0, 0, 0, 0, 0, 0), min = c(0L, 0L, 1L, 1L, 2L, 2L), hour = c(9L, 9L, 9L, 9L, 9L, 9L), mday = c(4L, 4L, 4L, 4L, 4L, 4L), mon = c(0L, 0L, 0L, 0L, 0L, 0L), year = c(105L, 105L, 105L, 105L, 105L, 105L), wday = c(2L, 2L, 2L, 2L, 2L, 2L), yday = c(3L, 3L, 3L, 3L, 3L, 3L), isdst = c(0L, 0L, 0L, 0L, 0L, 0L )), .Names = c(sec, min, hour, mday, mon, year, wday, yday, isdst), class = c(POSIXt, POSIXlt)), s = structure(list(sec = c(24, 47, 12, 18, 10, 28), min = c(0L, 0L, 1L, 1L, 2L, 2L), hour = c(9L, 9L, 9L, 9L, 9L, 9L), mday = c(4L, 4L, 4L, 4L, 4L, 4L), mon = c(0L, 0L, 0L, 0L, 0L, 0L), year = c(105L, 105L, 105L, 105L, 105L, 105L), wday = c(2L, 2L, 2L, 2L, 2L, 2L), yday = c(3L, 3L, 3L, 3L, 3L, 3L), isdst = c(0L, 0L, 0L, 0L, 0L, 0L)), .Names = c(sec, min, hour, mday, mon, year, wday, yday, isdst), class = c(POSIXt, POSIXlt))), .Names = c(price, size, src, t, d, h, m, s), row.names = c(NA, 6L), class = data.frame) n = sqldf(select distinct h, src, count(*) from g group by h, src) Loading required package: tcltk Loading Tcl/Tk interface ... done Error in sqliteExecStatement(con, statement, bind.data) : RS-DBI driver: (error in statement: no such table: g) In addition: Warning message: In value[[3L]](cond) : RAW() can only be applied to a 'raw', not a 'double' -- View this message in context: http://r.789695.n4.nabble.com/Failure-to-aggregate-tp2528613p2533051.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Failure to aggregate
I think your main problem is that you have your time as POSIXlt which is a multiple valued vector. I converted the 't' to POSIXct, removed the other POSIXlt value and created a 'h' as the character for the hour and it works fine: str(g) 'data.frame': 6 obs. of 5 variables: $ price: int 500 500 501 501 500 501 $ size : int 221000 2000 1000 13000 3000 3000 $ src : chr R R R R ... $ time : POSIXct, format: 2005-01-04 09:00:24 2005-01-04 09:00:47 2005-01-04 09:01:12 2005-01-04 09:01:18 ... $ h: chr 09 09 09 09 ... sqldf(select distinct h, src, count(*) from g group by h,src) h src count(*) 1 09 R6 On Thu, Sep 9, 2010 at 11:16 AM, Dimitri Shvorob dimitri.shvo...@gmail.com wrote: g = head(x) dput(g) structure(list(price = c(500L, 500L, 501L, 501L, 500L, 501L), size = c(221000L, 2000L, 1000L, 13000L, 3000L, 3000L), src = c(R, R, R, R, R, R), t = structure(list(sec = c(24.133, 47.096, 12.139, 18.142, 10.721, 28.713), min = c(0L, 0L, 1L, 1L, 2L, 2L), hour = c(9L, 9L, 9L, 9L, 9L, 9L), mday = c(4L, 4L, 4L, 4L, 4L, 4L), mon = c(0L, 0L, 0L, 0L, 0L, 0L), year = c(105L, 105L, 105L, 105L, 105L, 105L), wday = c(2L, 2L, 2L, 2L, 2L, 2L), yday = c(3L, 3L, 3L, 3L, 3L, 3L), isdst = c(0L, 0L, 0L, 0L, 0L, 0L)), .Names = c(sec, min, hour, mday, mon, year, wday, yday, isdst), class = c(POSIXt, POSIXlt)), d = structure(list(sec = c(0, 0, 0, 0, 0, 0), min = c(0L, 0L, 0L, 0L, 0L, 0L), hour = c(0L, 0L, 0L, 0L, 0L, 0L), mday = c(4L, 4L, 4L, 4L, 4L, 4L), mon = c(0L, 0L, 0L, 0L, 0L, 0L), year = c(105L, 105L, 105L, 105L, 105L, 105L), wday = c(2L, 2L, 2L, 2L, 2L, 2L), yday = c(3L, 3L, 3L, 3L, 3L, 3L), isdst = c(0L, 0L, 0L, 0L, 0L, 0L )), .Names = c(sec, min, hour, mday, mon, year, wday, yday, isdst), class = c(POSIXt, POSIXlt)), h = structure(list(sec = c(0, 0, 0, 0, 0, 0), min = c(0L, 0L, 0L, 0L, 0L, 0L), hour = c(9L, 9L, 9L, 9L, 9L, 9L), mday = c(4L, 4L, 4L, 4L, 4L, 4L), mon = c(0L, 0L, 0L, 0L, 0L, 0L), year = c(105L, 105L, 105L, 105L, 105L, 105L), wday = c(2L, 2L, 2L, 2L, 2L, 2L), yday = c(3L, 3L, 3L, 3L, 3L, 3L), isdst = c(0L, 0L, 0L, 0L, 0L, 0L)), .Names = c(sec, min, hour, mday, mon, year, wday, yday, isdst), class = c(POSIXt, POSIXlt)), m = structure(list(sec = c(0, 0, 0, 0, 0, 0), min = c(0L, 0L, 1L, 1L, 2L, 2L), hour = c(9L, 9L, 9L, 9L, 9L, 9L), mday = c(4L, 4L, 4L, 4L, 4L, 4L), mon = c(0L, 0L, 0L, 0L, 0L, 0L), year = c(105L, 105L, 105L, 105L, 105L, 105L), wday = c(2L, 2L, 2L, 2L, 2L, 2L), yday = c(3L, 3L, 3L, 3L, 3L, 3L), isdst = c(0L, 0L, 0L, 0L, 0L, 0L )), .Names = c(sec, min, hour, mday, mon, year, wday, yday, isdst), class = c(POSIXt, POSIXlt)), s = structure(list(sec = c(24, 47, 12, 18, 10, 28), min = c(0L, 0L, 1L, 1L, 2L, 2L), hour = c(9L, 9L, 9L, 9L, 9L, 9L), mday = c(4L, 4L, 4L, 4L, 4L, 4L), mon = c(0L, 0L, 0L, 0L, 0L, 0L), year = c(105L, 105L, 105L, 105L, 105L, 105L), wday = c(2L, 2L, 2L, 2L, 2L, 2L), yday = c(3L, 3L, 3L, 3L, 3L, 3L), isdst = c(0L, 0L, 0L, 0L, 0L, 0L)), .Names = c(sec, min, hour, mday, mon, year, wday, yday, isdst), class = c(POSIXt, POSIXlt))), .Names = c(price, size, src, t, d, h, m, s), row.names = c(NA, 6L), class = data.frame) n = sqldf(select distinct h, src, count(*) from g group by h, src) Loading required package: tcltk Loading Tcl/Tk interface ... done Error in sqliteExecStatement(con, statement, bind.data) : RS-DBI driver: (error in statement: no such table: g) In addition: Warning message: In value[[3L]](cond) : RAW() can only be applied to a 'raw', not a 'double' -- View this message in context: http://r.789695.n4.nabble.com/Failure-to-aggregate-tp2528613p2533051.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. -- Jim Holtman Cincinnati, OH +1 513 646 9390 What is the problem that you are trying to solve? __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Failure to aggregate
Thanks a lot, Jim. I am not sure what difference the various POSIXes make - in the end, you are replacing a datetime hour with a numeric value, e.g., 1 or 9. That does not work for me, unfortunately. g = head(x) dput(g) structure(list(price = c(500L, 500L, 501L, 501L, 500L, 501L), size = c(221000L, 2000L, 1000L, 13000L, 3000L, 3000L), src = c(R, R, R, R, R, R), t = structure(c(1104829224.133, 1104829247.096, 1104829272.139, 1104829278.142, 1104829330.721, 1104829348.713), class = c(POSIXt, POSIXct), tzone = ), d = structure(list(sec = c(0, 0, 0, 0, 0, 0), min = c(0L, 0L, 0L, 0L, 0L, 0L), hour = c(0L, 0L, 0L, 0L, 0L, 0L), mday = c(4L, 4L, 4L, 4L, 4L, 4L), mon = c(0L, 0L, 0L, 0L, 0L, 0L), year = c(105L, 105L, 105L, 105L, 105L, 105L), wday = c(2L, 2L, 2L, 2L, 2L, 2L), yday = c(3L, 3L, 3L, 3L, 3L, 3L), isdst = c(0L, 0L, 0L, 0L, 0L, 0L)), .Names = c(sec, min, hour, mday, mon, year, wday, yday, isdst), class = c(POSIXt, POSIXlt)), h = c(9, 9, 9, 9, 9, 9), m = structure(list( sec = c(0, 0, 0, 0, 0, 0), min = c(0L, 0L, 1L, 1L, 2L, 2L), hour = c(9L, 9L, 9L, 9L, 9L, 9L), mday = c(4L, 4L, 4L, 4L, 4L, 4L), mon = c(0L, 0L, 0L, 0L, 0L, 0L), year = c(105L, 105L, 105L, 105L, 105L, 105L), wday = c(2L, 2L, 2L, 2L, 2L, 2L), yday = c(3L, 3L, 3L, 3L, 3L, 3L), isdst = c(0L, 0L, 0L, 0L, 0L, 0L)), .Names = c(sec, min, hour, mday, mon, year, wday, yday, isdst), class = c(POSIXt, POSIXlt)), s = structure(list(sec = c(24, 47, 12, 18, 10, 28), min = c(0L, 0L, 1L, 1L, 2L, 2L), hour = c(9L, 9L, 9L, 9L, 9L, 9L), mday = c(4L, 4L, 4L, 4L, 4L, 4L), mon = c(0L, 0L, 0L, 0L, 0L, 0L), year = c(105L, 105L, 105L, 105L, 105L, 105L), wday = c(2L, 2L, 2L, 2L, 2L, 2L), yday = c(3L, 3L, 3L, 3L, 3L, 3L), isdst = c(0L, 0L, 0L, 0L, 0L, 0L)), .Names = c(sec, min, hour, mday, mon, year, wday, yday, isdst ), class = c(POSIXt, POSIXlt))), .Names = c(price, size, src, t, d, h, m, s), row.names = c(NA, 6L ), class = data.frame) n = sqldf(select distinct h, src, count(*) from g group by h, src) Error in sqliteExecStatement(con, statement, bind.data) : RS-DBI driver: (error in statement: no such table: g) In addition: Warning message: In value[[3L]](cond) : RAW() can only be applied to a 'raw', not a 'double' -- View this message in context: http://r.789695.n4.nabble.com/Failure-to-aggregate-tp2528613p2533189.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Failure to aggregate
Update: What did make a difference for me - and something that was present in Jim's example, but not reproduced by myself initially - was dropping columns other than the two involved. When I dropped all columns except for h and src, the sqldf call worked. ... Is it an R bug or what? (I am saying R, not sqldf, because built-in tapply did not work either). Arrgh... -- View this message in context: http://r.789695.n4.nabble.com/Failure-to-aggregate-tp2528613p2533206.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Failure to aggregate
On Thu, Sep 9, 2010 at 12:39 PM, Dimitri Shvorob dimitri.shvo...@gmail.com wrote: Update: What did make a difference for me - and something that was present in Jim's example, but not reproduced by myself initially - was dropping columns other than the two involved. When I dropped all columns except for h and src, the sqldf call worked. ... Is it an R bug or what? (I am saying R, not sqldf, because built-in tapply did not work either). Arrgh... You need to show what you have done in a reproducible fashion to get answers. Converting to POSIXct does work. g below is the object you posted earlier in this thread. I have also removed the erroneous distinct although it will work even if you leave it in. g - + structure(list(price = c(500L, 500L, 501L, 501L, 500L, 501L), +size = c(221000L, 2000L, 1000L, 13000L, 3000L, 3000L), src = c(R, +R, R, R, R, R), t = structure(list(sec = c(24.133, +47.096, 12.139, 18.142, 10.721, 28.713), min = c(0L, 0L, +1L, 1L, 2L, 2L), hour = c(9L, 9L, 9L, 9L, 9L, 9L), mday = c(4L, +4L, 4L, 4L, 4L, 4L), mon = c(0L, 0L, 0L, 0L, 0L, 0L), year = c(105L, +105L, 105L, 105L, 105L, 105L), wday = c(2L, 2L, 2L, 2L, 2L, +2L), yday = c(3L, 3L, 3L, 3L, 3L, 3L), isdst = c(0L, 0L, +0L, 0L, 0L, 0L)), .Names = c(sec, min, hour, mday, +mon, year, wday, yday, isdst), class = c(POSIXt, +POSIXlt)), d = structure(list(sec = c(0, 0, 0, 0, 0, 0), +min = c(0L, 0L, 0L, 0L, 0L, 0L), hour = c(0L, 0L, 0L, +0L, 0L, 0L), mday = c(4L, 4L, 4L, 4L, 4L, 4L), mon = c(0L, +0L, 0L, 0L, 0L, 0L), year = c(105L, 105L, 105L, 105L, +105L, 105L), wday = c(2L, 2L, 2L, 2L, 2L, 2L), yday = c(3L, +3L, 3L, 3L, 3L, 3L), isdst = c(0L, 0L, 0L, 0L, 0L, 0L +)), .Names = c(sec, min, hour, mday, mon, year, +wday, yday, isdst), class = c(POSIXt, POSIXlt)), +h = structure(list(sec = c(0, 0, 0, 0, 0, 0), min = c(0L, +0L, 0L, 0L, 0L, 0L), hour = c(9L, 9L, 9L, 9L, 9L, 9L), mday = c(4L, +4L, 4L, 4L, 4L, 4L), mon = c(0L, 0L, 0L, 0L, 0L, 0L), year = c(105L, +105L, 105L, 105L, 105L, 105L), wday = c(2L, 2L, 2L, 2L, 2L, +2L), yday = c(3L, 3L, 3L, 3L, 3L, 3L), isdst = c(0L, 0L, +0L, 0L, 0L, 0L)), .Names = c(sec, min, hour, mday, +mon, year, wday, yday, isdst), class = c(POSIXt, +POSIXlt)), m = structure(list(sec = c(0, 0, 0, 0, 0, 0), +min = c(0L, 0L, 1L, 1L, 2L, 2L), hour = c(9L, 9L, 9L, +9L, 9L, 9L), mday = c(4L, 4L, 4L, 4L, 4L, 4L), mon = c(0L, +0L, 0L, 0L, 0L, 0L), year = c(105L, 105L, 105L, 105L, +105L, 105L), wday = c(2L, 2L, 2L, 2L, 2L, 2L), yday = c(3L, +3L, 3L, 3L, 3L, 3L), isdst = c(0L, 0L, 0L, 0L, 0L, 0L +)), .Names = c(sec, min, hour, mday, mon, year, +wday, yday, isdst), class = c(POSIXt, POSIXlt)), +s = structure(list(sec = c(24, 47, 12, 18, 10, 28), min = c(0L, +0L, 1L, 1L, 2L, 2L), hour = c(9L, 9L, 9L, 9L, 9L, 9L), mday = c(4L, +4L, 4L, 4L, 4L, 4L), mon = c(0L, 0L, 0L, 0L, 0L, 0L), year = c(105L, +105L, 105L, 105L, 105L, 105L), wday = c(2L, 2L, 2L, 2L, 2L, +2L), yday = c(3L, 3L, 3L, 3L, 3L, 3L), isdst = c(0L, 0L, +0L, 0L, 0L, 0L)), .Names = c(sec, min, hour, mday, +mon, year, wday, yday, isdst), class = c(POSIXt, +POSIXlt))), .Names = c(price, size, src, t, d, + h, m, s), row.names = c(NA, 6L), class = data.frame) library(sqldf) # convert POSIXlt columns to POSIXct ix - sapply(g, inherits, POSIXlt) g[ix] - lapply(g[ix], as.POSIXct) n - sqldf(select h, src, count(*) from g group by h, src) n h src count(*) 1 2005-01-04 09:00:00 R6 -- Statistics Software Consulting GKX Group, GKX Associates Inc. tel: 1-877-GKX-GROUP email: ggrothendieck at gmail.com __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Failure to aggregate
Ok, conversion to POSIXct does the trick - why doesn't tapply work with the other, not-obviously-improper POSIX type? (Incidentally, now it gives me more trouble, with sorting - a reproducible sample coming up in another thread). -- View this message in context: http://r.789695.n4.nabble.com/Failure-to-aggregate-tp2528613p2533311.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Failure to aggregate
Mnay thanks fr suggestions. I am afraid this is one tough daatframe... t = sqldf(select h, count(*) from x group by h) Error in sqliteExecStatement(con, statement, bind.data) : RS-DBI driver: (error in statement: no such table: x) In addition: Warning message: In value[[3L]](cond) : RAW() can only be applied to a 'raw', not a 'double' t = aggregate(x[price], by = x[h], FUN = NROW) Error in sort.list(y) : 'x' must be atomic for 'sort.list' Have you called 'sort' on a list? t = aggregate(x[price], by = x[h], FUN = length) Error in sort.list(y) : 'x' must be atomic for 'sort.list' Have you called 'sort' on a list? t = tapply(x$price, by = x$h, FUN = length) Error in is.list(INDEX) : 'INDEX' is missing class(x) [1] data.frame class(x$h) [1] POSIXt POSIXlt class(x$price) [1] integer -- View this message in context: http://r.789695.n4.nabble.com/Failure-to-aggregate-tp2528613p2530963.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Failure to aggregate
I was able to aggregate (with sqldf, at least), after saving and re-loading the dataframe. My first guess was that h (and/or price?) now being a factor - stringsAsFactors = T by default - made the difference, and I tried to convert x$h to factor, but received an error. -- View this message in context: http://r.789695.n4.nabble.com/Failure-to-aggregate-tp2528613p2530989.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Failure to aggregate
I have a (very big - 1.5 rows) dataframe with a (POSIXt POSIXlt) column h (hour). Surprisingly, I cannot calculate a simple aggregate over the dataframe. n.h1 = sqldf(select distinct h, count(*) from x group by h) Error in sqliteExecStatement(con, statement, bind.data) : RS-DBI driver: (error in statement: no such table: x) In addition: Warning message: In value[[3L]](cond) : RAW() can only be applied to a 'raw', not a 'double' n.h2 = aggregate(x$price, by = x$h, FUN = nrow) Error in names(y) - c(names(by), names(x)) : 'names' attribute [10] must be the same length as the vector [2] Arrgh... -- View this message in context: http://r.789695.n4.nabble.com/Failure-to-aggregate-tp2528613p2528613.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Failure to aggregate
On Sep 6, 2010, at 12:15 PM, Dimitri Shvorob wrote: I have a (very big - 1.5 rows) dataframe with a (POSIXt POSIXlt) column h (hour). Surprisingly, I cannot calculate a simple aggregate over the dataframe. n.h1 = sqldf(select distinct h, count(*) from x group by h) Error in sqliteExecStatement(con, statement, bind.data) : RS-DBI driver: (error in statement: no such table: x) In addition: Warning message: In value[[3L]](cond) : RAW() can only be applied to a 'raw', not a 'double' n.h2 = aggregate(x$price, by = x$h, FUN = nrow) A vector argument (x$price) would only have one row (at most). nrow(c(1,2) NULL Error in names(y) - c(names(by), names(x)) : 'names' attribute [10] must be the same length as the vector [2] Try: tapply(x$price, by = x$h, FUN = length) -- David. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Failure to aggregate
On Mon, Sep 6, 2010 at 12:15 PM, Dimitri Shvorob dimitri.shvo...@gmail.com wrote: I have a (very big - 1.5 rows) dataframe with a (POSIXt POSIXlt) column h (hour). Surprisingly, I cannot calculate a simple aggregate over the dataframe. n.h1 = sqldf(select distinct h, count(*) from x group by h) Error in sqliteExecStatement(con, statement, bind.data) : RS-DBI driver: (error in statement: no such table: x) In addition: Warning message: In value[[3L]](cond) : RAW() can only be applied to a 'raw', not a 'double' n.h2 = aggregate(x$price, by = x$h, FUN = nrow) Error in names(y) - c(names(by), names(x)) : 'names' attribute [10] must be the same length as the vector [2] Since you are using group by you don't want distinct. In aggregate use x[price] and x[h] rather than x$price and x$h or use a formula. Also use nrow or length in place of NROW. library(sqldf) x - data.frame(price = 1:4, h = c(1, 1, 2, 3)) sqldf(select h, count(*) from x group by h) aggregate(x[price], by = x[h], FUN = NROW) aggregate(x[price], by = x[h], FUN = length) aggregate(price ~ h, x, FUN = length) -- Statistics Software Consulting GKX Group, GKX Associates Inc. tel: 1-877-GKX-GROUP email: ggrothendieck at gmail.com __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.