Re: [R] Failure to aggregate

2010-09-09 Thread Gabor Grothendieck
On Wed, Sep 8, 2010 at 4:48 AM, Dimitri Shvorob
dimitri.shvo...@gmail.com wrote:

 I was able to aggregate (with sqldf, at least), after saving and re-loading
 the dataframe. My first guess was that h (and/or price?) now being a factor
 - stringsAsFactors = T by default - made the difference, and I tried to
 convert x$h to factor, but received an error.

Please provide enough of x to reproduce your problem.  e.g.

x - head(x)
dput(x)
# repeat code and ensure it still shows problem

-- 
Statistics  Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Failure to aggregate

2010-09-09 Thread Petr PIKAL
Hi

you has to provide some more info about x e.g. str(x)

x-data.frame(price=1, h=Sys.time())



r-help-boun...@r-project.org napsal dne 08.09.2010 10:18:52:

 
 Mnay thanks fr suggestions. I am afraid this is one tough daatframe...
 
  t = sqldf(select h, count(*) from x group by h)
 Error in sqliteExecStatement(con, statement, bind.data) : 
   RS-DBI driver: (error in statement: no such table: x)
 In addition: Warning message:
 In value[[3L]](cond) : RAW() can only be applied to a 'raw', not a 
'double'

did not test

 
  t = aggregate(x[price], by = x[h], FUN = NROW) 
 Error in sort.list(y) : 'x' must be atomic for 'sort.list'
 Have you called 'sort' on a list?

works
 aggregate(x[price], by = x[h], FUN = NROW) 
h price
1 2010-09-09 16:58:04 1


 
  t = aggregate(x[price], by = x[h], FUN = length) 
 Error in sort.list(y) : 'x' must be atomic for 'sort.list'
 Have you called 'sort' on a list?

works

aggregate(x[price], by = x[h], FUN = length)
h price
1 2010-09-09 16:58:04 1


 
  t = tapply(x$price, by = x$h, FUN = length) 
 Error in is.list(INDEX) : 'INDEX' is missing

works
use INDEX instead of by

tapply(x$price, by = list(x$h), FUN = length)
Error in is.list(INDEX) : 'INDEX' is missing
tapply(x$price, x$h, FUN = length)
2010-09-09 16:58:04 
  1

Regards
Petr

 
  class(x)
 [1] data.frame
  class(x$h)
 [1] POSIXt  POSIXlt
  class(x$price)
 [1] integer
 
 -- 
 View this message in context: http://r.789695.n4.nabble.com/Failure-to-
 aggregate-tp2528613p2530963.html
 Sent from the R help mailing list archive at Nabble.com.
 
 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide 
http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Failure to aggregate

2010-09-09 Thread Dimitri Shvorob

 g = head(x)
 dput(g)
structure(list(price = c(500L, 500L, 501L, 501L, 500L, 501L), 
size = c(221000L, 2000L, 1000L, 13000L, 3000L, 3000L), src = c(R, 
R, R, R, R, R), t = structure(list(sec = c(24.133, 
47.096, 12.139, 18.142, 10.721, 28.713), min = c(0L, 0L, 
1L, 1L, 2L, 2L), hour = c(9L, 9L, 9L, 9L, 9L, 9L), mday = c(4L, 
4L, 4L, 4L, 4L, 4L), mon = c(0L, 0L, 0L, 0L, 0L, 0L), year = c(105L, 
105L, 105L, 105L, 105L, 105L), wday = c(2L, 2L, 2L, 2L, 2L, 
2L), yday = c(3L, 3L, 3L, 3L, 3L, 3L), isdst = c(0L, 0L, 
0L, 0L, 0L, 0L)), .Names = c(sec, min, hour, mday, 
mon, year, wday, yday, isdst), class = c(POSIXt, 
POSIXlt)), d = structure(list(sec = c(0, 0, 0, 0, 0, 0), 
min = c(0L, 0L, 0L, 0L, 0L, 0L), hour = c(0L, 0L, 0L, 
0L, 0L, 0L), mday = c(4L, 4L, 4L, 4L, 4L, 4L), mon = c(0L, 
0L, 0L, 0L, 0L, 0L), year = c(105L, 105L, 105L, 105L, 
105L, 105L), wday = c(2L, 2L, 2L, 2L, 2L, 2L), yday = c(3L, 
3L, 3L, 3L, 3L, 3L), isdst = c(0L, 0L, 0L, 0L, 0L, 0L
)), .Names = c(sec, min, hour, mday, mon, year, 
wday, yday, isdst), class = c(POSIXt, POSIXlt)), 
h = structure(list(sec = c(0, 0, 0, 0, 0, 0), min = c(0L, 
0L, 0L, 0L, 0L, 0L), hour = c(9L, 9L, 9L, 9L, 9L, 9L), mday = c(4L, 
4L, 4L, 4L, 4L, 4L), mon = c(0L, 0L, 0L, 0L, 0L, 0L), year = c(105L, 
105L, 105L, 105L, 105L, 105L), wday = c(2L, 2L, 2L, 2L, 2L, 
2L), yday = c(3L, 3L, 3L, 3L, 3L, 3L), isdst = c(0L, 0L, 
0L, 0L, 0L, 0L)), .Names = c(sec, min, hour, mday, 
mon, year, wday, yday, isdst), class = c(POSIXt, 
POSIXlt)), m = structure(list(sec = c(0, 0, 0, 0, 0, 0), 
min = c(0L, 0L, 1L, 1L, 2L, 2L), hour = c(9L, 9L, 9L, 
9L, 9L, 9L), mday = c(4L, 4L, 4L, 4L, 4L, 4L), mon = c(0L, 
0L, 0L, 0L, 0L, 0L), year = c(105L, 105L, 105L, 105L, 
105L, 105L), wday = c(2L, 2L, 2L, 2L, 2L, 2L), yday = c(3L, 
3L, 3L, 3L, 3L, 3L), isdst = c(0L, 0L, 0L, 0L, 0L, 0L
)), .Names = c(sec, min, hour, mday, mon, year, 
wday, yday, isdst), class = c(POSIXt, POSIXlt)), 
s = structure(list(sec = c(24, 47, 12, 18, 10, 28), min = c(0L, 
0L, 1L, 1L, 2L, 2L), hour = c(9L, 9L, 9L, 9L, 9L, 9L), mday = c(4L, 
4L, 4L, 4L, 4L, 4L), mon = c(0L, 0L, 0L, 0L, 0L, 0L), year = c(105L, 
105L, 105L, 105L, 105L, 105L), wday = c(2L, 2L, 2L, 2L, 2L, 
2L), yday = c(3L, 3L, 3L, 3L, 3L, 3L), isdst = c(0L, 0L, 
0L, 0L, 0L, 0L)), .Names = c(sec, min, hour, mday, 
mon, year, wday, yday, isdst), class = c(POSIXt, 
POSIXlt))), .Names = c(price, size, src, t, d, 
h, m, s), row.names = c(NA, 6L), class = data.frame)

 n = sqldf(select distinct h, src, count(*) from g group by h, src)
Loading required package: tcltk
Loading Tcl/Tk interface ... done
Error in sqliteExecStatement(con, statement, bind.data) : 
  RS-DBI driver: (error in statement: no such table: g)
In addition: Warning message:
In value[[3L]](cond) : RAW() can only be applied to a 'raw', not a 'double'

-- 
View this message in context: 
http://r.789695.n4.nabble.com/Failure-to-aggregate-tp2528613p2533051.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Failure to aggregate

2010-09-09 Thread jim holtman
I think your main problem is that you have your time as POSIXlt which
is a multiple valued vector.  I converted the 't' to POSIXct, removed
the other POSIXlt value and created a 'h' as the character for the
hour and it works fine:

 str(g)
'data.frame':   6 obs. of  5 variables:
 $ price: int  500 500 501 501 500 501
 $ size : int  221000 2000 1000 13000 3000 3000
 $ src  : chr  R R R R ...
 $ time : POSIXct, format: 2005-01-04 09:00:24 2005-01-04 09:00:47
2005-01-04 09:01:12 2005-01-04 09:01:18 ...
 $ h: chr  09 09 09 09 ...
 sqldf(select distinct h, src, count(*) from g group by h,src)
   h src count(*)
1 09   R6


On Thu, Sep 9, 2010 at 11:16 AM, Dimitri Shvorob
dimitri.shvo...@gmail.com wrote:

 g = head(x)
 dput(g)
 structure(list(price = c(500L, 500L, 501L, 501L, 500L, 501L),
    size = c(221000L, 2000L, 1000L, 13000L, 3000L, 3000L), src = c(R,
    R, R, R, R, R), t = structure(list(sec = c(24.133,
    47.096, 12.139, 18.142, 10.721, 28.713), min = c(0L, 0L,
    1L, 1L, 2L, 2L), hour = c(9L, 9L, 9L, 9L, 9L, 9L), mday = c(4L,
    4L, 4L, 4L, 4L, 4L), mon = c(0L, 0L, 0L, 0L, 0L, 0L), year = c(105L,
    105L, 105L, 105L, 105L, 105L), wday = c(2L, 2L, 2L, 2L, 2L,
    2L), yday = c(3L, 3L, 3L, 3L, 3L, 3L), isdst = c(0L, 0L,
    0L, 0L, 0L, 0L)), .Names = c(sec, min, hour, mday,
    mon, year, wday, yday, isdst), class = c(POSIXt,
    POSIXlt)), d = structure(list(sec = c(0, 0, 0, 0, 0, 0),
        min = c(0L, 0L, 0L, 0L, 0L, 0L), hour = c(0L, 0L, 0L,
        0L, 0L, 0L), mday = c(4L, 4L, 4L, 4L, 4L, 4L), mon = c(0L,
        0L, 0L, 0L, 0L, 0L), year = c(105L, 105L, 105L, 105L,
        105L, 105L), wday = c(2L, 2L, 2L, 2L, 2L, 2L), yday = c(3L,
        3L, 3L, 3L, 3L, 3L), isdst = c(0L, 0L, 0L, 0L, 0L, 0L
        )), .Names = c(sec, min, hour, mday, mon, year,
    wday, yday, isdst), class = c(POSIXt, POSIXlt)),
    h = structure(list(sec = c(0, 0, 0, 0, 0, 0), min = c(0L,
    0L, 0L, 0L, 0L, 0L), hour = c(9L, 9L, 9L, 9L, 9L, 9L), mday = c(4L,
    4L, 4L, 4L, 4L, 4L), mon = c(0L, 0L, 0L, 0L, 0L, 0L), year = c(105L,
    105L, 105L, 105L, 105L, 105L), wday = c(2L, 2L, 2L, 2L, 2L,
    2L), yday = c(3L, 3L, 3L, 3L, 3L, 3L), isdst = c(0L, 0L,
    0L, 0L, 0L, 0L)), .Names = c(sec, min, hour, mday,
    mon, year, wday, yday, isdst), class = c(POSIXt,
    POSIXlt)), m = structure(list(sec = c(0, 0, 0, 0, 0, 0),
        min = c(0L, 0L, 1L, 1L, 2L, 2L), hour = c(9L, 9L, 9L,
        9L, 9L, 9L), mday = c(4L, 4L, 4L, 4L, 4L, 4L), mon = c(0L,
        0L, 0L, 0L, 0L, 0L), year = c(105L, 105L, 105L, 105L,
        105L, 105L), wday = c(2L, 2L, 2L, 2L, 2L, 2L), yday = c(3L,
        3L, 3L, 3L, 3L, 3L), isdst = c(0L, 0L, 0L, 0L, 0L, 0L
        )), .Names = c(sec, min, hour, mday, mon, year,
    wday, yday, isdst), class = c(POSIXt, POSIXlt)),
    s = structure(list(sec = c(24, 47, 12, 18, 10, 28), min = c(0L,
    0L, 1L, 1L, 2L, 2L), hour = c(9L, 9L, 9L, 9L, 9L, 9L), mday = c(4L,
    4L, 4L, 4L, 4L, 4L), mon = c(0L, 0L, 0L, 0L, 0L, 0L), year = c(105L,
    105L, 105L, 105L, 105L, 105L), wday = c(2L, 2L, 2L, 2L, 2L,
    2L), yday = c(3L, 3L, 3L, 3L, 3L, 3L), isdst = c(0L, 0L,
    0L, 0L, 0L, 0L)), .Names = c(sec, min, hour, mday,
    mon, year, wday, yday, isdst), class = c(POSIXt,
    POSIXlt))), .Names = c(price, size, src, t, d,
 h, m, s), row.names = c(NA, 6L), class = data.frame)

 n = sqldf(select distinct h, src, count(*) from g group by h, src)
 Loading required package: tcltk
 Loading Tcl/Tk interface ... done
 Error in sqliteExecStatement(con, statement, bind.data) :
  RS-DBI driver: (error in statement: no such table: g)
 In addition: Warning message:
 In value[[3L]](cond) : RAW() can only be applied to a 'raw', not a 'double'

 --
 View this message in context: 
 http://r.789695.n4.nabble.com/Failure-to-aggregate-tp2528613p2533051.html
 Sent from the R help mailing list archive at Nabble.com.

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.




-- 
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem that you are trying to solve?

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Failure to aggregate

2010-09-09 Thread Dimitri Shvorob

Thanks a lot, Jim. I am not sure what difference the various POSIXes make -
in the end, you are replacing a datetime hour with a numeric value, e.g., 1
or 9. That does not work for me, unfortunately. 

 g = head(x)
 dput(g)
structure(list(price = c(500L, 500L, 501L, 501L, 500L, 501L), 
size = c(221000L, 2000L, 1000L, 13000L, 3000L, 3000L), src = c(R, 
R, R, R, R, R), t = structure(c(1104829224.133, 
1104829247.096, 1104829272.139, 1104829278.142, 1104829330.721, 
1104829348.713), class = c(POSIXt, POSIXct), tzone = ), 
d = structure(list(sec = c(0, 0, 0, 0, 0, 0), min = c(0L, 
0L, 0L, 0L, 0L, 0L), hour = c(0L, 0L, 0L, 0L, 0L, 0L), mday = c(4L, 
4L, 4L, 4L, 4L, 4L), mon = c(0L, 0L, 0L, 0L, 0L, 0L), year = c(105L, 
105L, 105L, 105L, 105L, 105L), wday = c(2L, 2L, 2L, 2L, 2L, 
2L), yday = c(3L, 3L, 3L, 3L, 3L, 3L), isdst = c(0L, 0L, 
0L, 0L, 0L, 0L)), .Names = c(sec, min, hour, mday, 
mon, year, wday, yday, isdst), class = c(POSIXt, 
POSIXlt)), h = c(9, 9, 9, 9, 9, 9), m = structure(list(
sec = c(0, 0, 0, 0, 0, 0), min = c(0L, 0L, 1L, 1L, 2L, 
2L), hour = c(9L, 9L, 9L, 9L, 9L, 9L), mday = c(4L, 4L, 
4L, 4L, 4L, 4L), mon = c(0L, 0L, 0L, 0L, 0L, 0L), year = c(105L, 
105L, 105L, 105L, 105L, 105L), wday = c(2L, 2L, 2L, 2L, 
2L, 2L), yday = c(3L, 3L, 3L, 3L, 3L, 3L), isdst = c(0L, 
0L, 0L, 0L, 0L, 0L)), .Names = c(sec, min, hour, 
mday, mon, year, wday, yday, isdst), class = c(POSIXt, 
POSIXlt)), s = structure(list(sec = c(24, 47, 12, 18, 10, 
28), min = c(0L, 0L, 1L, 1L, 2L, 2L), hour = c(9L, 9L, 9L, 
9L, 9L, 9L), mday = c(4L, 4L, 4L, 4L, 4L, 4L), mon = c(0L, 
0L, 0L, 0L, 0L, 0L), year = c(105L, 105L, 105L, 105L, 105L, 
105L), wday = c(2L, 2L, 2L, 2L, 2L, 2L), yday = c(3L, 3L, 
3L, 3L, 3L, 3L), isdst = c(0L, 0L, 0L, 0L, 0L, 0L)), .Names = c(sec, 
min, hour, mday, mon, year, wday, yday, isdst
), class = c(POSIXt, POSIXlt))), .Names = c(price, 
size, src, t, d, h, m, s), row.names = c(NA, 6L
), class = data.frame)

 n = sqldf(select distinct h, src, count(*) from g group by h, src)
Error in sqliteExecStatement(con, statement, bind.data) : 
  RS-DBI driver: (error in statement: no such table: g)
In addition: Warning message:
In value[[3L]](cond) : RAW() can only be applied to a 'raw', not a 'double'




-- 
View this message in context: 
http://r.789695.n4.nabble.com/Failure-to-aggregate-tp2528613p2533189.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Failure to aggregate

2010-09-09 Thread Dimitri Shvorob

Update: What did make a difference for me - and something that was present in
Jim's example, but not reproduced by myself initially - was dropping columns
other than the two involved. When I dropped all columns except for h and
src, the sqldf call worked.

... Is it an R bug or what? (I am saying R, not sqldf, because built-in
tapply did not work either). Arrgh...

 
-- 
View this message in context: 
http://r.789695.n4.nabble.com/Failure-to-aggregate-tp2528613p2533206.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Failure to aggregate

2010-09-09 Thread Gabor Grothendieck
On Thu, Sep 9, 2010 at 12:39 PM, Dimitri Shvorob
dimitri.shvo...@gmail.com wrote:

 Update: What did make a difference for me - and something that was present in
 Jim's example, but not reproduced by myself initially - was dropping columns
 other than the two involved. When I dropped all columns except for h and
 src, the sqldf call worked.

 ... Is it an R bug or what? (I am saying R, not sqldf, because built-in
 tapply did not work either). Arrgh...


You need to show what you have done in a reproducible fashion to get
answers.  Converting to POSIXct does work.  g below is the object you
posted earlier in this thread.  I have also removed the erroneous
distinct although it will work even if you leave it in.

 g -
+ structure(list(price = c(500L, 500L, 501L, 501L, 500L, 501L),
+size = c(221000L, 2000L, 1000L, 13000L, 3000L, 3000L), src = c(R,
+R, R, R, R, R), t = structure(list(sec = c(24.133,
+47.096, 12.139, 18.142, 10.721, 28.713), min = c(0L, 0L,
+1L, 1L, 2L, 2L), hour = c(9L, 9L, 9L, 9L, 9L, 9L), mday = c(4L,
+4L, 4L, 4L, 4L, 4L), mon = c(0L, 0L, 0L, 0L, 0L, 0L), year = c(105L,
+105L, 105L, 105L, 105L, 105L), wday = c(2L, 2L, 2L, 2L, 2L,
+2L), yday = c(3L, 3L, 3L, 3L, 3L, 3L), isdst = c(0L, 0L,
+0L, 0L, 0L, 0L)), .Names = c(sec, min, hour, mday,
+mon, year, wday, yday, isdst), class = c(POSIXt,
+POSIXlt)), d = structure(list(sec = c(0, 0, 0, 0, 0, 0),
+min = c(0L, 0L, 0L, 0L, 0L, 0L), hour = c(0L, 0L, 0L,
+0L, 0L, 0L), mday = c(4L, 4L, 4L, 4L, 4L, 4L), mon = c(0L,
+0L, 0L, 0L, 0L, 0L), year = c(105L, 105L, 105L, 105L,
+105L, 105L), wday = c(2L, 2L, 2L, 2L, 2L, 2L), yday = c(3L,
+3L, 3L, 3L, 3L, 3L), isdst = c(0L, 0L, 0L, 0L, 0L, 0L
+)), .Names = c(sec, min, hour, mday, mon, year,
+wday, yday, isdst), class = c(POSIXt, POSIXlt)),
+h = structure(list(sec = c(0, 0, 0, 0, 0, 0), min = c(0L,
+0L, 0L, 0L, 0L, 0L), hour = c(9L, 9L, 9L, 9L, 9L, 9L), mday = c(4L,
+4L, 4L, 4L, 4L, 4L), mon = c(0L, 0L, 0L, 0L, 0L, 0L), year = c(105L,
+105L, 105L, 105L, 105L, 105L), wday = c(2L, 2L, 2L, 2L, 2L,
+2L), yday = c(3L, 3L, 3L, 3L, 3L, 3L), isdst = c(0L, 0L,
+0L, 0L, 0L, 0L)), .Names = c(sec, min, hour, mday,
+mon, year, wday, yday, isdst), class = c(POSIXt,
+POSIXlt)), m = structure(list(sec = c(0, 0, 0, 0, 0, 0),
+min = c(0L, 0L, 1L, 1L, 2L, 2L), hour = c(9L, 9L, 9L,
+9L, 9L, 9L), mday = c(4L, 4L, 4L, 4L, 4L, 4L), mon = c(0L,
+0L, 0L, 0L, 0L, 0L), year = c(105L, 105L, 105L, 105L,
+105L, 105L), wday = c(2L, 2L, 2L, 2L, 2L, 2L), yday = c(3L,
+3L, 3L, 3L, 3L, 3L), isdst = c(0L, 0L, 0L, 0L, 0L, 0L
+)), .Names = c(sec, min, hour, mday, mon, year,
+wday, yday, isdst), class = c(POSIXt, POSIXlt)),
+s = structure(list(sec = c(24, 47, 12, 18, 10, 28), min = c(0L,
+0L, 1L, 1L, 2L, 2L), hour = c(9L, 9L, 9L, 9L, 9L, 9L), mday = c(4L,
+4L, 4L, 4L, 4L, 4L), mon = c(0L, 0L, 0L, 0L, 0L, 0L), year = c(105L,
+105L, 105L, 105L, 105L, 105L), wday = c(2L, 2L, 2L, 2L, 2L,
+2L), yday = c(3L, 3L, 3L, 3L, 3L, 3L), isdst = c(0L, 0L,
+0L, 0L, 0L, 0L)), .Names = c(sec, min, hour, mday,
+mon, year, wday, yday, isdst), class = c(POSIXt,
+POSIXlt))), .Names = c(price, size, src, t, d,
+ h, m, s), row.names = c(NA, 6L), class = data.frame)

 library(sqldf)

 # convert POSIXlt columns to POSIXct
 ix - sapply(g, inherits, POSIXlt)
 g[ix] - lapply(g[ix], as.POSIXct)

 n - sqldf(select h, src, count(*) from g group by h, src)
 n
h src count(*)
1 2005-01-04 09:00:00   R6




-- 
Statistics  Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Failure to aggregate

2010-09-09 Thread Dimitri Shvorob

Ok, conversion to POSIXct does the trick - why doesn't tapply work with the
other, not-obviously-improper POSIX type? 

(Incidentally, now it gives me more trouble, with sorting - a reproducible
sample coming up in another thread).
-- 
View this message in context: 
http://r.789695.n4.nabble.com/Failure-to-aggregate-tp2528613p2533311.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Failure to aggregate

2010-09-08 Thread Dimitri Shvorob

Mnay thanks fr suggestions. I am afraid this is one tough daatframe...

 t = sqldf(select h, count(*) from x group by h)
Error in sqliteExecStatement(con, statement, bind.data) : 
  RS-DBI driver: (error in statement: no such table: x)
In addition: Warning message:
In value[[3L]](cond) : RAW() can only be applied to a 'raw', not a 'double'

 t = aggregate(x[price], by = x[h], FUN = NROW) 
Error in sort.list(y) : 'x' must be atomic for 'sort.list'
Have you called 'sort' on a list?

 t = aggregate(x[price], by = x[h], FUN = length) 
Error in sort.list(y) : 'x' must be atomic for 'sort.list'
Have you called 'sort' on a list?

 t = tapply(x$price, by = x$h, FUN = length) 
Error in is.list(INDEX) : 'INDEX' is missing

 class(x)
[1] data.frame
 class(x$h)
[1] POSIXt  POSIXlt
 class(x$price)
[1] integer

-- 
View this message in context: 
http://r.789695.n4.nabble.com/Failure-to-aggregate-tp2528613p2530963.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Failure to aggregate

2010-09-08 Thread Dimitri Shvorob

I was able to aggregate (with sqldf, at least), after saving and re-loading
the dataframe. My first guess was that h (and/or price?) now being a factor
- stringsAsFactors = T by default - made the difference, and I tried to
convert x$h to factor, but received an error. 
-- 
View this message in context: 
http://r.789695.n4.nabble.com/Failure-to-aggregate-tp2528613p2530989.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Failure to aggregate

2010-09-06 Thread Dimitri Shvorob

I have a (very big - 1.5 rows) dataframe with a (POSIXt  POSIXlt) column h
(hour). Surprisingly, I cannot calculate a simple aggregate over the
dataframe.

 n.h1 = sqldf(select distinct h, count(*) from x group by h)
Error in sqliteExecStatement(con, statement, bind.data) : 
  RS-DBI driver: (error in statement: no such table: x)
In addition: Warning message:
In value[[3L]](cond) : RAW() can only be applied to a 'raw', not a 'double'

 n.h2 = aggregate(x$price, by = x$h, FUN = nrow)
Error in names(y) - c(names(by), names(x)) : 
  'names' attribute [10] must be the same length as the vector [2]

Arrgh...
-- 
View this message in context: 
http://r.789695.n4.nabble.com/Failure-to-aggregate-tp2528613p2528613.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Failure to aggregate

2010-09-06 Thread David Winsemius


On Sep 6, 2010, at 12:15 PM, Dimitri Shvorob wrote:



I have a (very big - 1.5 rows) dataframe with a (POSIXt  POSIXlt)  
column h

(hour). Surprisingly, I cannot calculate a simple aggregate over the
dataframe.


n.h1 = sqldf(select distinct h, count(*) from x group by h)

Error in sqliteExecStatement(con, statement, bind.data) :
 RS-DBI driver: (error in statement: no such table: x)
In addition: Warning message:
In value[[3L]](cond) : RAW() can only be applied to a 'raw', not a  
'double'



n.h2 = aggregate(x$price, by = x$h, FUN = nrow)


A vector argument (x$price) would only have one row (at most).

nrow(c(1,2)
NULL


Error in names(y) - c(names(by), names(x)) :
 'names' attribute [10] must be the same length as the vector [2]


Try:

tapply(x$price, by = x$h, FUN = length)

--
David.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Failure to aggregate

2010-09-06 Thread Gabor Grothendieck
On Mon, Sep 6, 2010 at 12:15 PM, Dimitri Shvorob
dimitri.shvo...@gmail.com wrote:

 I have a (very big - 1.5 rows) dataframe with a (POSIXt  POSIXlt) column h
 (hour). Surprisingly, I cannot calculate a simple aggregate over the
 dataframe.

 n.h1 = sqldf(select distinct h, count(*) from x group by h)
 Error in sqliteExecStatement(con, statement, bind.data) :
  RS-DBI driver: (error in statement: no such table: x)
 In addition: Warning message:
 In value[[3L]](cond) : RAW() can only be applied to a 'raw', not a 'double'

 n.h2 = aggregate(x$price, by = x$h, FUN = nrow)
 Error in names(y) - c(names(by), names(x)) :
  'names' attribute [10] must be the same length as the vector [2]


Since you are using group by you don't want distinct.

In aggregate use x[price] and x[h] rather than
x$price and x$h or use a formula.

Also use nrow or length in place of NROW.

library(sqldf)
x - data.frame(price = 1:4, h = c(1, 1, 2, 3))
sqldf(select h, count(*) from x group by h)

aggregate(x[price], by = x[h], FUN = NROW)
aggregate(x[price], by = x[h], FUN = length)
aggregate(price ~ h, x, FUN = length)







-- 
Statistics  Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.