Regarding fasttime: my understanding is that it only works for dates from 1970 onward.

On Mon, Feb 25, 2013 at 7:41 PM, < [email protected]> wrote:
> Today's Topics:
>
>    1. About adding fastmatch and fasttime to data.table (stat quant)
>    2. Potential bug with sorting/summarizing by POSIXct and logical
>       column (Victor Kryukov)
>    3. Re: About adding fastmatch and fasttime to data.table
>       (Matthew Dowle)
>    4. Re: Potential bug with sorting/summarizing by POSIXct and
>       logical column (Michael Nelson)
>
> ----------------------------------------------------------------------
>
> Message: 1
> Date: Mon, 25 Feb 2013 19:40:35 +0100
> From: stat quant <[email protected]>
> To: [email protected]
> Subject: [datatable-help] About adding fastmatch and fasttime to
>         data.table
> Message-ID:
>         <cajjhha9ql8hurxf0+8onpad1t7y5csoolx7qdknuqxc1xpm...@mail.gmail.com>
> Content-Type: text/plain; charset="iso-8859-1"
>
> Hello list,
>
> Looking at fastmatch and fasttime, I realized that each of those packages
> consists solely of one C file.
> We spoke about the possibility of adding those to data.table. I tried to
> contact S.Urbanek without any success, so I do not have feedback from his
> side.
> Using fastPOSIXct provides a huge gain when one has to load files with
> datetimes. On my laptop, using data.table:::fread, I realized that most of
> the time is spent casting datetimes to POSIXct (I have several columns).
>
> Looking at fasttime, you can see a pretty good improvement (roughly a
> factor of 15):
>
> R) ts <- as.character(.POSIXct(runif(1e6) * unclass(Sys.time())))
> R) system.time(a <- as.POSIXct(ts, "GMT"))
>    user  system elapsed
>    6.49    0.04    6.57
> R) system.time(b <- fastPOSIXct(ts, "GMT"))
>    user  system elapsed
>    0.40    0.00    0.41
>
> When colClasses is implemented in fread, may I suggest allowing fasttime
> as an option?
> Concerning fastmatch, the vignette already shows some nice benchmarks. I
> tend to do a lot of selects based on string columns; I am not sure if this
> is the case for most of us.
>
> My 0.002 cents
> Cheers
>
> ------------------------------
>
> Message: 2
> Date: Mon, 25 Feb 2013 14:26:28 -0800
> From: Victor Kryukov <[email protected]>
> To: [email protected]
> Subject: [datatable-help] Potential bug with sorting/summarizing by
>         POSIXct and logical column
> Message-ID:
>         <CANJmMqTdpKGL3Bq=y-fYCsWDc8uTe3h-g+VoGBV=[email protected]>
> Content-Type: text/plain; charset="iso-8859-1"
>
> Hello,
>
> I've encountered what looks like a bug while sorting by a POSIXct and a
> logical column, which may or may not be related to the following bug:
>
> https://r-forge.r-project.org/tracker/index.php?func=detail&aid=2552&group_id=240&atid=975
>
> Here are all the details:
>
> http://stackoverflow.com/questions/15077232/data-table-not-summarizing-properly-by-two-columns
>
> Here is the test case:
>
> library(data.table)
> library(lubridate)  # provides ymd(), used in the last query
>
> # First some data
> data <- data.table(structure(list(
>     month = structure(c(1356998400, 1356998400, 1356998400,
>         1359676800, 1354320000, 1359676800, 1359676800,
>         1356998400, 1356998400,
>         1354320000, 1354320000, 1354320000, 1359676800,
>         1359676800, 1359676800,
>         1356998400, 1359676800, 1359676800, 1356998400,
>         1359676800, 1359676800,
>         1359676800, 1359676800, 1354320000, 1354320000),
>         class = c("POSIXct", "POSIXt"), tzone = "UTC"),
>     portal = c(TRUE, TRUE, FALSE, TRUE,
>         TRUE, TRUE, TRUE, TRUE, TRUE, FALSE, TRUE, FALSE, TRUE,
>         FALSE,
>         TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE,
>         TRUE, TRUE),
>     satisfaction = c(10L, 10L, 10L, 9L, 10L, 10L, 9L, 10L, 10L,
>         9L, 2L, 8L, 10L, 9L, 10L, 10L, 9L, 10L, 10L, 10L,
>         9L, 10L, 9L,
>         10L, 10L)),
>     .Names = c("month", "portal", "satisfaction"),
>     row.names = c(NA, -25L), class = "data.frame"))
>
> # Summarizing by month, portal with tapply works:
>
> tapply(data$satisfaction, list(data$month, data$portal), mean)
>            FALSE      TRUE
> 2012-12-01   8.5  8.000000
> 2013-01-01  10.0 10.000000
> 2013-02-01   9.0  9.545455
>
> # Summarizing with 'by' argument of data.table does not:
>
> data[, mean(satisfaction), by = 'month,portal']
> data[, mean(satisfaction), by = list(month, portal)]
>         month portal        V1
> 1: 2013-01-01  FALSE 10.000000
> 2: 2013-02-01   TRUE  9.000000
> 3: 2013-01-01   TRUE 10.000000
> 4: 2012-12-01  FALSE  8.500000
> 5: 2012-12-01   TRUE  7.333333
> 6: 2013-02-01   TRUE  9.666667
> 7: 2013-02-01  FALSE  9.000000
> 8: 2012-12-01   TRUE 10.000000
>
> # Summarizing only this year's data works:
> data[month >= ymd(20130101), mean(satisfaction), by = 'month,portal']
>         month portal        V1
> 1: 2013-01-01   TRUE 10.000000
> 2: 2013-01-01  FALSE 10.000000
> 3: 2013-02-01   TRUE  9.545455
> 4: 2013-02-01  FALSE  9.000000
>
> Yours Sincerely,
> Victor Kryukov
>
> ------------------------------
>
> Message: 3
> Date: Tue, 26 Feb 2013 00:39:09 +0000
> From: Matthew Dowle <[email protected]>
> To: <[email protected]>
> Cc: [email protected]
> Subject: Re: [datatable-help] About adding fastmatch and fasttime to
>         data.table
> Message-ID: <[email protected]>
> Content-Type: text/plain; charset="utf-8"
>
> Hi,
>
> This sounds like a great idea. I don't know why Simon U didn't reply,
> or why you had no success; that may depend on the way you asked, whether
> he is on holiday at the moment, his reaction to the precise wording of
> the email you wrote, or some other factor. It is difficult to tell! But
> we don't need to wait for him or for you: this is open source. You have
> got much further than I have, so if you'd like to add this, please go
> ahead and make progress. You're very welcome to join the project and
> commit directly. Or, if you can't for some reason, please file it as a
> feature request so it doesn't get forgotten.
>
> Matthew
>
> ------------------------------
>
> Message: 4
> Date: Tue, 26 Feb 2013 00:40:02 +0000
> From: Michael Nelson <[email protected]>
> To: "[email protected]"
>         <[email protected]>
> Subject: Re: [datatable-help] Potential bug with sorting/summarizing
>         by POSIXct and logical column
> Message-ID:
>         <6fb5193a6cdcdf499486a833b7afbdcd5827d...@ex-mbx-pro-04.mcs.usyd.edu.au>
> Content-Type: text/plain; charset="iso-8859-1"
>
> I can't replicate this problem using data.table 1.8.7 (installed about 3
> weeks ago) on
> R version 2.15.2 (2012-10-26)
> Platform: i386-w64-mingw32/i386 (32-bit)
>
> Michael
>
> ------------------------------
>
> End of datatable-help Digest, Vol 36, Issue 8
> *********************************************
_______________________________________________ datatable-help mailing list [email protected] https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help
