At the same time, there are a lot of uses in R where the vector really is just an id, not something you would regress on. I often work on data frames with 10 million rows and 1 million ids; having that column as a factor is just downright wasteful and slow.
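A rough sketch of the kind of cost I mean, scaled down so it runs quickly. The sizes, the id format and the variable names are made up for illustration, and the timing and sizes you see will depend on your machine and R version:

```r
# Hypothetical id column: many rows, many distinct ids (scaled down from
# the 10 million rows / 1 million ids mentioned above).
set.seed(1)
n_rows <- 1e5
n_ids  <- 1e4
ids <- sample(sprintf("id%06d", seq_len(n_ids)), n_rows, replace = TRUE)

# Converting to a factor has to find and sort the unique values and then
# match every element against them; that work is repeated on each conversion.
t_factor <- system.time(f <- factor(ids))

# Both forms identify the same groups, and the factor round-trips back to
# the original strings, so for a pure id the conversion adds no information.
n_groups_chr <- length(unique(ids))
n_groups_fac <- nlevels(f)

print(t_factor)
print(object.size(ids))
print(object.size(f))
```

This only times the conversion itself. Whether the factor or the character form is cheaper to store and group on afterwards is something to measure on your own data rather than assume.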
I think the global option is the best compromise here.

On Tue, Apr 17, 2012 at 11:37 AM, Joseph Voelkel <[email protected]> wrote:
> Statistical work in R (what many of us do, I'd say) prefers factors over
> characters:
>
> # factors
> DF <- data.frame(X=letters[rep(1:5,2)], Y=rnorm(10))
> (DF.lm<-lm(Y~X,DF))
> predict(DF.lm)
> # all works fine
>
> # characters
> DF <- data.frame(X=letters[rep(1:5,2)], Y=rnorm(10), stringsAsFactors=FALSE)
> (DF.lm<-lm(Y~X,DF)) # warning
> predict(DF.lm) # warning
>
> Not sure if this one will get resolved.
>
> Using factors instead of characters also ensures that a table of months or
> days of the week can be listed in the natural (not alphabetic) ordering.
>
> Joe
>
> -----Original Message-----
> From: [email protected] [mailto:[email protected]] On Behalf Of Matthew Dowle
> Sent: Sunday, April 15, 2012 1:20 PM
> To: Damian Betebenner
> Cc: [email protected]
> Subject: Re: [datatable-help] Coercian to character
>
> I thought I'd added something to FAQ 2.17 about that, but it seems not.
> Will add, thanks. Maybe I only wrote it up in the comment when closing the
> related feature request. It's deliberately different, since my guess is that
> most people most of the time (now) want characters left as characters and
> keep setting stringsAsFactors to FALSE. I think the default for data.frame
> was TRUE as a hangover from old versions of R, before the global string
> cache was added.
>
> It's not set in stone though, so it could be changed. In particular, there
> could be a global default like we've done for other arguments, so you could
> change the default if need be.
>
> It won't cause a compatibility issue (same as the other differences in FAQ
> 2.17) or any issues down the road as far as I can think, but let me know if
> you think of anything.
>
> Matthew
>
> On Sun, 2012-04-15 at 04:40 -0500, Damian Betebenner wrote:
>> I started having character vectors popping up in places I never had before,
>> but upon further investigation that turned out to be an issue with my own
>> setup, not data.table.
>>
>> With regard to characters (and data.table's ability to handle them as a
>> key now), I did notice that data.table and data.frame default to using
>> stringsAsFactors differently:
>>
>> DF <- data.frame(X=letters[1:10], Y=rnorm(10))
>> sapply(DF, class)
>>
>>         X         Y
>>  "factor" "numeric"
>>
>> DT <- data.table(X=letters[1:10], Y=rnorm(10))
>> sapply(DT, class)
>>
>> > DT <- data.table(X=rep(letters[1:10], each=2), Y=rnorm(20))
>> > sapply(DT, class)
>>           X           Y
>> "character"   "numeric"
>>
>> Will this inconsistency cause problems down the road?
>>
>> Thanks for all your help,
>>
>> Damian
>>
>> Damian Betebenner
>> Center for Assessment
>> PO Box 351
>> Dover, NH 03821-0351
>>
>> Phone (office): (603) 516-7900
>> Phone (cell): (857) 234-2474
>> Fax: (603) 516-7910
>>
>> [email protected]
>> www.nciea.org
>>
>> -----Original Message-----
>> From: Matthew Dowle [mailto:[email protected]] On Behalf Of Matthew Dowle
>> Sent: Thursday, April 12, 2012 5:50 PM
>> To: Damian Betebenner
>> Cc: [email protected]
>> Subject: Re: [datatable-help] Coercian to character
>>
>> It shouldn't coerce. What makes you think it does?
>>
>> > DT = data.table(a=factor(c("a","b","b","c")),b=1:4)
>> > DT[,sum(b),by=a]
>>      a V1
>> [1,] a  1
>> [2,] b  5
>> [3,] c  4
>> > str(DT[,sum(b),by=a])
>> Classes ‘data.table’ and 'data.frame': 3 obs. of 2 variables:
>>  $ a : Factor w/ 3 levels "a","b","c": 1 2 3
>>  $ V1: int 1 5 4
>>
>> On Thu, 2012-04-12 at 14:57 -0500, Damian Betebenner wrote:
>> > Data tablers
>> >
>> > Does data.table now coerce factors to character variables when doing
>> > by summaries?
>> >
>> > If so, is there any way to not allow this coercion?
>> >
>> > Thanks,
>> >
>> > Damian Betebenner

_______________________________________________
datatable-help mailing list
[email protected]
https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help
