Statistical work in R (what many of us do, I'd say) prefers factors over 
characters:

# factors
DF <- data.frame(X=letters[rep(1:5,2)], Y=rnorm(10))
(DF.lm<-lm(Y~X,DF))
predict(DF.lm)
# all works fine

# characters
DF <- data.frame(X=letters[rep(1:5,2)], Y=rnorm(10), stringsAsFactors=FALSE)
(DF.lm<-lm(Y~X,DF)) # warning
predict(DF.lm) # warning

Not sure if this one will get resolved. 
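
One possible workaround, sketched here under the assumption that the character column is meant to be categorical, is to convert it to a factor explicitly before fitting. The set.seed() call and the explicit factor() conversion are additions for illustration only and are not part of the example above:

```r
# Build the data with a character column, as in the second example above
set.seed(1)  # only so the illustration is reproducible
DF <- data.frame(X = letters[rep(1:5, 2)], Y = rnorm(10),
                 stringsAsFactors = FALSE)

# Convert the character column to a factor before modelling
DF$X <- factor(DF$X)

DF.lm <- lm(Y ~ X, DF)
fitted_values <- predict(DF.lm)  # one fitted value per row
```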

Using factors instead of characters also ensures that a table of months or days 
of the week can be listed in the natural (not alphabetic) ordering.
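
As a small sketch of the ordering point (the month data here is made up for illustration), a factor with explicit levels tabulates in calendar order, while a plain character vector tabulates alphabetically:

```r
# Hypothetical month labels, deliberately out of order
months_chr <- c("Mar", "Jan", "Feb", "Jan", "Mar", "Mar")

# Character vector: table() sorts the labels alphabetically
tab_chr <- table(months_chr)

# Factor with levels taken from the built-in month.abb constant:
# table() follows the level order and keeps months with zero counts
months_fac <- factor(months_chr, levels = month.abb)
tab_fac <- table(months_fac)

names(tab_fac)  # Jan, Feb, Mar, ... in calendar order
```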

Joe

-----Original Message-----
From: [email protected] 
[mailto:[email protected]] On Behalf Of 
Matthew Dowle
Sent: Sunday, April 15, 2012 1:20 PM
To: Damian Betebenner
Cc: [email protected]
Subject: Re: [datatable-help] Coercian to character

I thought I'd added something to FAQ 2.17 about that, but it seems not.
Will add, thanks. Maybe I only wrote it up in the comment when closing the 
related feature request. It's deliberately different, since my guess is that 
most people most of the time (now) want characters left as characters and keep 
setting stringsAsFactors to FALSE. I think the default for data.frame was TRUE 
as a holdover from old versions of R before the global string cache was added.

It's not set in stone though, so it could be changed. In particular there could 
be a global default, like we've done for other arguments, so you could change 
the default if need be.
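
In the meantime, one way to avoid depending on either default is to pass stringsAsFactors explicitly. The following is only a sketch using base R's data.frame(); the object names DF_fac and DF_chr are made up for illustration:

```r
# Same data built twice, stating the string handling explicitly each time
DF_fac <- data.frame(X = letters[1:10], Y = rnorm(10),
                     stringsAsFactors = TRUE)
DF_chr <- data.frame(X = letters[1:10], Y = rnorm(10),
                     stringsAsFactors = FALSE)

# Column classes no longer depend on the default
sapply(DF_fac, class)
sapply(DF_chr, class)
```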

It won't cause a compatibility issue (same as the other differences in FAQ
2.17) or any issues down the road as far as I can think, but let me know if 
you think of anything.

Matthew

On Sun, 2012-04-15 at 04:40 -0500, Damian Betebenner wrote:
> I started having character vectors popping up in places I never had before 
> but upon further investigation that turned out to be an issue with my own 
> setup, not data.table.
> 
> With regard to characters (and data.table's ability to handle them as a 
> key now), I did notice that data.table and data.frame default to using 
> stringsAsFactors differently:
> 
> DF <- data.frame(X=letters[1:10], Y=rnorm(10))
> sapply(DF, class)
> 
>         X         Y 
>  "factor" "numeric"
> 
> DT <- data.table(X=letters[1:10], Y=rnorm(10))
> sapply(DT, class)
> 
> > DT <- data.table(X=rep(letters[1:10], each=2), Y=rnorm(20)) 
> > sapply(DT, class)
>           X           Y 
> "character"   "numeric"
> 
> 
> Will this inconsistency cause problems down the road?
> 
> Thanks for all your help,
> 
> Damian
> 
> 
> Damian Betebenner
> Center for Assessment
> PO Box 351
> Dover, NH   03821-0351
>  
> Phone (office): (603) 516-7900
> Phone (cell): (857) 234-2474
> Fax: (603) 516-7910
> 
> [email protected]
> www.nciea.org
> 
> 
> 
> 
> -----Original Message-----
> From: Matthew Dowle [mailto:[email protected]] On Behalf 
> Of Matthew Dowle
> Sent: Thursday, April 12, 2012 5:50 PM
> To: Damian Betebenner
> Cc: [email protected]
> Subject: Re: [datatable-help] Coercian to character
> 
> It shouldn't coerce. What makes you think it does?
> 
> > DT = data.table(a=factor(c("a","b","b","c")),b=1:4)
> > DT[,sum(b),by=a]
>      a V1
> [1,] a  1
> [2,] b  5
> [3,] c  4
> > str(DT[,sum(b),by=a])
> Classes ‘data.table’ and 'data.frame':        3 obs. of  2 variables:
>  $ a : Factor w/ 3 levels "a","b","c": 1 2 3
>  $ V1: int  1 5 4
> 
> 
> 
> On Thu, 2012-04-12 at 14:57 -0500, Damian Betebenner wrote:
> > Data tablers
> > 
> > Does data.table now coerce factors to character variables when doing 
> > by summaries?
> > 
> > If so, is there any way to not allow this coercion?
> > 
> > Thanks,
> > 
> > Damian Betebenner
> > Center for Assessment
> > PO Box 351
> > Dover, NH   03821-0351
> > 
> > Phone (office): (603) 516-7900
> > Phone (cell): (857) 234-2474
> > Fax: (603) 516-7910
> > 
> > [email protected]
> > www.nciea.org
> > 
> > _______________________________________________
> > datatable-help mailing list
> > [email protected]
> > https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help
> 
> 


_______________________________________________
datatable-help mailing list
[email protected]
https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help