All,

I was wondering how setkey orders a factor: does it take into account whether
the factor is ordered, or does it simply sort the factor alphabetically?

I would like the key to observe the order of a factor. For example, a
course-taken field may run from 1 to 5, with 1=Basic Math, 2=Calculus,
3=Geometry, 4=Algebra I, and 5=Algebra II. I would like the sort imposed by
data.table to "respect" the canonical ordering of the classes, not an
alphabetical ordering.
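
To make the desired ordering concrete, here is a minimal base R sketch on a small made-up vector (course.labels and course.codes are names invented for this illustration, not the sample used below). A factor stores integer codes that index its levels, so ordering on those codes follows the level order rather than the alphabet.

```r
# Hypothetical illustration data; not the sample used in the example below.
course.labels <- c("Basic Math", "Calculus", "Geometry", "Algebra I", "Algebra II")
course.codes <- c(5, 1, 4, 2, 3)

f <- factor(course.codes, levels = 1:5, labels = course.labels)

# as.integer() on a factor returns each value's position in levels(f),
# so ordering on it follows the level order.
by.level <- as.character(f[order(as.integer(f))])

# Sorting the labels as plain strings gives alphabetical order instead.
by.alpha <- sort(as.character(f))

print(by.level)
print(by.alpha)
```

The first result is the ordering I am after; the second is the kind of ordering I am seeing from the key.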

I can't, however, seem to get the key to behave the way I want.

Here's an example:

library(data.table)
set.seed(123)
my.course.sample <- sample(1:5, 10, replace=TRUE)

X <- 1:10
Y <- factor(my.course.sample, levels=1:5, labels=c("Basic Math", "Calculus", 
"Geometry", "Algebra I", "Algebra II"))

my.dt <- data.table(ID=X, COURSE=Y)

> my.dt
      ID     COURSE
 [1,]  1 Algebra II
 [2,]  2  Algebra I
 [3,]  3  Algebra I
 [4,]  4 Algebra II
 [5,]  5   Geometry
 [6,]  6  Algebra I
 [7,]  7   Geometry
 [8,]  8   Calculus
 [9,]  9  Algebra I
[10,] 10   Geometry


setkey(my.dt, COURSE)

> my.dt
      ID     COURSE
 [1,]  2  Algebra I
 [2,]  3  Algebra I
 [3,]  6  Algebra I
 [4,]  9  Algebra I
 [5,]  1 Algebra II
 [6,]  4 Algebra II
 [7,]  8   Calculus
 [8,]  5   Geometry
 [9,]  7   Geometry
[10,] 10   Geometry


###
### The COURSE key is alphabetizing based upon the labels
###

###
### Now try to impose a different ordering
###

Y <- factor(my.course.sample, levels=c(1,4,3,5,2), labels=c("Basic Math", 
"Calculus", "Geometry", "Algebra I", "Algebra II"))

my.dt <- data.table(ID=X, COURSE=Y)

> my.dt
      ID     COURSE
 [1,]  1  Algebra I
 [2,]  2   Calculus
 [3,]  3   Calculus
 [4,]  4  Algebra I
 [5,]  5   Geometry
 [6,]  6   Calculus
 [7,]  7   Geometry
 [8,]  8 Algebra II
 [9,]  9   Calculus
[10,] 10   Geometry

setkey(my.dt, COURSE)

> my.dt
      ID     COURSE
 [1,]  1  Algebra I
 [2,]  3  Algebra I
 [3,]  9  Algebra I
 [4,]  2 Algebra II
 [5,]  4 Algebra II
 [6,]  8 Algebra II
 [7,]  7 Basic Math
 [8,]  5   Calculus
 [9,]  6   Calculus
[10,] 10   Geometry
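
A note on the levels/labels call used in this attempt: in base R, factor() pairs labels with levels by position. Permuting levels while keeping the same labels vector therefore changes which label each numeric code receives, while the level order of the resulting factor stays the same as the labels vector. A small sketch of this, where codes, canonical, and permuted are names invented for the illustration:

```r
course.labels <- c("Basic Math", "Calculus", "Geometry", "Algebra I", "Algebra II")
codes <- 1:5

# Levels in their natural order: code k receives the k-th label.
canonical <- factor(codes, levels = 1:5, labels = course.labels)

# Permuted levels with the same labels: labels are matched to levels by
# position, so code 4 now receives the second label, "Calculus".
permuted <- factor(codes, levels = c(1, 4, 3, 5, 2), labels = course.labels)

mapping <- data.frame(code = codes,
                      canonical = as.character(canonical),
                      permuted = as.character(permuted))
print(mapping)

# The level order itself is unchanged by the permutation.
print(identical(levels(canonical), levels(permuted)))
```

This matches the printout above, where ID 1 changed from Algebra II to Algebra I and ID 2 changed from Algebra I to Calculus.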


Y <- factor(my.course.sample, levels=c(1,4,3,5,2), labels=c("Basic Math", 
"Calculus", "Geometry", "Algebra I", "Algebra II"), ordered=TRUE)

my.dt <- data.table(ID=X, COURSE=Y)

my.dt

      ID     COURSE
 [1,]  1  Algebra I
 [2,]  2   Calculus
 [3,]  3   Calculus
 [4,]  4  Algebra I
 [5,]  5   Geometry
 [6,]  6   Calculus
 [7,]  7   Geometry
 [8,]  8 Algebra II
 [9,]  9   Calculus
[10,] 10   Geometry

setkey(my.dt, COURSE)

my.dt

      ID     COURSE
 [1,]  1  Algebra I
 [2,]  4  Algebra I
 [3,]  8 Algebra II
 [4,]  2   Calculus
 [5,]  3   Calculus
 [6,]  6   Calculus
 [7,]  9   Calculus
 [8,]  5   Geometry
 [9,]  7   Geometry
[10,] 10   Geometry


### Setting COURSE as the key for an ordered factor seems to override the
### ordering associated with the factor and impose an alphabetical order.


I'd like the key to respect the order associated with the factor.
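
One possible workaround, sketched under assumptions: key on an integer column that holds the factor's codes, which sidesteps the question of how the factor itself is sorted. This assumes a data.table version that supports := assignment by reference; COURSE.ORDER is a hypothetical helper column, and the data are made up for the illustration.

```r
library(data.table)

course.labels <- c("Basic Math", "Calculus", "Geometry", "Algebra I", "Algebra II")

# Hypothetical illustration data; not the sample used above.
dt <- data.table(ID = 1:6,
                 COURSE = factor(c(5, 2, 4, 1, 3, 2),
                                 levels = 1:5, labels = course.labels))

# COURSE.ORDER (hypothetical name) stores each row's position in levels(COURSE).
dt[, COURSE.ORDER := as.integer(COURSE)]

# Keying on the integer column sorts rows by level order, not by label.
setkey(dt, COURSE.ORDER)

print(dt)
```

The cost is an extra column that has to be kept in sync with COURSE, so I would still prefer a way for the key itself to follow the factor's order.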


Any help with this would be greatly appreciated.


Best regards,



Damian Betebenner
Center for Assessment
PO Box 351
Dover, NH   03821-0351
 
Phone (office): (603) 516-7900
Phone (cell): (857) 234-2474
Fax: (603) 516-7910

[email protected]
www.nciea.org


_______________________________________________
datatable-help mailing list
[email protected]
https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help