Hi
I've been using data.table for only a week, and some of the idioms
still have me befuddled. I'm finding some things that work without
understanding why. That's bad, but what's more frustrating is that I
have usages that seem good, but fail. I manage aggregation on subsets
more easily than the seemingly easier chores where I want to treat the
data.table as a data frame.
In this example, accounts is a huge data.table with about 41,000 rows.
I'm pasting a dput of the first few rows below, in case that gives you
enough to test.
## This works:
library(data.table)  # accounts is a data.table
library(gmodels)     # provides CrossTable()
temp1 <- quote(east2a)
temp2 <- quote(east2b.1)
accounts[ , CrossTable(eval(temp1), eval(temp2), expected = FALSE,
                       prop.chisq = FALSE, prop.c = FALSE)]
## why does this not work?
i <- "east2a"
temp1 <- quote(paste(i))
temp2 <- quote(paste0("east2b.", 1))
accounts[ , CrossTable(eval(temp1), eval(temp2), expected = FALSE,
                       prop.chisq = FALSE, prop.c = FALSE)]
Here's what happens when I run that:
> library(gmodels)
> temp1 <- quote(east2a)
> temp2 <- quote(east2b.1)
> accounts[ , CrossTable(eval(temp1), eval(temp2), expected = FALSE,
prop.chisq = FALSE, prop.c = FALSE)]
   Cell Contents
|-------------------------|
|                       N |
|           N / Row Total |
|         N / Table Total |
|-------------------------|

Total Observations in Table:  41052

             | east2b.1
      east2a |       Yes |        No | Row Total |
-------------|-----------|-----------|-----------|
         Yes |      9703 |      3051 |     12754 |
             |     0.761 |     0.239 |     0.311 |
             |     0.236 |     0.074 |           |
-------------|-----------|-----------|-----------|
          No |      7957 |     20341 |     28298 |
             |     0.281 |     0.719 |     0.689 |
             |     0.194 |     0.495 |           |
-------------|-----------|-----------|-----------|
Column Total |     17660 |     23392 |     41052 |
-------------|-----------|-----------|-----------|

       t  prop.row  prop.col   prop.tbl
1:  9703 0.7607809 0.5494337 0.23635876
2:  7957 0.2811859 0.4505663 0.19382734
3:  3051 0.2392191 0.1304292 0.07432037
4: 20341 0.7188141 0.8695708 0.49549352
> i <- "east2a"
> temp1 <- quote(paste(i))
> temp2 <- quote(paste0("east2b.", 1))
> eval(temp2)
[1] "east2b.1"
> accounts[ , CrossTable(eval(temp1), eval(temp2), expected = FALSE,
prop.chisq = FALSE, prop.c = FALSE)]
Error in chisq.test(t, correct = FALSE) :
'x' must at least have 2 elements
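My best guess at what is going on, as a sketch in plain R rather than anything data.table-specific: evaluating quote(paste(i)) just returns the string "east2a", so CrossTable may be receiving two length-1 character vectors instead of two columns. The toy data frame below is a made-up stand-in for accounts, and using as.name() is only my assumption about one possible fix, not something I have confirmed is the recommended data.table idiom.

```r
# Toy stand-in for accounts (hypothetical values, plain data.frame)
toy <- data.frame(east2a   = factor(c("Yes", "No", "Yes", "No")),
                  east2b.1 = factor(c("Yes", "Yes", "No", "No")))

i <- "east2a"

# Evaluating the quoted paste() call gives back a character string,
# even when the data frame is supplied as the evaluation environment
as_string <- eval(quote(paste(i)), toy)
print(as_string)

# as.name() turns the string into a symbol; evaluating the symbol
# looks up the column of that name
temp1 <- as.name(i)
temp2 <- as.name(paste0("east2b.", 1))
as_column <- eval(temp1, toy)
print(length(as_column))

# With symbols, the two-way table is built from the columns themselves
tab <- table(eval(temp1, toy), eval(temp2, toy))
print(tab)
```

If that reading is right, the error from chisq.test would come from tabulating a single string against a single string, but I would welcome a correction.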
Can you reconstruct accounts from the dput output below for
demonstration? If not, I could put the data on a website.
> dput(accounts[1:4, .SD])
structure(list(sippid = c("019003754630:0203", "019003754630:0204",
"019052074737:0101", "019052074737:0102"), east2bFirst = c("99",
"11.0", "04.0", "04.0"), east2b = structure(c(2L, 1L, 1L, 1L), .Label
= c("Yes",
"No"), class = "factor"), tage = c(19L, 25L, 30L, 29L), east2aFirst =
c("99",
"99", "03.0", "03.0"), east2a = structure(c(2L, 2L, 1L, 1L), .Label =
c("Yes",
"No"), class = "factor"), east2b.1 = structure(c(2L, 2L, 2L,
2L), .Label = c("Yes", "No"), class = "factor"), tage.1 = c(19L,
22L, 30L, 28L), east3bFirst = c("99", "99", "03.0", "03.0"),
east3b = structure(c(2L, 2L, 1L, 1L), .Label = c("Yes", "No"
), class = "factor"), east2b.2 = structure(c(2L, 2L, 2L,
2L), .Label = c("Yes", "No"), class = "factor"), tage.2 = c(19L,
22L, 30L, 28L), east1aFirst = c("99", "99", "11.0", "11.0"
), east1a = structure(c(2L, 2L, 1L, 1L), .Label = c("Yes",
"No"), class = "factor"), east2b.3 = structure(c(2L, 2L,
1L, 1L), .Label = c("Yes", "No"), class = "factor"), tage.3 = c(19L,
22L, 32L, 31L), east3aFirst = c("99", "99", "11.0", "11.0"
), east3a = structure(c(2L, 2L, 1L, 1L), .Label = c("Yes",
"No"), class = "factor"), east2b.4 = structure(c(2L, 2L,
1L, 1L), .Label = c("Yes", "No"), class = "factor"), tage.4 = c(19L,
22L, 32L, 31L), east2cFirst = c("99", "99", "99", "99"),
east2c = structure(c(2L, 2L, 2L, 2L), .Label = c("Yes", "No"
), class = "factor"), east2b.5 = structure(c(2L, 2L, 2L,
2L), .Label = c("Yes", "No"), class = "factor"), tage.5 = c(19L,
22L, 29L, 28L), east1bcFirst = c("99", "99", "08.0", "09.0"
), east1bc = structure(c(2L, 2L, 1L, 1L), .Label = c("Yes",
"No"), class = "factor"), east2b.6 = structure(c(2L, 2L,
1L, 1L), .Label = c("Yes", "No"), class = "factor"), tage.6 = c(19L,
22L, 31L, 30L), east2dFirst = c("99", "99", "99", "99"),
east2d = structure(c(2L, 2L, 2L, 2L), .Label = c("Yes", "No"
), class = "factor"), east2b.7 = structure(c(2L, 2L, 2L,
2L), .Label = c("Yes", "No"), class = "factor"), tage.7 = c(19L,
22L, 29L, 28L)), .Names = c("sippid", "east2bFirst", "east2b",
"tage", "east2aFirst", "east2a", "east2b.1", "tage.1", "east3bFirst",
"east3b", "east2b.2", "tage.2", "east1aFirst", "east1a", "east2b.3",
"tage.3", "east3aFirst", "east3a", "east2b.4", "tage.4", "east2cFirst",
"east2c", "east2b.5", "tage.5", "east1bcFirst", "east1bc", "east2b.6",
"tage.6", "east2dFirst", "east2d", "east2b.7", "tage.7"), sorted =
"sippid", class = c("data.table",
"data.frame"), row.names = c(NA, -4L), .internal.selfref = <pointer:
0x1df8f58>)
--
Paul E. Johnson
Professor, Political Science      Assoc. Director
1541 Lilac Lane, Room 504         Center for Research Methods
University of Kansas              University of Kansas
http://pj.freefaculty.org         http://quant.ku.edu