Hi Arin,
It would help to have more details about the input (a DataFrame?) and the
output (a two-by-two table, or a table indexed by category levels?). Some code
giving context to the question would help even more (possibly in another
language, such as R).
That said, here is a starting point:
If any of these packages are missing, Pkg.add installs them (e.g. Pkg.add("NamedArrays")):
using NamedArrays
using DataFrames
using RDatasets
Load the iris dataset and turn two of its columns into categorical (pooled) variables, DataFrames style:
iris = dataset("datasets","iris")
iris[:PetalWidth] = PooledDataArray(iris[:PetalWidth])
iris[:Species] = PooledDataArray(iris[:Species])
Define a function `twobytwo` for a two-by-two table and a function
`crosstable` for a general table of two categorical variables:
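function twobytwo(data::DataFrame,cond1,cond2)
    # rows/columns 1 and 2 correspond to cond1/cond2 being false/true
    nres = NamedArray(zeros(Int,2,2), Any[[false,true],[false,true]], ["cond1","cond2"])
    for i = 1:nrow(data)
        # data[i,:] is a one-row DataFrame, passed to each condition
        nres[Int(cond1(data[i,:]))+1, Int(cond2(data[i,:]))+1] += 1
    end
    nres
end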
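function crosstable(data::DataFrame,col1,col2)
    # both columns must be pooled (categorical)
    @assert isa(data[col1],PooledDataArray)
    @assert isa(data[col2],PooledDataArray)
    # one row per level of col1, one column per level of col2
    nres = NamedArray(zeros(Int,length(data[col1].pool),length(data[col2].pool)),
                      Any[data[col1].pool,data[col2].pool], [col1,col2])
    for i = 1:nrow(data)
        # refs holds each row's integer index into the pool of levels
        nres[data[col1].refs[i],data[col2].refs[i]] += 1
    end
    nres
end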
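For anyone who would rather not depend on the DataFrames/NamedArrays API used above, the same predicate-based counting can be sketched with plain vectors and Base Julia only. This is a minimal sketch, not a library function: `twobytwo_counts` is a hypothetical name, and the rows are made-up toy data.

```julia
# Hypothetical helper (Base only): count rows into a 2x2 table using two predicates.
function twobytwo_counts(rows, cond1, cond2)
    # counts[i, j]: i = 1 when cond1 is false, 2 when true; likewise j for cond2
    counts = zeros(Int, 2, 2)
    for r in rows
        counts[Int(cond1(r)) + 1, Int(cond2(r)) + 1] += 1
    end
    return counts
end

# Toy (species, petal width) pairs, invented for illustration only
rows = [("setosa", 0.2), ("setosa", 0.4), ("versicolor", 1.5),
        ("virginica", 2.1), ("versicolor", 1.3)]
tbt = twobytwo_counts(rows, r -> r[1] == "setosa", r -> r[2] >= 1.5)
```

Here tbt[2,1] counts rows where the first condition holds and the second does not, mirroring the false/true layout of `twobytwo`.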
Finally, use the functions to make some tables:
tbt = twobytwo(iris,r->r[1,:Species]=="setosa",r->r[1,:PetalWidth]>=1.5)
ct = crosstable(iris,:PetalWidth,:Species)
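A general cross-table like `ct` can also be sketched without pooled columns: collect the sorted levels of each variable, map every level to an index, and count. This assumes plain vectors of sortable values; `crosstab` is a hypothetical name, and the exposure/outcome vectors are toy data.

```julia
# Hypothetical helper (Base only): cross-tabulate two categorical vectors.
# Returns the count matrix plus the sorted levels labelling its rows and columns.
function crosstab(x::AbstractVector, y::AbstractVector)
    length(x) == length(y) || throw(ArgumentError("vectors must have equal length"))
    xlev = sort(unique(x))
    ylev = sort(unique(y))
    # map each level to its row/column index
    xidx = Dict(v => i for (i, v) in enumerate(xlev))
    yidx = Dict(v => j for (j, v) in enumerate(ylev))
    counts = zeros(Int, length(xlev), length(ylev))
    for (a, b) in zip(x, y)
        counts[xidx[a], yidx[b]] += 1
    end
    return counts, xlev, ylev
end

# Toy data, invented for illustration only
exposure = ["yes", "no", "yes", "yes", "no"]
outcome  = ["ill", "well", "ill", "well", "well"]
counts, rowlev, collev = crosstab(exposure, outcome)
```

Returning the levels alongside the matrix keeps the row/column labels that NamedArray provides in `crosstable`, without the extra dependency.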
My summary and conclusions:
1) Julia is a general-purpose language, so with a little familiarity almost
any data handling is possible.
2) This is a basic data-exploration operation, so there ought to be an easier,
ready-made way to do it.
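The split-apply-combine idea from the question can also be applied to two keys at once: treat each (x, y) pair as a single group key and count occurrences. A minimal sketch with Base only; `paircounts` is a hypothetical name and the vectors are toy data. Unlike a matrix, it only stores combinations that actually occur.

```julia
# Hypothetical helper (Base only): tally each observed (x, y) combination,
# i.e. "split by two keys, apply count", stored sparsely in a Dict.
function paircounts(x, y)
    counts = Dict{Tuple{eltype(x), eltype(y)}, Int}()
    for key in zip(x, y)
        counts[key] = get(counts, key, 0) + 1
    end
    return counts
end

# Toy data, invented for illustration only
species = ["setosa", "setosa", "virginica", "setosa"]
wide    = [false, false, true, true]
pc = paircounts(species, wide)
```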
I'm looking forward to more opinions and solutions, as this is also basic to
my own needs.
Thanks for the question.
On Sunday, November 22, 2015 at 3:34:56 AM UTC+2, Arin Basu wrote:
>
> Hi All,
>
> Can you kindly advise a simple way to make two-by-two tables in Julia with
> two categorical variables? I have tried split-apply-combine (the `by`
> function) and it works with a single variable, but with two or more
> variables I cannot get the table I want.
>
> This is really an issue if we need to do statistical data analysis in
> Epidemiology.
>
> Any help or advice will be greatly appreciated.
>
> Arin Basu
>