As I noted just a few days ago, I have written a small package to
compute frequency tables from arbitrary arrays, with an optimized
method for pooled data arrays:
https://github.com/nalimilan/FreqTables.jl
I've just pushed a fix, so it should now work on Julia 0.4 (but not on 0.3).
We could easily add a method taking a DataFrame and symbol names for
columns to save some typing.
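
As a rough illustration of the array-based interface mentioned above, here is a minimal sketch that cross-tabulates two plain vectors with `freqtable`. The toy data and the variable names (`species`, `wide`, `tbl`) are made up for this example and do not come from the package or from the iris dataset; the result should be a named two-way array of counts, but check the package README for the exact behaviour of your installed version.

```julia
using FreqTables

# Hypothetical toy data: one categorical vector and one Boolean condition.
species = ["setosa", "setosa", "virginica", "versicolor", "virginica"]
wide    = [false, false, true, false, true]

# Two-way frequency table: rows are the levels of `species`,
# columns are the levels of `wide`, cells are co-occurrence counts.
tbl = freqtable(species, wide)

println(tbl)
```

With a DataFrame, the same call can be made by passing two of its columns as the arguments.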
Regards
On Sunday, 22 November 2015 at 03:26 -0800, Dan wrote:
> Hi Arin,
> It would be helpful to have more details about the input (a
> dataframe?) and output (a two-by-two table or a table indexed by
> categories?). Some code to give context to the question would be even
> more helpful (possibly in another language, such as R).
>
> Having said this, here is a starting point for some code:
>
> If any of these packages are missing, Pkg.add will install them:
>
> using NamedArrays
> using DataFrames
> using RDatasets
>
> Get the dataset and make some categorical variables, DataFrames
> style:
>
> iris = dataset("datasets","iris")
> iris[:PetalWidth] = PooledDataArray(iris[:PetalWidth])
> iris[:Species] = PooledDataArray(iris[:Species])
>
> Define a function `twobytwo` for two-by-two tables and a general
> categorical table function `crosstable`:
>
> function twobytwo(data::DataFrame,cond1,cond2)
>     nres = NamedArray(zeros(Int,2,2),
>                       Any[[false,true],[false,true]],
>                       ["cond1","cond2"])
>     for i=1:nrow(data)
>         nres[Int(cond1(data[i,:]))+1,Int(cond2(data[i,:]))+1] += 1
>     end
>     nres
> end
>
> function crosstable(data::DataFrame,col1,col2)
>     @assert isa(data[col1],PooledDataArray)
>     @assert isa(data[col2],PooledDataArray)
>     nres = NamedArray(zeros(Int,length(data[col1].pool),length(data[col2].pool)),
>                       Any[data[col1].pool,data[col2].pool],
>                       [col1,col2])
>     for i=1:nrow(data)
>         nres[data[col1].refs[i],data[col2].refs[i]] += 1
>     end
>     nres
> end
>
> Finally, using the functions, make some tables:
>
> tbt = twobytwo(iris,r->r[1,:Species]=="setosa",
>                r->r[1,:PetalWidth]>=1.5)
> ct = crosstable(iris,:PetalWidth,:Species)
>
> My summary and conclusions:
> 1) Julia is general purpose and with a little familiarity any data
> handling is possible.
> 2) This is a basic data exploration operation and there must be some
> easy way to do this.
>
> Waiting for more opinions/solutions on this question, as it is also
> basic for my needs.
>
> Thanks for the question.
>
> On Sunday, November 22, 2015 at 3:34:56 AM UTC+2, Arin Basu wrote:
> > Hi All,
> >
> > Can you kindly advise on a simple way to make two-by-two tables in
> > Julia from two categorical variables? I have tried
> > split-apply-combine (the `by` function) and it works with single
> > variables, but with two or more variables I cannot get the table I
> > want.
> >
> > This is really an issue if we need to do statistical data analysis
> > in Epidemiology.
> >
> > Any help or advice will be greatly appreciated.
> >
> > Arin Basu
> >