On Tuesday, 24 November 2015 at 19:32 -0800, Arin Basu wrote:
> Thanks a million, Milan and Dan. I have learned a great deal from the
> code you shared and the packages you discussed. There is a need for
> dedicated biostatistics packages in Julia. For instance, I could not
> find a dedicated package on regression diagnostics
I don't think this should live in a dedicated biostatistics package.
I'm a social scientist and I would find these tools useful too. Better
to share our work.
> (I tried RegTools, but for some reason it did not compile on my
> machine: Mac OS X El Capitan, Julia 0.4.1.)
That code is fairly new; it doesn't look like it's set up to be used as
a package yet. Since it seems to be actively maintained, you could file
an issue against it on GitHub to tell the author that people are
interested in testing it. Adding a src/RegTools.jl file that wraps these
lines in a module should be enough:
module RegTools
include("diagnostics.jl")
include("misc.jl")
include("modsel.jl")
end
Regards
>
> Best,
> Arin
>
> On Monday, 23 November 2015 04:53:46 UTC+13, Milan Bouchet-Valat wrote:
> > As I noted just a few days ago, I have written a small package to
> > compute frequency tables from arbitrary arrays, with an optimized
> > method for pooled data arrays:
> > https://github.com/nalimilan/FreqTables.jl
> >
> > I've just pushed a fix, so it should now work on 0.4 (but not on
> > 0.3).
> >
> > We could easily add a method taking a DataFrame and symbol names for
> > columns to save some typing.
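The generic idea behind a frequency table for arbitrary arrays, together with the kind of convenience method taking column names that is suggested here, can be sketched in Base Julia. This is a minimal sketch under stated assumptions, not the FreqTables.jl implementation or API: `crosstab` is a hypothetical name, and a NamedTuple of columns stands in for a DataFrame so the example needs no packages.

```julia
# Hypothetical helper (not part of FreqTables.jl): cross-tabulate two
# equal-length vectors of arbitrary values.
# Returns (row levels, column levels, matrix of counts).
function crosstab(x::AbstractVector, y::AbstractVector)
    length(x) == length(y) ||
        throw(DimensionMismatch("x and y must have the same length"))
    xlev = sort(unique(x))                            # row labels
    ylev = sort(unique(y))                            # column labels
    xidx = Dict(v => i for (i, v) in enumerate(xlev)) # level -> row index
    yidx = Dict(v => j for (j, v) in enumerate(ylev)) # level -> column index
    counts = zeros(Int, length(xlev), length(ylev))
    for (a, b) in zip(x, y)
        counts[xidx[a], yidx[b]] += 1
    end
    return xlev, ylev, counts
end

# Convenience method taking a table plus column names. A NamedTuple of
# columns stands in for a DataFrame here (an assumption for the sketch).
crosstab(tbl::NamedTuple, col1::Symbol, col2::Symbol) =
    crosstab(tbl[col1], tbl[col2])

# Toy data, made up for illustration.
tbl = (sex = ["F", "M", "F", "F"], exposed = [true, false, false, true])
xlev, ylev, counts = crosstab(tbl, :sex, :exposed)
# counts == [1 2; 1 0]: rows "F", "M"; columns false, true
```

Dispatching on the argument types is what keeps the column-name method to a one-liner: it only extracts the two columns and forwards them to the vector method.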
> >
> >
> > Regards
> >
> > On Sunday, 22 November 2015 at 03:26 -0800, Dan wrote:
> > > Hi Arin,
> > > It would be helpful to have more details about the input (a
> > > DataFrame?) and the output (a two-by-two table, or a table indexed
> > > by categories?). Some code giving context to the question would
> > > help even more (possibly in another language, such as R).
> > >
> > > Having said this, here is a starting point for some code:
> > >
> > > If any of these packages are missing, install them with Pkg.add:
> > >
> > > using NamedArrays
> > > using DataFrames
> > > using RDatasets
> > >
> > > Get the dataset and make some categorical variables, DataFrames
> > > style:
> > >
> > > iris = dataset("datasets","iris")
> > > iris[:PetalWidth] = PooledDataArray(iris[:PetalWidth])
> > > iris[:Species] = PooledDataArray(iris[:Species])
> > >
> > > Define a function `twobytwo` for two-by-two tables and a more
> > > general function `crosstable` for categorical tables:
> > >
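> > > function twobytwo(data::DataFrame, cond1, cond2)
> > >     # 2x2 counts; rows/columns are indexed by whether cond1/cond2
> > >     # is false (index 1) or true (index 2)
> > >     nres = NamedArray(zeros(Int, 2, 2), Any[[false, true], [false, true]],
> > >                       ["cond1", "cond2"])
> > >     for i = 1:nrow(data)
> > >         # cond1 and cond2 are called on the one-row DataFrame data[i, :]
> > >         nres[Int(cond1(data[i, :])) + 1, Int(cond2(data[i, :])) + 1] += 1
> > >     end
> > >     nres
> > > end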
> > >
> > > function crosstable(data::DataFrame, col1, col2)
> > >     @assert isa(data[col1], PooledDataArray)
> > >     @assert isa(data[col2], PooledDataArray)
> > >     # one row per level (pool entry) of col1, one column per level of col2
> > >     nres = NamedArray(zeros(Int, length(data[col1].pool), length(data[col2].pool)),
> > >                       Any[data[col1].pool, data[col2].pool], [col1, col2])
> > >     for i = 1:nrow(data)
> > >         # refs[i] is row i's integer index into pool, so it indexes nres directly
> > >         nres[data[col1].refs[i], data[col2].refs[i]] += 1
> > >     end
> > >     nres
> > > end
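`crosstable` relies on PooledDataArray columns because each value is stored as an integer code (`refs`) into a vector of levels (`pool`), so the codes can index the count matrix directly without any lookup. That idea can be sketched in Base Julia with plain integer vectors standing in for `refs`. This is a minimal sketch: `crosstab_codes` is a hypothetical name, and skipping code 0 assumes, as in DataArrays' pooled arrays, that 0 marks a missing value.

```julia
# Hypothetical sketch of the pooled fast path: codes1[k] and codes2[k] are
# integer indices (1..nlev) into each variable's vector of levels, so they
# index the count matrix directly, with no hashing or searching.
function crosstab_codes(codes1::AbstractVector{<:Integer}, nlev1::Integer,
                        codes2::AbstractVector{<:Integer}, nlev2::Integer)
    length(codes1) == length(codes2) ||
        throw(DimensionMismatch("code vectors must have the same length"))
    counts = zeros(Int, nlev1, nlev2)
    for (i, j) in zip(codes1, codes2)
        # Assumption: code 0 marks a missing value; such rows are skipped.
        (i == 0 || j == 0) && continue
        counts[i, j] += 1
    end
    return counts
end

# Toy data, made up for illustration: levels plus per-row codes.
pool1 = ["low", "high"]; codes1 = [1, 2, 2, 1, 0, 2]
pool2 = ["no", "yes"];   codes2 = [2, 2, 1, 2, 1, 2]
counts = crosstab_codes(codes1, length(pool1), codes2, length(pool2))
# counts == [0 2; 1 2]: rows "low", "high"; columns "no", "yes"
```

Without the explicit skip, a code of 0 would raise a BoundsError when used as an index.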
> > >
> > > Finally, use the functions to make some tables:
> > >
> > > tbt = twobytwo(iris, r -> r[1, :Species] == "setosa",
> > >                      r -> r[1, :PetalWidth] >= 1.5)
> > > ct = crosstable(iris, :PetalWidth, :Species)
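The `twobytwo` logic does not depend on DataFrames or NamedArrays: it only needs a way to iterate over rows and two Bool-valued predicates. A minimal Base-only sketch, assuming rows are NamedTuples (a stand-in for DataFrame rows) and using the hypothetical name `twobytwo_counts`; index 1 means false and 2 means true, as with the [false, true] labels in `twobytwo`. The rows are made-up toy values, not the iris data.

```julia
# Hypothetical Base-only variant of `twobytwo`: `rows` is any iterable of
# records and `cond1`/`cond2` return Bool for each record.
# Row/column index 1 = false, 2 = true.
function twobytwo_counts(rows, cond1, cond2)
    counts = zeros(Int, 2, 2)
    for r in rows
        counts[Int(cond1(r)) + 1, Int(cond2(r)) + 1] += 1
    end
    return counts
end

# Made-up toy rows; NamedTuples stand in for DataFrame rows.
rows = [(Species = "setosa",     PetalWidth = 0.2),
        (Species = "versicolor", PetalWidth = 1.5),
        (Species = "virginica",  PetalWidth = 2.1),
        (Species = "setosa",     PetalWidth = 0.4)]
tbt = twobytwo_counts(rows, r -> r.Species == "setosa",
                            r -> r.PetalWidth >= 1.5)
# tbt == [0 2; 2 0]
```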
> > >
> > > My summary and conclusions:
> > > 1) Julia is general-purpose, and with a little familiarity any data
> > > handling is possible.
> > > 2) This is a basic data exploration operation, and there must be
> > > some easy way to do it.
> > >
> > > I'm waiting for more opinions/solutions on this question, as it is
> > > also basic for my needs.
> > >
> > > Thanks for the question.
> > >
> > > On Sunday, November 22, 2015 at 3:34:56 AM UTC+2, Arin Basu wrote:
> > > > Hi All,
> > > >
> > > > Could you kindly advise on a simple way to make two-by-two tables
> > > > in Julia from two categorical variables? I have tried
> > > > split-apply-combine (the `by` function), and it works with single
> > > > variables, but with two or more variables I cannot get the table I
> > > > want.
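Grouping by two variables at once amounts to using the pair of values as the grouping key. A minimal Base-only sketch of that split-apply-combine step follows; `paircounts` is a hypothetical name and the exposure/disease vectors are made-up toy data. The result is one count per observed combination (long format), which can then be reshaped into a two-by-two table.

```julia
# Hypothetical sketch: split observations by the pair (a[k], b[k]), count
# each group, and combine the counts into a Dict keyed by the pair.
function paircounts(a::AbstractVector, b::AbstractVector)
    length(a) == length(b) ||
        throw(DimensionMismatch("a and b must have the same length"))
    counts = Dict{Tuple{eltype(a), eltype(b)}, Int}()
    for key in zip(a, b)
        counts[key] = get(counts, key, 0) + 1
    end
    return counts
end

# Made-up toy data.
exposure = ["yes", "no", "yes", "yes", "no"]
disease  = ["case", "control", "control", "case", "control"]
pc = paircounts(exposure, disease)
# pc[("yes", "case")] == 2; the combination ("no", "case") never occurs
```

Combinations that never occur are simply absent from the Dict, so they must be filled in as zeros when building the final table.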
> > > >
> > > > This is really an issue if we need to do statistical data analysis
> > > > in epidemiology.
> > > >
> > > > Any help or advice will be greatly appreciated.
> > > >
> > > > Arin Basu
> > > >