On Tuesday 19 July 2016 at 10:58 -0700, Douglas Bates wrote:
> 
> > It seems to me that contrasts should be defined in the
> > array packages and not in DataFrames. We'd probably need the
> > functions to be defined in an upstream package like StatsBase or
> > (ArrayBase/DataBase?) such that all array packages can extend them.
> > 
> 
> That's the approach that makes the most sense to me too.  Right now
> CategoricalArrays only requires Compat and it does not seem that
> Milan is available to make changes in it.
> 
> > We have the usual problem of optional dependencies. Should
> > DataFrames depend on any data array package or all of them? Is it
> > possible that DataFrames doesn't use any features of concrete data
> > array types and only defines methods for abstract types? Then the
> > user would have to load a specific array package. This might be a
> > bit demanding to keep working and from a user perspective, a single
> > good implementation might be better.
> > 
> > What are the specific issues you are having right now? Are the
> > things that are broken things that used to work, or are they work
> > in progress towards using Nullable and Categorical arrays?
> > 
> 
> I was trying to use CategoricalArrays and failing.  This only affects
> PooledDataArrays and CategoricalArrays, but there are other aspects,
> like the termnames methods, whose generic is currently defined in
> DataFrames but is linked to the contrasts.
> 
> Ultimately if PooledDataArray is replaced by CategoricalArray then
> these generics can all go into CategoricalArrays.  It would be
> necessary to have DataFrames require CategoricalArrays but I suspect
> that would happen anyway.
> 
> In a way I would like to split the
> Formula/Terms/ModelFrame/ModelMatrix material into a separate package
> but that package would need to depend on DataFrames so it wouldn't
> buy us much.

The CategoricalArrays port won't be ready in time for the Julia 0.5
release, so we need to get a DataFrames version based on DataArrays
working anyway. Make whatever improvements you think are needed, and
I'll port them to CategoricalArrays in a later step.

Regarding dependencies, I agree we should move the model frame/matrix
functions to StatsBase (or a standalone package), and import them in
DataFrames to define the actual methods. That will allow packages to
support the formula interface without adding a dependency on
DataFrames, and will allow experimenting with other implementations
like TypedTables.
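
A minimal sketch of that layout, to make the idea concrete. All module
and type names here (`ModelBase`, `ToyCategorical`, `ToyCat`) and the
`modelcols` function are hypothetical stand-ins, not existing APIs; only
the `termnames` name comes from the discussion above, and its signature
here is an assumption. The upstream module owns empty generic functions,
and the array package adds methods for its own type without depending on
DataFrames.

```julia
# Hypothetical upstream package (stand-in for StatsBase or a standalone
# package): it only declares the generic functions, with no methods.
module ModelBase
export termnames, modelcols
function termnames end   # names of the model-matrix columns for a term
function modelcols end   # numeric model-matrix columns for a term
end

# Hypothetical array package: extends the upstream generics for its own type.
module ToyCategorical
import ..ModelBase: termnames, modelcols
export ToyCat

struct ToyCat
    values::Vector{String}
    levels::Vector{String}
end

# Treatment (dummy) coding: the first level is the baseline and gets no column.
termnames(name::AbstractString, x::ToyCat) =
    ["$(name): $(l)" for l in x.levels[2:end]]
modelcols(x::ToyCat) =
    Float64[v == l for v in x.values, l in x.levels[2:end]]
end

using .ModelBase, .ToyCategorical

x = ToyCat(["a", "b", "c", "a"], ["a", "b", "c"])
colnames = termnames("g", x)   # one name per non-baseline level
cols = modelcols(x)            # 4×2 indicator matrix
```

A modelling package written against `ModelBase` would then work with any
array type that defines these two methods.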



Regards


> > On Tue, Jul 19, 2016 at 12:23 PM, Douglas Bates <[email protected]>
> > wrote:
> > > Yes, thanks to Tony, Andreas, Milan and others who worked on
> > > this.
> > > 
> > > At the risk of making myself unpopular I would like to return to
> > > the issue of ModelFrame, ModelMatrix, etc. because a lot of code
> > > is still broken for me.  At present `DataFrames/REQUIRE` lists
> > > > `DataArrays 0.3.4` but neither `NullableArrays` nor
> > > > `CategoricalArrays`.  Contrasts are defined in
> > > > `DataFrames/src/statsmodels/formula.jl` but we would need to
> > > require `CategoricalArrays` if contrasts for that type were to be
> > > defined there.  To me it would make more sense to define the
> > > contrasts where the array types are defined.
> > > 
> > > I can add `CategoricalArrays` to `DataFrames/REQUIRE` to get
> > > ModelMatrix working again but that might have a knock-on effect
> > > for many packages that require `DataFrames`.
> > > 
> > > Although I'd really like to get ModelMatrix working again, I
> > > don't want to make changes like DataFrames requiring
> > > CategoricalArrays that later need to be backed out.
> > > 
> > 
