On Tuesday, July 19, 2016 at 10:58 -0700, Douglas Bates wrote:
> > It seems to me that contrasts should be defined in the array
> > packages and not in DataFrames. We'd probably need the functions
> > to be defined in an upstream package like StatsBase (or
> > ArrayBase/DataBase?) such that all array packages can extend them.
>
> That's the approach that makes the most sense to me too. Right now
> CategoricalArrays only requires Compat, and it does not seem that
> Milan is available to make changes in it.
>
> > We have the usual problem of optional dependencies. Should
> > DataFrames depend on any data array package or on all of them? Is
> > it possible that DataFrames doesn't use any features of concrete
> > data array types and only defines methods for abstract types? Then
> > the user would have to load a specific array package. This might be
> > a bit demanding to keep working, and from a user perspective, a
> > single good implementation might be better.
> >
> > What are the specific issues you are having right now? Are the
> > things that are broken things that used to work, or is this work in
> > progress towards using Nullable and Categorical arrays?
>
> I was trying to use CategoricalArrays and failing. This only affects
> PooledDataArrays and CategoricalArrays, but there are other aspects,
> like the termnames methods, whose generic is currently defined in
> DataFrames but is linked to the contrasts.
>
> Ultimately, if PooledDataArray is replaced by CategoricalArray, then
> these generics can all go into CategoricalArrays. It would be
> necessary to have DataFrames require CategoricalArrays, but I suspect
> that would happen anyway.
>
> In a way, I would like to split the
> Formula/Terms/ModelFrame/ModelMatrix material into a separate package,
> but that package would need to depend on DataFrames, so it wouldn't
> buy us much.
The CategoricalArrays port won't be ready in time for the Julia 0.5 release, so we need a working DataFrames version based on DataArrays anyway. Make whatever improvements you think are needed, and I'll port them to CategoricalArrays in a later step.
Regarding dependencies, I agree we should move the model frame/matrix functions to StatsBase (or a standalone package) and import them in DataFrames to define the actual methods. That will allow packages to support the formula interface without adding a dependency on DataFrames, and will allow experimenting with other implementations like TypedTables.

Regards

> > On Tue, Jul 19, 2016 at 12:23 PM, Douglas Bates wrote:
> > > Yes, thanks to Tony, Andreas, Milan and others who worked on
> > > this.
> > >
> > > At the risk of making myself unpopular, I would like to return to
> > > the issue of ModelFrame, ModelMatrix, etc. because a lot of code
> > > is still broken for me. At present `DataFrames/REQUIRE` lists
> > > `DataArrays 0.3.4` but neither `NullableArrays` nor
> > > `CategoricalArrays`. Contrasts are defined in
> > > `DataFrames/src/statsmodels/formula.jl`, but we would need to
> > > require `CategoricalArrays` if contrasts for that type were to be
> > > defined there. To me it would make more sense to define the
> > > contrasts where the array types are defined.
> > >
> > > I can add `CategoricalArrays` to `DataFrames/REQUIRE` to get
> > > ModelMatrix working again, but that might have a knock-on effect
> > > for many packages that require `DataFrames`.
> > >
> > > Although I'd really like to get ModelMatrix working again, I
> > > don't want to make changes like DataFrames requiring
> > > CategoricalArrays that later need to be backed out.

--
You received this message because you are subscribed to the Google Groups "julia-stats" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [email protected].
For more options, visit https://groups.google.com/d/optout.
