I think the best way to make this happen would be to implement a DDataArray type and then build a DDataFrame type on top of that.
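
To make that layering concrete, here is a rough sketch. It is in Python only because the thread mentions it, and every name in it is hypothetical: no existing DDataArray or DDataFrame package is implied, and threads merely stand in for the worker processes a real distributed array would use. The idea is that each chunk carries its own missing-value mask, and the frame is just a set of named columns chunked the same way.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class Chunk:
    """One worker's slice of a column: values plus a missing-value mask."""
    values: List[float]
    missing: List[bool]


class DDataArray:
    """Hypothetical distributed column: a list of chunks, one per worker."""

    def __init__(self, data: List[Optional[float]], n_chunks: int):
        # Ceiling division, with a floor of 1 so empty input does not give a zero step.
        size = max(1, -(-len(data) // n_chunks))
        self.chunks: List[Chunk] = []
        for start in range(0, len(data), size):
            part = data[start:start + size]
            self.chunks.append(Chunk(
                values=[0.0 if x is None else x for x in part],
                missing=[x is None for x in part],
            ))

    def mean(self) -> float:
        """Mean over non-missing entries, reduced chunk by chunk."""
        def partial(chunk: Chunk) -> Tuple[float, int]:
            kept = [v for v, m in zip(chunk.values, chunk.missing) if not m]
            return sum(kept), len(kept)

        # Threads stand in for remote workers in this sketch.
        with ThreadPoolExecutor() as pool:
            parts = list(pool.map(partial, self.chunks))
        total = sum(s for s, _ in parts)
        count = sum(n for _, n in parts)
        return total / count if count else float("nan")


class DDataFrame:
    """Hypothetical distributed table: named DDataArray columns."""

    def __init__(self, columns: Dict[str, List[Optional[float]]], n_chunks: int):
        self.columns = {
            name: DDataArray(col, n_chunks) for name, col in columns.items()
        }

    def __getitem__(self, name: str) -> DDataArray:
        return self.columns[name]


# None marks a missing value in the input columns.
df = DDataFrame(
    {"x": [1.0, None, 3.0, 5.0], "y": [2.0, 4.0, None, None]},
    n_chunks=2,
)
print(df["x"].mean(), df["y"].mean())
```

A model fit like MARS would follow the same pattern as `mean`: compute a partial result on each chunk, then combine the partials.
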
-- John

On May 1, 2014, at 5:15 PM, Jason Rudy <[email protected]> wrote:

> Hi Julia community,
>
> I'm curious about Julia and also am a serious user of multivariate adaptive
> regression splines (MARS) in both R and Python. I'd very much like to have a
> multiprocessing implementation of MARS and I'm looking into using Julia to
> build one. I'm curious whether anyone has advice on what basic packages I
> should look into as dependencies. It seems like if I want to do big parallel
> computations I would need to use a distributed array. However, there are
> some extra features (such as categorical variables and missing values) that
> may already be implemented in a standard way in the DataFrames package. As I
> see it, I can't use DataFrames if I want multiprocessing (unless I start by
> copying the data into a distributed array), and therefore should just build
> on distributed arrays and write an adapter later. Is my impression accurate,
> and is there any other advice you might have for someone attempting what I'm
> attempting with Julia?
>
> Best,
>
> Jason
