See also the discussion at https://github.com/JuliaLang/julia/issues/8470. 
Best, David

> -----Original Message-----
> From: [email protected] [mailto:[email protected]]
> On Behalf Of Milan Bouchet-Valat
> Sent: Friday, September 30, 2016 12:32 AM
> To: [email protected]
> Subject: Re: [julia-stats] DataFrame and Memory Limitations
> 
> On Thursday, September 29, 2016 at 18:03 -0400, Tom Breloff wrote:
> > I remember Stefan talking about a built-in "record" type on the
> > horizon (like named tuples, but core to the language).  Does anyone
> > know about progress there?
> I think that's https://github.com/JuliaLang/julia/pull/16580
> 
> 
> Regards
> 
> 
> > On Thu, Sep 29, 2016 at 5:59 PM, David Anthoff <[email protected]>
> > wrote:
> > > Yes, at least in theory it should be possible to e.g. load a very
> > > large CSV file with CSV.jl, transform it with Query.jl and then feed
> > > it into OnlineStats.jl. I think the architecture of all three
> > > packages should be such that this could work with a dataset that is
> > > larger than memory. In practice I don't think anyone has tried and
> > > I'm sure we would run into things that need fixing, but I can't
> > > think of any basic design decision in any of these packages that
> > > would prevent this kind of thing in principle.
> > >
> > > There is a general question of the core interop type for these
> > > things. Right now things like regression packages mostly expect a
> > > DataFrame. But we could imagine a world where these packages
> > > expected a more generic type. I think right now there are a bunch of
> > > potential options out there: both DataStreams and Query define their
> > > own streaming interfaces for tabular data (in the case of Query it
> > > is just a normal Julia iterator that returns NamedTuple elements).
> > > DataStreams in addition defines a column based interface that might
> > > be much faster when the dataset actually fits into memory (pure
> > > speculation on my end). I think there are also a bunch of attempts
> > > out there to define something like an abstract table structure, but
> > > I'm not sure to what extent they would enable a streaming data
> > > story.
> > >
> > > > -----Original Message-----
> > > > From: [email protected] [mailto:julia-stats@googlegroups.com]
> > > > On Behalf Of Milan Bouchet-Valat
> > > > Sent: Thursday, September 29, 2016 1:33 AM
> > > > To: [email protected]
> > > > Subject: Re: [julia-stats] DataFrame and Memory Limitations
> > > >
> > > > We're not completely there yet, but with Query.jl and
> > > > StructuredQueries.jl, combined with JuliaDB/JuliaData packages, one
> > > > should be able to work on out-of-memory data sets as efficiently as
> > > > (or more efficiently than) e.g. SAS. The high-level API is the same
> > > > whether you work on a DataFrame or on an external database.
> > > >
> > > > There's also OnlineStats.jl for computing statistics without loading
> > > > the full data set in memory at once.
> > > >
> > > >
> > > > Regards
> > > >
> > > >
> > > > On Wednesday, September 28, 2016 at 15:48 -0700, Juan wrote:
> > > > > Yes, but you can only do simple things such as summaries, or use
> > > > > functions implemented in those special packages. So far you can do
> > > > > linear regression, but you can't do more complex things such as
> > > > > mixed-effects regression, or use Stan or any other generic Bayesian
> > > > > package.
> > > > > The same goes for Spark: you can only use predefined functions, very
> > > > > simple ones, or create your own by hand, and it is very difficult to
> > > > > program something like lme4 from scratch.
> > > > >
> > > > > > Hi, I don't know Julia, but in R you don't need to load all data
> > > > > > into memory; just like SAS, you can read off disk. In R, both
> > > > > > proprietary Revolution Analytics R, I think working with
> > > > > > Hortonworks/Cloudera and Hadoop and Yarn (I don't know if there is
> > > > > > a Julia package for Yarn; I know little of Hadoop [not really
> > > > > > interested in Java] and Yarn, so I suggest you contact someone at
> > > > > > Hortonworks or Revolution R), which I saw a demonstration of at the
> > > > > > R user group here in Ottawa, Canada, as well as Revolution R's
> > > > > > other proprietary methods, and bigmemory
> > > > > > http://cran.r-project.org/web/packages/bigmemory/index.html
> > > > > > and http://www.bigmemory.org/ can handle more data. Here is a
> > > > > > discussion on large size data:
> > > > > > https://groups.google.com/forum/#!topic/julia-stats/eqYT85_vUlg
> > > > > > Regards,
> > > > > > Ramesh
> > > > > >
> > > > > >
> > > > > > On Tue, Aug 5, 2014 at 10:42 AM, Michael Smith <my.r...@gmail.com> wrote:
> > > > > > > All,
> > > > > > >
> > > > > > > Are there currently any solutions in Julia to handle
> > > > > > > larger-than-memory datasets in a way similar to how you work with
> > > > > > > a DataFrame?
> > > > > > >
> > > > > > > The reason I'm asking is that R has the limitation that you need
> > > > > > > to fit all your data into memory. On the other hand, SAS (while
> > > > > > > being quite different) does not have this limitation.
> > > > > > >
> > > > > > > In the age of "big data" this can be quite an advantage.
> > > > > > >
> > > > > > > Of course, you can "patch" this situation, e.g. in R you can use
> > > > > > > the ff or bigmemory packages, or use SQL.
> > > > > > >
> > > > > > > But my point is that it is bolted on, and you need to spend extra
> > > > > > > mental effort switching between, say, data.frame and ff, instead
> > > > > > > of focusing on your data problem at hand. This is a clear
> > > > > > > advantage of SAS, where you don't have to do that. So I'm
> > > > > > > wondering how this is handled in Julia.
> > > > > > >
> > > > > > > Thanks,
> > > > > > >
> > > > > > > M
> > > > > > >
> > > > > > > P.S.: I do not intend to start a flame war, e.g. whether R or SAS
> > > > > > > or Julia is better. I'm just interested to find out whether such a
> > > > > > > solution exists in Julia (I haven't found any, but maybe I
> > > > > > > overlooked something).
> > > > > > > And if no such solution exists, given that Julia is still young,
> > > > > > > evolving, and malleable (in a positive sense), it might make sense
> > > > > > > to think about it.
> > > > > > >
> > > > > > > --
> > > > > > > You received this message because you are subscribed to the Google
> > > > > > > Groups "julia-stats" group.
> > > > > > > To unsubscribe from this group and stop receiving emails from it,
> > > > > > > send an email to [email protected].
> > > > > > > For more options, visit https://groups.google.com/d/optout.
> > > > > > >
> > > > > >
> > > > > >

