On Thursday, September 29, 2016 at 18:03 -0400, Tom Breloff wrote:
> I remember Stefan talking about a built-in "record" type on the
> horizon (like named tuples, but core to the language). Does anyone
> know about progress there?

I think that's https://github.com/JuliaLang/julia/pull/16580

Regards

> On Thu, Sep 29, 2016 at 5:59 PM, David Anthoff <[email protected]>
> wrote:
> > Yes, at least in theory it should be possible to, e.g., load a very
> > large CSV file with CSV.jl, transform it with Query.jl and then
> > feed it into OnlineStats.jl. I think the architecture of all three
> > packages should be such that this could work with a dataset that is
> > larger than memory. In practice I don't think anyone has tried, and
> > I'm sure we would run into things that need fixing, but I can't
> > think of any basic design decision in any of these packages that
> > would prevent this kind of thing in principle.
> >
> > There is a general question of the core interop type for these
> > things. Right now, things like regression packages mostly expect a
> > DataFrame, but we could imagine a world where these packages
> > expected a more generic type. I think right now there are a bunch
> > of potential options out there: both DataStreams and Query define
> > their own streaming interfaces for tabular data (in the case of
> > Query it is just a normal Julia iterator that returns NamedTuple
> > elements). DataStreams in addition defines a column-based interface
> > that might be much faster when the dataset actually fits into
> > memory (pure speculation on my end). I think there are also a bunch
> > of attempts out there to define something like an abstract table
> > structure, but I'm not sure to what extent they would enable a
> > streaming data story.
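The load/transform/summarize chain described above (CSV.jl, then Query.jl, then OnlineStats.jl) can work on larger-than-memory data only if every stage consumes rows one at a time. That way, only the current row and a small running summary are ever held in memory. The sketch below illustrates that pattern in Python using only the standard library. It does not use or imitate the API of CSV.jl, Query.jl, OnlineStats.jl or DataStreams. The column names `price`/`qty` and the helpers `read_rows`, `pipeline` and `RunningMeanVar` are hypothetical. Rows are yielded as named tuples, loosely mirroring Query's NamedTuple elements. The sink uses Welford's online algorithm for mean and variance.

```python
import csv
import io
from collections import namedtuple
from typing import Iterator, TextIO


class RunningMeanVar:
    """Welford's online mean/variance: O(1) memory regardless of row count."""

    def __init__(self) -> None:
        self.n = 0
        self.mean = 0.0
        self._m2 = 0.0  # running sum of squared deviations from the mean

    def fit(self, x: float) -> None:
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self._m2 += delta * (x - self.mean)

    @property
    def variance(self) -> float:
        # Sample variance (n - 1 denominator); NaN for fewer than 2 values.
        return self._m2 / (self.n - 1) if self.n > 1 else float("nan")


def read_rows(stream: TextIO) -> Iterator[tuple]:
    """Lazily yield one named tuple per CSV record (hypothetical helper)."""
    reader = csv.reader(stream)
    Row = namedtuple("Row", next(reader))  # header line becomes field names
    for record in reader:
        yield Row(*record)


def pipeline(stream: TextIO) -> RunningMeanVar:
    rows = read_rows(stream)
    # Transform step: filter and project lazily, like a query over an iterator.
    revenue = (float(r.price) * int(r.qty) for r in rows if int(r.qty) > 0)
    stats = RunningMeanVar()
    for value in revenue:  # only one row is materialized at a time
        stats.fit(value)
    return stats


if __name__ == "__main__":
    # In practice `stream` would be open("big.csv", newline=""); a small
    # in-memory file keeps the example self-contained.
    data = io.StringIO("price,qty\n2.0,3\n5.0,0\n1.5,4\n4.0,1\n")
    s = pipeline(data)
    print(s.n, s.mean, s.variance)
```

Because each stage is a generator, swapping the in-memory `StringIO` for an open file handle does not change peak memory use. A column-based interface, as mentioned for DataStreams, would instead pass batches of whole columns, trading memory for fewer per-row overheads.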
> >
> > > -----Original Message-----
> > > From: [email protected] [mailto:julia-stats@googlegroups.com]
> > > On Behalf Of Milan Bouchet-Valat
> > > Sent: Thursday, September 29, 2016 1:33 AM
> > > To: [email protected]
> > > Subject: Re: [julia-stats] DataFrame and Memory Limitations
> > >
> > > We're not completely there yet, but with Query.jl and
> > > StructuredQueries.jl, combined with JuliaDB/JuliaData packages, one
> > > should be able to work on out-of-memory data sets as efficiently as
> > > (or more efficiently than) e.g. SAS. The high-level API is the same
> > > whether you work on a DataFrame or on an external database.
> > >
> > > There's also OnlineStats.jl for computing statistics without
> > > loading the full data set in memory at once.
> > >
> > > Regards
> > >
> > > On Wednesday, September 28, 2016 at 15:48 -0700, Juan wrote:
> > > > Yes, but you can only do simple things such as summaries, or use
> > > > functions implemented in those special packages. So far you can
> > > > do linear regression, but you can't do more complex things such
> > > > as mixed-effects regression, or use Stan or any other generic
> > > > Bayesian package.
> > > > The same goes for Spark: you can only use predefined functions,
> > > > very simple ones, or create your own by hand, but it's very
> > > > difficult to program something like lme4 from scratch.
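Linear regression can be done out of core, as Juan notes, because ordinary least squares needs only a fixed set of running sums. In general these are XᵀX and Xᵀy; for a single predictor they are n, Σx, Σy, Σx² and Σxy. These sums can be accumulated chunk by chunk and solved once at the end. The following is a minimal Python sketch of the single-predictor case. It assumes data arrive as an iterable of (x, y) pairs. The class name `StreamingOLS` is hypothetical and unrelated to OnlineStats.jl's API. The raw-sums formula is the textbook one and can lose precision on badly scaled data; a more robust version would use centered (Welford-style) updates or a QR-based method.

```python
from typing import Iterable, Tuple


class StreamingOLS:
    """Fit y = a + b*x from running sums (hypothetical helper)."""

    def __init__(self) -> None:
        self.n = 0
        self.sx = self.sy = self.sxx = self.sxy = 0.0

    def update(self, pairs: Iterable[Tuple[float, float]]) -> None:
        # A chunk can be discarded after this call; only five numbers persist.
        for x, y in pairs:
            self.n += 1
            self.sx += x
            self.sy += y
            self.sxx += x * x
            self.sxy += x * y

    def coefficients(self) -> Tuple[float, float]:
        """Return (intercept, slope) from the closed-form normal equations."""
        denom = self.n * self.sxx - self.sx ** 2
        if self.n < 2 or denom == 0:
            raise ValueError("need at least two distinct x values")
        slope = (self.n * self.sxy - self.sx * self.sy) / denom
        intercept = (self.sy - slope * self.sx) / self.n
        return intercept, slope


if __name__ == "__main__":
    model = StreamingOLS()
    # Two "chunks" standing in for pieces of a file too large for memory.
    model.update([(0.0, 1.0), (1.0, 3.0)])
    model.update([(2.0, 5.0), (3.0, 7.0)])
    print(model.coefficients())  # (1.0, 2.0), since y = 1 + 2x exactly
```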
> > > >
> > > > > Hi, I don't know Julia, but in R you don't need to load all
> > > > > data into memory: just like SAS, you can read off disk. In R,
> > > > > both the proprietary Revolution Analytics R (working, I think,
> > > > > with Hortonworks/Cloudera, Hadoop and YARN), which I saw
> > > > > demonstrated at the R User Group here in Ottawa, Canada, and
> > > > > Revolution R's other proprietary methods, as well as bigmemory
> > > > > (http://cran.r-project.org/web/packages/bigmemory/index.html
> > > > > and http://www.bigmemory.org/), can handle more data. (I don't
> > > > > know if there is a Julia package for YARN; I know little of
> > > > > Hadoop and YARN and am not really interested in Java, so I
> > > > > suggest you contact someone at Hortonworks or Revolution R.)
> > > > > Here is a discussion on large data sets:
> > > > > https://groups.google.com/forum/#!topic/julia-stats/eqYT85_vUlg
> > > > >
> > > > > Regards,
> > > > > Ramesh
> > > > >
> > > > > On Tue, Aug 5, 2014 at 10:42 AM, Michael Smith <my.r...@gmail.com>
> > > > > wrote:
> > > > > > All,
> > > > > >
> > > > > > Are there currently any solutions in Julia to handle
> > > > > > larger-than-memory datasets in a similar way to how you work
> > > > > > with a DataFrame?
> > > > > >
> > > > > > The reason I'm asking is that R has the limitation that you need
> > > > > > to fit all your data into memory. On the other hand, SAS (while
> > > > > > being quite different) does not have this limitation.
> > > > > >
> > > > > > In the age of "big data" this can be quite an advantage.
> > > > > >
> > > > > > Of course, you can "patch" this situation, e.g. in R you can use
> > > > > > the ff or bigmemory packages, or use SQL.
> > > > > >
> > > > > > But my point is that it is bolted on, and you need to spend extra
> > > > > > mental loops switching between, say, data.frame and ff, instead
> > > > > > of focusing on your data problem at hand.
> > > > > > This is a clear advantage of SAS, where you don't have to do
> > > > > > that. So I'm wondering how this is handled in Julia.
> > > > > >
> > > > > > Thanks,
> > > > > >
> > > > > > M
> > > > > >
> > > > > > P.S.: I do not intend to start a flame war, e.g. about whether R
> > > > > > or SAS or Julia is better. I'm just interested in finding out
> > > > > > whether such a solution exists in Julia (I haven't found any, but
> > > > > > maybe I overlooked something). And if no such solution exists,
> > > > > > given that Julia is still young, evolving, and malleable (in a
> > > > > > positive sense), it might make sense to think about it.
> > > > > >
> > > > > > --
> > > > > > You received this message because you are subscribed to the
> > > > > > Google Groups "julia-stats" group.
> > > > > > To unsubscribe from this group and stop receiving emails from it,
> > > > > > send an email to [email protected].
> > > > > > For more options, visit https://groups.google.com/d/optout.
