Not sure if it's as high-level as you're hoping for, but Julia has great support for arrays that are much bigger than memory. See Mmap.mmap and SharedArray(filename, T, dims).
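
For illustration, here is a minimal sketch of the memory-mapping route. It assumes Julia 1.x, where Mmap is a standard library loaded with `using Mmap`. The scratch file path and the dimensions are hypothetical, and the array is deliberately small so the example runs quickly. The same pattern is meant for files larger than RAM, since the OS pages data in on demand.

```julia
using Mmap

# Hypothetical scratch file and dimensions, chosen only for illustration.
path = joinpath(mktempdir(), "demo.bin")
nrows, ncols = 1_000, 50

# Map a new file as a writable matrix; the file is grown to fit the array.
io = open(path, "w+")
A = Mmap.mmap(io, Matrix{Float64}, (nrows, ncols))
for j in 1:ncols
    A[:, j] .= j          # fill one column at a time
end
Mmap.sync!(A)             # flush modified pages to disk
close(io)

# Reopen read-only and reduce column by column; only the pages that are
# actually touched need to be resident in memory.
B = open(path, "r") do f
    Mmap.mmap(f, Matrix{Float64}, (nrows, ncols))
end
colsums = [sum(view(B, :, j)) for j in 1:ncols]
```

SharedArray can also be backed by a file, but its constructor syntax has changed across Julia versions, so check the documentation for the version in use.
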
--Tim

On Thursday, November 05, 2015 06:33:52 PM André Lage wrote:
> hi Viral,
>
> Do you have any news on this?
>
> André Lage.
>
> On Wednesday, July 3, 2013 at 5:12:06 AM UTC-3, Viral Shah wrote:
> > Hi all,
> >
> > I am cross-posting my reply to julia-stats and julia-users as there was a
> > separate post on large logistic regressions on julia-users too.
> >
> > Just as these questions came up, Tanmay and I have been chatting about a
> > general framework for working on problems that are too large to fit in
> > memory, or need parallelism for performance. The idea is simple and based
> > on providing a convenient and generic way to break up a problem into
> > subproblems, each of which can then be scheduled to run anywhere. To start
> > with, we will implement a map and mapreduce using this, and we hope that
> > it should be able to handle large files sequentially, distributed data
> > in-memory, and distributed filesystems within the same framework. Of
> > course, this all sounds too good to be true. We are trying out a simple
> > implementation, and if early results are promising, we can have a detailed
> > discussion on API design and implementation.
> >
> > Doug, I would love to see if we can use some of this work to parallelize
> > GLM at a higher level than using remotecall and fetch.
> >
> > -viral
> >
> > On Tuesday, July 2, 2013 11:10:35 PM UTC+5:30, Douglas Bates wrote:
> >> On Tuesday, July 2, 2013 6:26:33 AM UTC-5, Raj DG wrote:
> >>> Hi all,
> >>>
> >>> I am a regular user of R and also use it for handling very large data
> >>> sets (~ 50 GB). We have enough RAM to fit all that data into memory for
> >>> processing, so don't really need to do anything additional to chunk,
> >>> etc.
> >>>
> >>> I wanted to get an idea of whether anyone has, in practice, performed
> >>> analysis on large data sets using Julia. Use cases range from performing
> >>> Cox Regression on ~ 40 million rows and over 10 independent variables to
> >>> simple statistical analysis using T-Tests, etc. Also, how does the
> >>> timings for operations like logistic regressions compare to Julia ? Are
> >>> there any libraries/packages that can perform Cox, Poisson (Negative
> >>> Binomial), and other regression types ?
> >>>
> >>> The benchmarks for Julia look promising, but in today's age of the "big
> >>> data", it seems that the capability of handling large data is a
> >>> pre-requisite to the future success of any new platform or language.
> >>> Looking forward to your feedback,
> >>
> >> I think the potential for working with large data sets in Julia is better
> >> than that in R. Among other things Julia allows for memory-mapped files
> >> and for distributed arrays, both of which have great potential.
> >>
> >> I have been working with some Biostatisticians on a prototype package for
> >> working with snp data of the sort generated in genome-wide association
> >> studies. Current data sizes can be information on tens of thousands of
> >> individuals (rows) for over a million snp positions (columns). The nature
> >> of the data is such that each position provides one of four potential
> >> values, including a missing value. A compact storage format using 2 bits
> >> per position is widely used for such data. We are able to read and process
> >> such a large array in a few seconds using memory-mapped files in Julia.
> >>
> >> The amazing thing is that the code is pure Julia. When I write in R I am
> >> always conscious of the bottlenecks and the need to write C or C++ code
> >> for those places. I haven't encountered cases where I need to write new
> >> code in a compiled language to speed up a Julia function. I have
> >> interfaced to existing numerical libraries but not writing fresh code.
> >>
> >> As John mentioned I have written the GLM package allowing for hooks to
> >> use distributed arrays. As yet I haven't had a large enough problem to
> >> warrant fleshing out those hooks but I could be persuaded.
