No, it's for taking an "array" that's already in a file and sharing it among workers via mmap.
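To make the two constructors concrete, here is a minimal sketch (the file path and sizes are made up, and this assumes the Julia 0.4-era `SharedArray` API being discussed in the thread):

```julia
# In-memory SharedArray: backed by anonymous shared memory on the local host.
# The init function runs on each worker to fill its local portion.
A = SharedArray(Float64, (1000, 4); init = S -> S[localindexes(S)] = 1.0)

# File-backed SharedArray: first write a plain binary array to disk...
open("/tmp/mydata.bin", "w+") do io        # hypothetical path
    write(io, rand(Float64, 1000, 4))      # stored column-major, no header
end

# ...then map the existing file into every worker's address space via mmap.
# No data is copied between processes; they all see the same pages.
B = SharedArray("/tmp/mydata.bin", Float64, (1000, 4))
```

The filename variant does not allocate fresh shared memory; it maps whatever bytes are already at that path, which is the behavior described above.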
Docs would be great. I might contribute them, but anyone else is welcome to do so.

--Tim

On Wednesday, November 11, 2015 04:31:58 PM John Brock wrote:
> Thanks, Tomas, but I wasn't referring to the keyword arguments. One
> signature starts with a filename argument, and the other doesn't. What's
> the difference? Is the filename specifying the location at which to create
> a memory-mapped file?
>
> On Wednesday, November 11, 2015 at 3:33:56 AM UTC-8, Tomas Lycken wrote:
>> Everything after the semicolon is keyword arguments, and will dispatch to
>> the same method as if they are left out. Thus, the documentation for
>> SharedArray(T, dims; init=false, pids=[]) is valid for SharedArray(T, dims)
>> too, and the values of init and pids will be the ones given in the
>> signature.
>>
>> // T
>>
>> On Monday, November 9, 2015 at 9:43:17 PM UTC+1, John Brock wrote:
>>> It looks like SharedArray(filename, T, dims) isn't documented, but
>>> SharedArray(T, dims; init=false, pids=Int[]) is. What's the difference?
>>>
>>> On Friday, November 6, 2015 at 2:21:01 AM UTC-8, Tim Holy wrote:
>>>> Not sure if it's as high-level as you're hoping for, but Julia has
>>>> great support for arrays that are much bigger than memory. See
>>>> Mmap.mmap and SharedArray(filename, T, dims).
>>>>
>>>> --Tim
>>>>
>>>> On Thursday, November 05, 2015 06:33:52 PM André Lage wrote:
>>>>> hi Viral,
>>>>>
>>>>> Do you have any news on this?
>>>>>
>>>>> André Lage.
>>>>>
>>>>> On Wednesday, July 3, 2013 at 5:12:06 AM UTC-3, Viral Shah wrote:
>>>>>> Hi all,
>>>>>>
>>>>>> I am cross-posting my reply to julia-stats and julia-users as there
>>>>>> was a separate post on large logistic regressions on julia-users too.
>>>>>> Just as these questions came up, Tanmay and I have been chatting
>>>>>> about a general framework for working on problems that are too large
>>>>>> to fit in memory, or need parallelism for performance. The idea is
>>>>>> simple and based on providing a convenient and generic way to break
>>>>>> up a problem into subproblems, each of which can then be scheduled to
>>>>>> run anywhere. To start with, we will implement a map and mapreduce
>>>>>> using this, and we hope that it should be able to handle large files
>>>>>> sequentially, distributed data in-memory, and distributed filesystems
>>>>>> within the same framework. Of course, this all sounds too good to be
>>>>>> true. We are trying out a simple implementation, and if early results
>>>>>> are promising, we can have a detailed discussion on API design and
>>>>>> implementation.
>>>>>>
>>>>>> Doug, I would love to see if we can use some of this work to
>>>>>> parallelize GLM at a higher level than using remotecall and fetch.
>>>>>>
>>>>>> -viral
>>>>>>
>>>>>> On Tuesday, July 2, 2013 11:10:35 PM UTC+5:30, Douglas Bates wrote:
>>>>>>> On Tuesday, July 2, 2013 6:26:33 AM UTC-5, Raj DG wrote:
>>>>>>>> Hi all,
>>>>>>>>
>>>>>>>> I am a regular user of R and also use it for handling very large
>>>>>>>> data sets (~50 GB). We have enough RAM to fit all that data into
>>>>>>>> memory for processing, so don't really need to do anything
>>>>>>>> additional to chunk, etc.
>>>>>>>>
>>>>>>>> I wanted to get an idea of whether anyone has, in practice,
>>>>>>>> performed analysis on large data sets using Julia.
>>>>>>>> Use cases range from performing Cox regression on ~40 million rows
>>>>>>>> and over 10 independent variables to simple statistical analysis
>>>>>>>> using t-tests, etc. Also, how do the timings for operations like
>>>>>>>> logistic regression compare to Julia? Are there any
>>>>>>>> libraries/packages that can perform Cox, Poisson (negative
>>>>>>>> binomial), and other regression types?
>>>>>>>>
>>>>>>>> The benchmarks for Julia look promising, but in today's age of
>>>>>>>> "big data", it seems that the capability of handling large data is
>>>>>>>> a prerequisite to the future success of any new platform or
>>>>>>>> language.
>>>>>>>>
>>>>>>>> Looking forward to your feedback,
>>>>>>>
>>>>>>> I think the potential for working with large data sets in Julia is
>>>>>>> better than that in R. Among other things Julia allows for
>>>>>>> memory-mapped files and for distributed arrays, both of which have
>>>>>>> great potential.
>>>>>>>
>>>>>>> I have been working with some biostatisticians on a prototype
>>>>>>> package for working with SNP data of the sort generated in
>>>>>>> genome-wide association studies. Current data sizes can be
>>>>>>> information on tens of thousands of individuals (rows) for over a
>>>>>>> million SNP positions (columns). The nature of the data is such
>>>>>>> that each position provides one of four potential values, including
>>>>>>> a missing value. A compact storage format using 2 bits per position
>>>>>>> is widely used for such data. We are able to read and process such
>>>>>>> a large array in a few seconds using memory-mapped files in Julia.
>>>>>>> The amazing thing is that the code is pure Julia. When I write in R
>>>>>>> I am always conscious of the bottlenecks and the need to write C or
>>>>>>> C++ code for those places. I haven't encountered cases where I need
>>>>>>> to write new code in a compiled language to speed up a Julia
>>>>>>> function. I have interfaced to existing numerical libraries but not
>>>>>>> written fresh code.
>>>>>>>
>>>>>>> As John mentioned I have written the GLM package allowing for hooks
>>>>>>> to use distributed arrays. As yet I haven't had a large enough
>>>>>>> problem to warrant fleshing out those hooks, but I could be
>>>>>>> persuaded.
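The 2-bits-per-position layout Doug describes can be illustrated with a small sketch. The file name, dimensions, and genotype coding below are all hypothetical; the actual package he mentions may differ:

```julia
# Hypothetical sketch: mmap a packed 2-bit genotype matrix and decode entries.
# Four positions per byte; the codes 0b00..0b11 stand for the three genotypes
# plus a missing value.
nind, nsnp = 10_000, 1_000_000          # individuals x SNP positions (made up)
bytesper = cld(nind, 4)                 # 4 genotypes stored per byte

io = open("/tmp/genotypes.bin", "r")    # hypothetical packed file
raw = Mmap.mmap(io, Matrix{UInt8}, (bytesper, nsnp))

# Decode the 2-bit code for individual i at SNP position j (both 1-based):
# pick the byte holding individual i, then shift out its 2-bit field.
geno(raw, i, j) = (raw[((i - 1) >> 2) + 1, j] >> (2 * ((i - 1) & 3))) & 0x03
```

Because the matrix is mmapped, "reading" the file is just page-cache traffic, which is why a billion-entry packed array can be scanned in seconds.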

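The "break the problem into subproblems" idea Viral sketches can be roughed out as follows. The function name and file path are invented, and the real framework he and Tanmay were prototyping may look nothing like this:

```julia
# Hypothetical chunked mapreduce over a file too large for memory:
# process one block at a time and combine partial results with `op`.
function chunked_mapreduce(f, op, v0, filename; chunksize = 1_000_000)
    acc = v0
    open(filename) do io
        while !eof(io)
            block = readbytes(io, chunksize)  # reads at most chunksize bytes
            acc = op(acc, f(block))
        end
    end
    return acc
end

# e.g. count zero bytes in a huge binary file, chunk by chunk:
# nzeros = chunked_mapreduce(b -> countnz(b .== 0), +, 0, "/tmp/big.bin")
```

Each `f(block)` call is independent, so the same decomposition could be scheduled on workers (sequential file, in-memory distributed data, or a distributed filesystem) with only the chunk-fetching step changing.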