Thanks, Tomas, but I wasn't referring to the keyword arguments. One signature starts with a filename argument, and the other doesn't. What's the difference? Does the filename specify the location at which a memory-mapped file is created?
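
To make the question concrete, here is a minimal sketch of the two forms. It assumes current Julia, where they are spelled `SharedArray{T}(dims)` and `SharedArray{T}(filename, dims)` in the SharedArrays standard library rather than `SharedArray(T, dims)` as written in this thread. The temp-file path is made up for illustration, and the comments about file backing reflect my reading of the current docs, not anything confirmed in this thread.

```julia
using SharedArrays

# Form 1: no filename. The array lives in anonymous shared memory
# and disappears when the processes holding it go away.
a = SharedArray{Float64}((3, 4); init = false, pids = Int[])

# Form 2: filename first. As far as I can tell, the array is backed by
# that file (which must be given as an absolute path) through mmap.
path = joinpath(mktempdir(), "shared.bin")   # hypothetical location
b = SharedArray{Float64}(path, (3, 4); mode = "w+")
b[2, 3] = 42.0

# Opening the same file again should expose the same data.
c = SharedArray{Float64}(path, (3, 4))
println(c[2, 3])
```

If that reading is right, the filename form is the one that can go beyond available RAM, since the operating system pages the file in and out on demand.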
On Wednesday, November 11, 2015 at 3:33:56 AM UTC-8, Tomas Lycken wrote:
>
> Everything after the semicolon is keyword arguments, and will dispatch to the same method as if they are left out. Thus, the documentation for SharedArray(T, dims; init=false, pids=[]) is valid for SharedArray(T, dims) too, and the values of init and pids will be the ones given in the signature.
>
> // T
>
> On Monday, November 9, 2015 at 9:43:17 PM UTC+1, John Brock wrote:
>>
>> It looks like SharedArray(filename, T, dims) isn't documented, but SharedArray(T, dims; init=false, pids=Int[]) is. What's the difference?
>>
>> On Friday, November 6, 2015 at 2:21:01 AM UTC-8, Tim Holy wrote:
>>>
>>> Not sure if it's as high-level as you're hoping for, but julia has great support for arrays that are much bigger than memory. See Mmap.mmap and SharedArray(filename, T, dims).
>>>
>>> --Tim
>>>
>>> On Thursday, November 05, 2015 06:33:52 PM André Lage wrote:
>>> > hi Viral,
>>> >
>>> > Do you have any news on this?
>>> >
>>> > André Lage.
>>> >
>>> > On Wednesday, July 3, 2013 at 5:12:06 AM UTC-3, Viral Shah wrote:
>>> > > Hi all,
>>> > >
>>> > > I am cross-posting my reply to julia-stats and julia-users as there was a separate post on large logistic regressions on julia-users too.
>>> > >
>>> > > Just as these questions came up, Tanmay and I have been chatting about a general framework for working on problems that are too large to fit in memory, or need parallelism for performance. The idea is simple and based on providing a convenient and generic way to break up a problem into subproblems, each of which can then be scheduled to run anywhere. To start with, we will implement a map and mapreduce using this, and we hope that it should be able to handle large files sequentially, distributed data in-memory, and distributed filesystems within the same framework.
>>> > > Of course, this all sounds too good to be true. We are trying out a simple implementation, and if early results are promising, we can have a detailed discussion on API design and implementation.
>>> > >
>>> > > Doug, I would love to see if we can use some of this work to parallelize GLM at a higher level than using remotecall and fetch.
>>> > >
>>> > > -viral
>>> > >
>>> > > On Tuesday, July 2, 2013 11:10:35 PM UTC+5:30, Douglas Bates wrote:
>>> > >> On Tuesday, July 2, 2013 6:26:33 AM UTC-5, Raj DG wrote:
>>> > >>> Hi all,
>>> > >>>
>>> > >>> I am a regular user of R and also use it for handling very large data sets (~ 50 GB). We have enough RAM to fit all that data into memory for processing, so don't really need to do anything additional to chunk, etc.
>>> > >>>
>>> > >>> I wanted to get an idea of whether anyone has, in practice, performed analysis on large data sets using Julia. Use cases range from performing Cox Regression on ~ 40 million rows and over 10 independent variables to simple statistical analysis using T-Tests, etc. Also, how does the timings for operations like logistic regressions compare to Julia ? Are there any libraries/packages that can perform Cox, Poisson (Negative Binomial), and other regression types ?
>>> > >>>
>>> > >>> The benchmarks for Julia look promising, but in today's age of the "big data", it seems that the capability of handling large data is a pre-requisite to the future success of any new platform or language.
>>> > >>> Looking forward to your feedback,
>>> > >>
>>> > >> I think the potential for working with large data sets in Julia is better than that in R. Among other things Julia allows for memory-mapped files and for distributed arrays, both of which have great potential.
>>> > >>
>>> > >> I have been working with some Biostatisticians on a prototype package for working with snp data of the sort generated in genome-wide association studies. Current data sizes can be information on tens of thousands of individuals (rows) for over a million snp positions (columns). The nature of the data is such that each position provides one of four potential values, including a missing value. A compact storage format using 2 bits per position is widely used for such data. We are able to read and process such a large array in a few seconds using memory-mapped files in Julia.
>>> > >>
>>> > >> The amazing thing is that the code is pure Julia. When I write in R I am always conscious of the bottlenecks and the need to write C or C++ code for those places. I haven't encountered cases where I need to write new code in a compiled language to speed up a Julia function. I have interfaced to existing numerical libraries but not writing fresh code.
>>> > >>
>>> > >> As John mentioned I have written the GLM package allowing for hooks to use distributed arrays. As yet I haven't had a large enough problem to warrant fleshing out those hooks but I could be persuaded.
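
The memory-mapped, 2-bits-per-position storage described in the quoted message can be sketched roughly as follows. This is not the code from the package mentioned there. The helper name `twobit`, the bit order (four codes per byte, lowest-order bits first), and the two-byte sample file are all assumptions made for illustration. The only library call relied on is `Mmap.mmap` from the Mmap standard library.

```julia
using Mmap

# Hypothetical helper: return the 2-bit code (0 to 3) stored at zero-based
# position i, assuming four codes per byte with the lowest-order bits first.
twobit(bytes::AbstractVector{UInt8}, i::Integer) =
    (bytes[(i >> 2) + 1] >> (2 * (i & 3))) & 0x03

# Write a tiny sample file: two bytes holding eight 2-bit codes.
path, io = mktemp()
write(io, UInt8[0b11100100, 0b00011011])
close(io)

# Map the file into memory instead of reading it. For a large file the
# operating system pages data in only as it is touched.
bytes = Mmap.mmap(path, Vector{UInt8}, (2,))

# Decode every position on the fly, without unpacking the file up front.
codes = [twobit(bytes, i) for i in 0:7]
println(codes)
```

Decoding on access keeps the on-disk size at a quarter of a byte per position, which is what makes arrays of the size described above practical to map.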
