Hi Tim,

I liked your VectorLite idea---just collapse my ragged 2D array into a 1D 
SharedArray, and use additional 1D SharedArrays for offsets and lengths. 
Unfortunately, on the system where I need to deploy the code (a 32 core AMD 
server, 256 GB RAM), the amount of shared memory (/proc/sys/kernel/shmmax) 
is set to the default (~33MB) for Ubuntu. This isn't enough for me, but I 
can't increase it without root access. I think I'll give up on 
parallelizing my program, at least until multithreading support is added to 
Julia. Thanks much for the help!
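
For anyone who finds this thread later, here is a minimal sketch of the flattened layout as I understood it. It is only an illustration: the toy data and the helper name getdoc are made up, and plain Arrays stand in for the SharedArrays. In a real deployment, flat, lens and offsets would each have to live in shared memory, which is where the shmmax limit gets in the way.

```julia
# Hypothetical toy data standing in for the real document collection.
docs = Vector{Int}[[11, 12, 13], [21], [31, 32]]

# Collapse the ragged collection into one 1D array, and record each
# document's length and 0-based starting offset in that flat array.
flat    = vcat(docs...)
lens    = map(length, docs)
offsets = cumsum(lens) .- lens

# Hypothetical helper: return a copy of document i from the flat array.
function getdoc(flat, offsets, lens, i)
    lo = offsets[i] + 1
    hi = offsets[i] + lens[i]
    return flat[lo:hi]
end

println(getdoc(flat, offsets, lens, 3))
```

Since the collection never changes after loading, the offsets and lengths only need to be computed once.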

Best, Bob



On Saturday, May 10, 2014 3:18:55 AM UTC-7, Tim Holy wrote:
>
> I think it's safe to say that current infrastructure doesn't support this 
> out of the box. Long-term, Tobias Knopp's experiment in multithreading 
> (check the issue tracker) might be of interest. 
>
> Alternatively, define your own VectorLite type: 
>
> immutable VectorLite{T} 
>     offset::Int 
>     length::Int 
> end 
>
> with appropriate getindex() etc, then encode the data within a 
> SharedArray{Uint8,1} buffer and use reinterpret on an appropriate block: 
>
> n = 3_000_000 
> documents = reinterpret(VectorLite{Int}, buf[1:n*sizeof(VectorLite{Int})], (n,)) 
>
> Here the "header" of buf encodes the offsets and lengths of each document 
> vector. 
>
> --Tim 
>
> On Friday, May 09, 2014 03:46:03 PM Bob Quazar wrote: 
> > Hi Tim -- I should have just said what data I'm trying to share between 
> > processes: It's basically a collection of documents, encoded as an 
> > Array{Array{Int64, 1}, 1}. Each array of integers corresponds to the words 
> > in a document. The whole collection contains 3,000,000 documents. Each 
> > document contains a different number of words. Some have just a few words, 
> > but others have hundreds of words, so I can't store the document collection 
> > as a SharedArray{Int64, 2}. Once the collection of documents is loaded 
> > into memory, it never changes. From your previous posts, I take it I'm out 
> > of luck, but I'd appreciate confirmation---rewriting the application in C++ 
> > would be a major undertaking for me. 
> > 
> > Thanks much. 
> > 
> > On Thursday, May 8, 2014 7:49:09 AM UTC-7, Tim Holy wrote: 
> > > > Given that there's a lot in your head about your application that I 
> > > > don't know, perhaps I should stop guessing about what you're trying to 
> > > > do and just make sure the principles are clear: 
> > > > 
> > > > 1. The more data you have to serialize and send to other processes, the 
> > > > slower it will be. The key thing about a SharedArray is that it uses a 
> > > > couple tricks to serialize the "container" without needing to serialize 
> > > > the data---but that's the only thing special about it. 
> > > > 2. A type that contains non-immutable types (e.g., Arrays) only stores 
> > > > references to those objects. So 
> > > > 
> > > > type MyType{T} 
> > > >     A::SharedArray{T,2} 
> > > >     b::SharedArray{T,1} 
> > > > end 
> > > > 
> > > > should serialize very nicely, because a "MyType" object is, in itself, 
> > > > tiny (if you know C, think of it essentially as 2 pointers) and the big 
> > > > parts (the SharedArrays) serialize nicely. 
> > > > 
> > > > I worry, though, that your array-of-tuples will prove troublesome, 
> > > > because each one of those is in turn a reference to an object which 
> > > > would need to be serialized. If they are all of the same size and you 
> > > > need to serialize it, you should consider a different structure (e.g., 
> > > > a SharedArray of immutables). 
> > > 
> > > --Tim 
> > > 
> > > On Wednesday, May 07, 2014 12:52:59 PM Bob Quazar wrote: 
> > > > > I don't need to modify any of the data, so I could use immutable. 
> > > > > Would you recommend using tuples then, as fields in my immutable 
> > > > > composite type, rather than arrays? I tried that, but one of my arrays 
> > > > > has about 3,000,000 entries (and each entry is itself a tuple); ntuple 
> > > > > doesn't seem designed to handle this much data. 
> > > > > 
> > > > > w2 = ntuple(length(myArray), (j) -> myArray[j]) 
> > > > > ERROR: stack overflow 
> > > > >  in ntuple at tuple.jl:30 (repeats 74810 times) 
> > > > >  in ntuple at tuple.jl:29 
> > > > > 
> > > > > Is there an alternative way of constructing immutable arrays? 
> > > > 
> > > > Thanks much, Bob 
> > > > 
> > > > On Wednesday, May 7, 2014 3:38:45 AM UTC-7, Tim Holy wrote: 
> > > > > > Given your current design, reinterpret (which is indeed the same 
> > > > > > thing as casting) is not going to work for you, at least not without 
> > > > > > a lot of hacking on your part. Can you use an immutable? The key 
> > > > > > point is that an immutable has a predictable layout in memory. For 
> > > > > > that to work, your arrays would have to be of fixed size. 
> > > > > > 
> > > > > > In the memory layout of immutables, one thing you have to be aware 
> > > > > > of is that sometimes "gaps" are introduced for the purpose of 
> > > > > > memory-alignment. (These can also be platform-dependent, I believe, 
> > > > > > but if you're not transferring data from one machine to another this 
> > > > > > shouldn't be a problem.) Here's a demo, in case it's of any value to 
> > > > > > you: 
> > > > > 
> > > > > > julia> immutable MyArray 
> > > > > >            x::Uint16 
> > > > > >            y::Uint16 
> > > > > >            z::Uint16 
> > > > > >        end 
> > > > > > 
> > > > > > julia> immutable MyType 
> > > > > >            a::MyArray 
> > > > > >            c::Char 
> > > > > >            d::Float64 
> > > > > >        end 
> > > > > > 
> > > > > > julia> sizeof(MyType) 
> > > > > > 24 
> > > > > > 
> > > > > > julia> r = [uint8(1:24)] 
> > > > > > 24-element Array{Uint8,1}: 
> > > > >  0x01 
> > > > >  0x02 
> > > > >  0x03 
> > > > >  0x04 
> > > > >  0x05 
> > > > >  0x06 
> > > > >  0x07 
> > > > >  0x08 
> > > > >  0x09 
> > > > >  0x0a 
> > > > >   
> > > > >     ⋮ 
> > > > >   
> > > > >  0x0f 
> > > > >  0x10 
> > > > >  0x11 
> > > > >  0x12 
> > > > >  0x13 
> > > > >  0x14 
> > > > >  0x15 
> > > > >  0x16 
> > > > >  0x17 
> > > > >  0x18 
> > > > > 
> > > > > > julia> reinterpret(MyType, r) 
> > > > > > 1-element Array{MyType,1}: 
> > > > > >  MyType(MyArray(0x0201,0x0403,0x0605),'\Uc0b0a09',1.2650169649295773e-192) 
> > > > > > 
> > > > > On Tuesday, May 06, 2014 09:57:27 PM Bob Quazar wrote: 
> > > > > > Tim, Thanks for the response. 
> > > > > > 
> > > > > > > How can I determine whether my structure (a composite type) is a 
> > > > > > > contiguous block of memory? The composite type has a few fields. 
> > > > > > > One of the fields has type Array{Array{Uint16,1},1} and another has 
> > > > > > > type Array{Array{Float64,2},1}. Am I out of luck? 
> > > > > > 
> > > > > > What does it mean to reinterpret a composite type as a 
> > > > > > SharedArray{Uint8,1}? Is that something like casting? 
> > > > > > 
> > > > > > On Tuesday, May 6, 2014 7:32:47 PM UTC-7, Tim Holy wrote: 
> > > > > > > > If your structure is a contiguous block of memory, you could 
> > > > > > > > presumably use a SharedArray{Uint8,1} and then reinterpret it as 
> > > > > > > > your custom object. But if you want independently 
> > > > > > > > garbage-collectable constituents, this is not likely to be 
> > > > > > > > feasible. 
> > > > > > > 
> > > > > > > --Tim 
> > > > > > > 
> > > > > > > On Tuesday, May 06, 2014 03:03:01 PM Bob Quazar wrote: 
> > > > > > > > > In Julia, is it possible to share read-only data structures 
> > > > > > > > > between threads/processes running on different processors, 
> > > > > > > > > without making a copy of the entire data structure for each 
> > > > > > > > > thread? The data structure I need to share isn't of "bit type", 
> > > > > > > > > so I can't use a SharedArray. And each worker needs to read 
> > > > > > > > > random parts from the full data structure, so I can't use a 
> > > > > > > > > distributed array. I also tried just putting "@parallel" before 
> > > > > > > > > a for loop, whose body accesses the data structure I want to 
> > > > > > > > > share, but I believe the whole data structure gets copied at 
> > > > > > > > > each iteration. (I could be wrong about that, but it's 
> > > > > > > > > certainly very slow.) Does Julia have some alternative to a 
> > > > > > > > > copy-on-write for sharing data between threads? 
> > > > > > > > 
> > > > > > > > Thanks in advance! 
> > > > > > > > 
> > > > > > > > --Bob 
>
