I think it's safe to say that current infrastructure doesn't support this out
of the box. Long-term, Tobias Knopp's experiment in multithreading (check the
issue tracker) might be of interest.

Alternatively, define your own VectorLite type:

    immutable VectorLite{T}
        offset::Int
        length::Int
    end

with appropriate getindex() etc., then encode the data within a
SharedArray{Uint8,1} buffer and use reinterpret on an appropriate block:

    n = 3_000_000
    documents = reinterpret(VectorLite{Int}, buf[1:n*sizeof(VectorLite{Int})], (n,))

Here the "header" of buf encodes the offsets and lengths of each document
vector.
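
For concreteness, here is a rough sketch of the offset/length idea, written for
current Julia (1.x), where `immutable` is spelled `struct` and SharedArray lives
in the SharedArrays standard library. It is a simplified variant, not the
single-buffer layout described above: the header and the word data sit in two
separate SharedArrays, so no reinterpret is needed. The names `pack` and
`document` are made up for illustration, and the sketch assumes a non-empty
collection.

```julia
using SharedArrays

# Header entry: where one document sits inside the flat word buffer.
struct VectorLite
    offset::Int   # number of words stored before this document
    length::Int   # number of words in this document
end

# Copy an array-of-arrays into one flat shared buffer plus a shared header.
# Assumes `docs` is non-empty and holds at least one word in total.
function pack(docs::Vector{Vector{Int}})
    header = SharedArray{VectorLite}((length(docs),))
    words  = SharedArray{Int}((sum(length, docs),))
    offset = 0
    for (i, d) in enumerate(docs)
        header[i] = VectorLite(offset, length(d))
        words[offset+1:offset+length(d)] = d
        offset += length(d)
    end
    return header, words
end

# Return document i as a view into the shared buffer (no copy is made).
function document(header, words, i)
    h = header[i]
    return view(words, h.offset+1:h.offset+h.length)
end

docs = [[1, 2], [3], [4, 5, 6]]
header, words = pack(docs)
third = document(header, words, 3)   # the words of the third document
```

Both arrays hold plain bits types, so they can be handed to workers the same way
as any other SharedArray; only the small wrapper objects need to be serialized,
not the 3,000,000 documents themselves.
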
--Tim
On Friday, May 09, 2014 03:46:03 PM Bob Quazar wrote:
> Hi Tim -- I should have just said what data I'm trying to share between
> processes: It's basically a collection of documents, encoded as an
> Array{Array{Int64, 1}, 1}. Each array of integers corresponds to the words
> in a document. The whole collection contains 3,000,000 documents. Each
> document contains a different number of words. Some have just a few words,
> but others have hundreds of words, so I can't store the document collection
> as a SharedArray{Int64, 2}. Once the collection of documents is loaded
> into memory, it never changes. From your previous posts, I take it I'm out
> of luck, but I'd appreciate confirmation---rewriting the application in C++
> would be a major undertaking for me.
>
> Thanks much.
>
> On Thursday, May 8, 2014 7:49:09 AM UTC-7, Tim Holy wrote:
> > Given that there's a lot in your head about your application that I don't
> > know, perhaps I should stop guessing about what you're trying to do and
> > just make sure the principles are clear:
> >
> > 1. The more data you have to serialize and send to other processes, the
> > slower it will be. The key thing about a SharedArray is that it uses a
> > couple tricks to serialize the "container" without needing to serialize
> > the data---but that's the only thing special about it.
> > 2. A type that contains non-immutable types (e.g., Arrays) only stores
> > references to those objects. So
> >
> >     type MyType{T}
> >         A::SharedArray{T,2}
> >         b::SharedArray{T,1}
> >     end
> >
> > should serialize very nicely, because a "MyType" object is, in itself,
> > tiny (if you know C, think of it essentially as 2 pointers) and the big
> > parts (the SharedArrays) serialize nicely.
> >
> > I worry, though, that your array-of-tuples will prove troublesome, because
> > each one of those is in turn a reference to an object which would need to
> > be serialized. If they are all of the same size and you need to serialize
> > it, you should consider a different structure (e.g., a SharedArray of
> > immutables).
> >
> > --Tim
> >
> > On Wednesday, May 07, 2014 12:52:59 PM Bob Quazar wrote:
> > > I don't need to modify any of the data, so I could use immutable. Would
> > > you recommend using tuples then, as fields in my immutable composite
> > > type, rather than arrays? I tried that, but one of my arrays has about
> > > 3,000,000 entries (and each entry is itself a tuple); ntuple doesn't seem
> > > designed to handle this much data.
> > >
> > >     w2 = ntuple(length(myArray), (j) -> myArray[j])
> > >     ERROR: stack overflow
> > >      in ntuple at tuple.jl:30 (repeats 74810 times)
> > >      in ntuple at tuple.jl:29
> > >
> > > Is there an alternative way of constructing immutable arrays?
> > >
> > > Thanks much, Bob
> > >
> > > On Wednesday, May 7, 2014 3:38:45 AM UTC-7, Tim Holy wrote:
> > > > Given your current design, reinterpret (which is indeed the same thing
> > > > as casting) is not going to work for you, at least not without a lot of
> > > > hacking on your part. Can you use an immutable? The key point is that an
> > > > immutable has a predictable layout in memory. For that to work, your
> > > > arrays would have to be of fixed size.
> > > >
> > > > In the memory layout of immutables, one thing you have to be aware of is
> > > > that sometimes "gaps" are introduced for the purpose of memory-alignment.
> > > > (These can also be platform-dependent, I believe, but if you're not
> > > > transferring data from one machine to another this shouldn't be a
> > > > problem.) Here's a demo, in case it's of any value to you:
> > > >
> > > > julia> immutable MyArray
> > > >            x::Uint16
> > > >            y::Uint16
> > > >            z::Uint16
> > > >        end
> > > >
> > > > julia> immutable MyType
> > > >            a::MyArray
> > > >            c::Char
> > > >            d::Float64
> > > >        end
> > > >
> > > > julia> sizeof(MyType)
> > > > 24
> > > >
> > > > julia> r = [uint8(1:24)]
> > > > 24-element Array{Uint8,1}:
> > > >  0x01
> > > >  0x02
> > > >  0x03
> > > >  0x04
> > > >  0x05
> > > >  0x06
> > > >  0x07
> > > >  0x08
> > > >  0x09
> > > >  0x0a
> > > >     ⋮
> > > >  0x0f
> > > >  0x10
> > > >  0x11
> > > >  0x12
> > > >  0x13
> > > >  0x14
> > > >  0x15
> > > >  0x16
> > > >  0x17
> > > >  0x18
> > > >
> > > > julia> reinterpret(MyType, r)
> > > > 1-element Array{MyType,1}:
> > > >  MyType(MyArray(0x0201,0x0403,0x0605),'\Uc0b0a09',1.2650169649295773e-192)
> > > >
> > > > On Tuesday, May 06, 2014 09:57:27 PM Bob Quazar wrote:
> > > > > Tim, Thanks for the response.
> > > > >
> > > > > How can I determine whether my structure (a composite type) is a
> > > > > contiguous block of memory? The composite type has a few fields. One
> > > > > of the fields has type Array{Array{Uint16,1},1} and another has type
> > > > > Array{Array{Float64,2},1}. Am I out of luck?
> > > > >
> > > > > What does it mean to reinterpret a composite type as a
> > > > > SharedArray{Uint8,1}? Is that something like casting?
> > > > >
> > > > > On Tuesday, May 6, 2014 7:32:47 PM UTC-7, Tim Holy wrote:
> > > > > > If your structure is a contiguous block of memory, you could
> > > > > > presumably use a SharedArray{Uint8,1} and then reinterpret it as
> > > > > > your custom object. But if you want independently
> > > > > > garbage-collectable constituents, this is not likely to be
> > > > > > feasible.
> > > > > >
> > > > > > --Tim
> > > > > >
> > > > > > On Tuesday, May 06, 2014 03:03:01 PM Bob Quazar wrote:
> > > > > > > In Julia, is it possible to share read-only data structures
> > > > > > > between threads/processes running on different processors,
> > > > > > > without making a copy of the entire data structure for each
> > > > > > > thread? The data structure I need to share isn't of "bit type",
> > > > > > > so I can't use a SharedArray. And each worker needs to read
> > > > > > > random parts from the full data structure, so I can't use a
> > > > > > > distributed array. I also tried just putting "@parallel" before
> > > > > > > a for loop, whose body accesses the data structure I want to
> > > > > > > share, but I believe the whole data structure gets copied at
> > > > > > > each iteration. (I could be wrong about that, but it's certainly
> > > > > > > very slow.) Does Julia have some alternative to a copy-on-write
> > > > > > > for sharing data between threads?
> > > > > > >
> > > > > > > Thanks in advance!
> > > > > > >
> > > > > > > --Bob