Responses inline, let me know if something is unclear:

On Sun, Oct 19, 2014 at 5:25 PM, Greg Plowman wrote:
> Hi,
>
> I have several general questions that came up in my first foray into Julia.
>
> Julia seems such a delight to work with, things seems to work magically and
> lots of details are not required or implicitly assumed.
> Whilst this is great for programming, it does mean I'm a little unsure about
> some things, especially about types, efficiency and what I get for free.
> In any case, here are some questions:
>
> I want to do some parallel simulations and combine/reduce the set of results
> into a single result.
>
> As a minimal starting point, I defined a composite type to hold sim results,
> a single no-argument constructor, and a + function for reducing.
>
> type Counters
>     freqBase::Array{Int64,1}
>     freqFeature::Array{Int64,1}
>     freqWin::Array{Int64,1}
>     freqPrize::Array{Int64,1}
>     freqCombination::Array{Int64,2}
>
>     # no-argument constructor
>     function Counters()
>         this = new()
>         this.freqBase = zeros(Int64, 100000)
>         this.freqFeature = zeros(Int64, 100000)
>         this.freqWin = zeros(Int64, 100000)
>         this.freqPrize = zeros(Int64, maxPrize)
>         this.freqCombination = zeros(Int64, numSymbols, maxState)
>         return this
>     end
> end
>
> function +(c1::Counters, c2::Counters)
>     c = Counters()
>     c.freqBase = c1.freqBase + c2.freqBase
>     c.freqFeature = c1.freqFeature + c2.freqFeature
>     c.freqWin = c1.freqWin + c2.freqWin
>     c.freqPrize = c1.freqPrize + c2.freqPrize
>     c.freqCombination = c1.freqCombination + c2.freqCombination
>     return c
> end
>
> To my surprise this was sufficient to provide the functionality I needed to
> return sim result from pmap and reduce to a single result:
>
> const numProcessors = 4
>
> if nprocs() < numProcessors
>     addprocs(numProcessors - nprocs())
> end
>
> const numTrials = 100
> const numPlays = 1000000
>
> trialCounts = pmap(Simulation, fill(numPlays, numTrials))
> totalCounts = sum(trialCounts)
>
> PrintCountersSummary(trialCounts) # print summary for each trial
> PrintCounters(totalCounts) # print total combined results
>
> Surprisingly (to me) all this worked.
>
> Q1. Why does pmap return Vector{Any} rather than Vector{Counters}, when the
> return type from Simulation() is my user-defined type Counters?
>
pmap has no good way to find out the return type of the function passed to
it, so it can't simply allocate an array with the correct element type up
front. There are other ways to still get a tight type, but pmap doesn't use
them. In this particular instance, the return value is generated by the
comprehension here:
https://github.com/JuliaLang/julia/blob/master/base/multi.jl#L1475
Currently the element type of a comprehension, when not explicitly
specified, is whatever the compiler can infer at compile time (in this case
it can't infer anything, so the element type is Any). It has been proposed
(https://github.com/JuliaLang/julia/issues/7258) to change comprehensions to
always have the tightest possible type at runtime. If this change happens,
pmap will indeed return Vector{Counters} (I think it is very likely that
this will happen).

> I inserted an extra line:
>
> trialCounts = convert(Vector{Counters}, trialCounts)
>
> I was surprised this even worked, because I didn’t define convert.
>
> Q2. Is there any advantage to using convert? Is it more efficient? E.g.
> PrintCounters could be defined to accept argument with Vector{Counters}
>

If the vector's type parameter is a concrete type that allows it, the values
can be stored inline in the array, which saves a lot of overhead for small
values. In your case Counters is pretty large, so that's not a concern. The
other benefit of tighter type parameters is that the compiler will be able
to generate more efficient code for the called function.

> Q3. trialCounts is Vector{Any} but sum() works? Presumably sum uses run time
> type of actual elements of vector?
>

Semantically, everything in Julia depends on the runtime type of the value.
In this case, the runtime type of the container is Vector{Any}, but each
element inside still has its original type, so applying + to them uses the
correct method. More type information may let the compiler do better
optimizations.

> Q4. Presumably sum() uses my definition of operator +. I also noted that +=
> works. Where are these defined? What else do I get for "free"?
>

Functions in Julia are generic, so any function written in terms of
functions you have defined will work. For example, if you define + and *,
any function that uses only those will work for free, assuming its
parameters aren't constrained to other types. sum is defined here:
https://github.com/JuliaLang/julia/blob/master/base/reduce.jl#L229
(though I warn you, the implementation is slightly complicated).
a += b is syntax for a = a + b.

> It occurred to me that my implementation of + could be improved by defining
> a copy constructor.
>
> function Counters(c::Counters)
>     this = new()
>     this.freqBase = copy(c.freqBase)
>     this.freqFeature = copy(c.freqFeature)
>     this.freqWin = copy(c.freqWin)
>     this.freqPrize = copy(c.freqPrize)
>     this.freqCombination = copy(c.freqCombination)
>     return this
> end
>
> Then define + as:
>
> function +(c1, c2)
>     c = Counters(c1)
>     c1.freqBase += c2.freqBase
>     ...
>     return c
> end
>
> This would seem to eliminate first initialising with zeros.
> However, there was no improvement in practice. Maybe allocation is
> insignificant compared to addition.
>
> Then it occurred to me that summing by my definition creates a new object
> for addition.
> Perhaps a more efficient sum would be to define += as an updating function,
> so that a new object does not need to be created.
> I tried to define += but received an error.
> Instead I defined plusEquals() and this was almost 2x faster than sum(x) or
> s += x[i] loop or s = s + x[i] loop. (~50% gc time)
>
> Q5 Why can’t I extend +=?
>

See above: += is syntax, not a function, so it can't be extended. It has
been proposed to extend + and other functions with an out keyword argument,
which would allow this optimization easily.

> Q6 Wouldn’t this be faster for summing, and so sum() could be defined in
> terms of += rather than + (which creates new object for each element)
>

If the above proposal is implemented, that would be done.
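
To make the answers to Q2 through Q6 concrete, here is a minimal sketch. It is written in current Julia syntax (mutable struct has since replaced the type keyword used in this thread). The two-field Counters is a cut-down stand-in for the type quoted above, and add! is a hypothetical name for an in-place accumulator, not an existing Base function and not the proposed out keyword.

```julia
# Cut-down, hypothetical stand-in for the Counters type quoted above.
mutable struct Counters
    freqBase::Vector{Int}
    freqWin::Vector{Int}
end

# Zero-initialised counters; the length n is an illustrative parameter.
Counters(n::Integer) = Counters(zeros(Int, n), zeros(Int, n))

# Out-of-place addition: every call allocates a new Counters.
Base.:+(a::Counters, b::Counters) =
    Counters(a.freqBase + b.freqBase, a.freqWin + b.freqWin)

# In-place accumulation (hypothetical name): reuses the arrays held by acc.
function add!(acc::Counters, c::Counters)
    acc.freqBase .+= c.freqBase
    acc.freqWin .+= c.freqWin
    return acc
end

# Stand-in for the results returned by pmap, stored in a Vector{Any}.
results = Any[Counters([i, 2i], [1, 0]) for i in 1:4]

# sum dispatches on the runtime type of each element, so Vector{Any} works.
total = sum(results)

# Tightening the element type needs no user-defined convert method.
typed = convert(Vector{Counters}, results)

# Folding with add! updates a single accumulator instead of allocating.
acc = Counters(2)
for c in typed
    add!(acc, c)
end
```

Here sum(results) allocates a fresh Counters for every addition, while the add! loop keeps updating acc, which is the kind of saving reported for plusEquals() above.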
> I noticed that pmap uses nworkers which is (nprocs - 1) unless nproc==1.
>
> Q7 For case nprocs==2, wouldn’t it make sense to also use the local process
> as a worker, since the programmer’s intention was to use parallel processing
> (Otherwise for nprocs==2, there is no difference to using map)?
>
> if p != myid() || np == 1
> if p != myid() || np <= 2
>

This used to be different, but was changed to allow the case of having one
local worker and one worker on a cluster.

> Q8 Is there a way to programmatically determine the number of physical
> processors on current machine? Such a function would be useful to use with
> addprocs().
>

length(Sys.cpu_info())

> Thanks
> Greg
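
On Q8, a minimal sketch of combining that call with addprocs, again in current Julia syntax, where addprocs and nprocs live in the Distributed standard library. As far as I know, Sys.cpu_info() reports logical CPUs, so on machines with hardware threads the count can exceed the number of physical cores. ensure_procs is a hypothetical helper name that mirrors the nprocs() check in the quoted code.

```julia
using Distributed  # provides addprocs and nprocs in current Julia

# Number of logical CPUs reported by the system
# (may exceed the number of physical cores).
ncpu = length(Sys.cpu_info())

# Hypothetical helper: start extra processes only if fewer than
# `target` are running, then report the resulting process count.
function ensure_procs(target::Integer)
    shortfall = target - nprocs()
    shortfall > 0 && addprocs(shortfall)
    return nprocs()
end
```

Calling ensure_procs(ncpu) would then bring the total process count, including the master process, up to one per logical CPU.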
