This might not be hard at all, but I don't have time to look into it now. So 
to help you investigate, here are some principles:
- each backtrace is a vector of Uints; in profiling data, backtraces are
  separated by NULLs (zero entries) in one gigantic list.
- the Uints correspond to memory locations that are presumably specific to
  each worker.
- Profile.retrieve() fetches the list _and_ a lookup dictionary that converts
  memory locations into something that's actually meaningful.
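
To make that layout concrete, here is a rough single-process sketch. It assumes Julia 1.8 or later, where Profile.retrieve() accepts include_meta = false to drop the per-sample metadata so that only the zero separators remain. The names spin and split_backtraces are made up for illustration; they are not part of Profile.

```julia
using Profile

# Hypothetical workload, only here to generate some samples.
function spin(n)
    s = 0.0
    for i in 1:n
        s += sqrt(i)
    end
    return s
end

# Illustrative helper (not part of Profile): cut the flat buffer into
# individual backtraces at the zero separators.
function split_backtraces(data::AbstractVector{<:Unsigned})
    backtraces = Vector{UInt}[]
    current = UInt[]
    for ip in data
        if ip == 0
            isempty(current) || push!(backtraces, current)
            current = UInt[]
        else
            push!(current, ip)
        end
    end
    isempty(current) || push!(backtraces, current)
    return backtraces
end

Profile.clear()
@profile spin(200_000_000)

# Assumption: Julia >= 1.8, where `include_meta` is accepted.
data, lidict = Profile.retrieve(include_meta = false)
backtraces = split_backtraces(data)
println(length(backtraces), " backtraces collected")

# Translate the first backtrace from raw addresses to source locations.
if !isempty(backtraces)
    for ip in first(backtraces)
        for frame in lidict[ip]
            println("  ", frame.func, " at ", frame.file, ":", frame.line)
        end
    end
end
```

The raw addresses in data mean nothing outside the process that recorded them, which is why the lookup dictionary has to travel with the data (or be applied before anything is sent elsewhere).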

So turn on profiling in each worker, and then ask each worker to call 
Profile.retrieve(), and finally assemble the combined information.

profile.jl is not a huge file; while the printing logic is a little
complicated, the instructions to turn profiling on and off and fetch data are
pretty trivial. If you're writing parallel code, you should have no trouble
understanding how it works. Particularly, a "flat" report that combines data
across workers should be pretty easy.
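
One way such a combined flat report might look, as an untested-in-anger sketch rather than anything built into Profile: each worker profiles its own run, converts its samples to counts keyed by (function, file, line) using its own lookup dictionary, and only those counts are sent back and summed. It assumes Julia 1.8 or later (for include_meta = false) and two local workers; busywork, profile_counts and merge_counts are hypothetical names standing in for your own code.

```julia
using Distributed
nprocs() == 1 && addprocs(2)   # assumption: two local workers suffice for a demo
@everywhere using Profile

@everywhere begin
    # Hypothetical workload standing in for the real simulation.
    function busywork(n)
        s = 0.0
        for i in 1:n
            s += sin(i) * cos(i)
        end
        return s
    end

    # Runs on one worker: profile the workload, then reduce the samples to
    # counts keyed by (function, file, line). Each location is counted at
    # most once per backtrace, and frames from C code are skipped.
    function profile_counts(n)
        Profile.clear()
        @profile busywork(n)
        # Assumption: Julia >= 1.8, where `include_meta` is accepted.
        data, lidict = Profile.retrieve(include_meta = false)
        counts = Dict{Tuple{String,String,Int},Int}()
        seen = Set{Tuple{String,String,Int}}()
        function flush!()
            for key in seen
                counts[key] = get(counts, key, 0) + 1
            end
            empty!(seen)
        end
        for ip in data
            if ip == 0             # zero marks the end of one backtrace
                flush!()
                continue
            end
            for frame in get(lidict, ip, ())
                frame.from_c && continue
                push!(seen, (string(frame.func), string(frame.file), Int(frame.line)))
            end
        end
        flush!()                   # in case the buffer did not end with a zero
        return counts
    end
end

# Sum the per-worker dictionaries into one.
merge_counts(results) = mergewith(+, results...)

# Start every worker first, then wait for all of them.
futures = [remotecall(profile_counts, w, 50_000_000) for w in workers()]
results = fetch.(futures)
combined = merge_counts(results)

# Print the ten most frequently sampled locations across all workers.
for ((func, file, line), n) in first(sort!(collect(combined); by = last, rev = true), 10)
    println(lpad(n, 6), "  ", file, ":", line, "  ", func)
end
```

Doing the address-to-location lookup on the worker sidesteps the problem that the raw Uints are only meaningful in the process that recorded them. Keeping the per-worker dictionaries in results (rather than only the sum) also gives you something like separate "buckets" to compare.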

Good luck!
--Tim


On Thursday, October 02, 2014 01:58:31 PM Travis Porco wrote:
> Hello--
> Trying to find out better ways to profile parallel code in Julia, I came
> across a suggestion to rebuild Julia and use "Vtune amplifier" from Intel
> (never heard of Vtune), or to have each worker call a function to
> turn profiling on and off somehow.
> 
> When I run my code, there is a lot that seems to be going on: lines
> involving dict.jl and multi.jl and task.jl, and some or much of it does not
> seem to relate directly to a function call of mine. One could try to guess
> the answer "decryption-style" by removing various lines of code and trying
> to guess what changes. But this is essentially impossible for my
> application, since it is a stochastic simulation, and deleting lines
> changes the subsequent behavior even with the same random seed. I'm hoping
> for something better than voluminous println()'s !
> 
> The idea of mapping everything Julia is doing to some specific call may or
> may not even describe the way it works precisely. However, this sort of
> information
> ...
>      5 ./multi.jl; RemoteValue; line: 590
>       2 ./array.jl; fill!; line: 158
>     11 multi.jl; schedule_call; line: 636
>      2 dict.jl; setindex!; line: 546
>       2 ./tuple.jl; isequal; line: 69
> ...
> while interesting, is not actionable. Nowhere in this particular subtree do
> my function calls get named (not shown). I know something, somewhere, is
> causing a tremendous bottleneck, despite having converted arrays to
> SharedArrays, removed references to global data that were hiding in default
> arguments, and so on...but I still can't tell where it is! (I don't have a
> small minimal postable version of the code.)
> 
> Is there a way to have multiple profile output objects, and have the
> profile data rerouted into different ones as I go? It might at least
> provide some insight (I realize operation 1 might leave the system in a
> state where operation 2 might have extra work to do through no fault of its
> own, so it might not be simple.)
> The dream would be:
> Profile.init_bucket(1)
> Profile.init_bucket(2)
> @profile bucket=1 filter(data1,nsteps=24)
> @profile bucket=2 filter(data2,nsteps=24)
> etc.
> So then I could
> Profile.print(bucket=1)
> to see what went on in the first one, etc.
> I know the syntax doesn't work; I'm not asking for it to, but does anyone
> know a way to do this sort of thing? Save profile data and swap it in and
> out, for example? deepcopy(Profile) obviously fails! Ultimately I've got to
> connect profile output with lines of code, function calls, or data objects !
> 
> Thanks; I hate to post things like this but if there's an answer, somebody
> here will know it, and it might benefit somebody besides me.
