I have tried the construction below with no success. On v0.4.3, I end up
getting a segmentation fault. On the latest v0.5.0, the run time is 3-4x
that of the non-parallelized version, and the resulting array is vastly
different from the one produced by the non-parallelized code.
Below is the C++ code that I am essentially trying to emulate:
void TreeLattice<Impl>::stepback(Size i, const Array& values,
                                 Array& newValues) const {
    #pragma omp parallel for
    for (Size j = 0; j < this->impl().size(i); j++) {
        Real value = 0.0;
        for (Size l = 0; l < n_; l++) {
            value += this->impl().probability(i,j,l) *
                     values[this->impl().descendant(i,j,l)];
        }
        value *= this->impl().discount(i,j);
        newValues[j] = value;
    }
}
The calls to probability, descendant, and discount all end up accessing
data in other objects, so I tried prefixing those function and type
definitions with @everywhere. However, that started me down a long chain
of eventually having to wrap each file in my module in @everywhere, and
there were still errors complaining about things not being defined. At this
point I am really confused about how to construct what would appear to be a
rather simple parallelized for loop that generates the same results as the
non-parallelized code. I've pored over both this forum and other
resources, and nothing has really worked.
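For reference, here is roughly the shape of what I imagine the Julia equivalent would be, using a SharedArray so the workers can all write into the result. The names prob, desc, disc, m, and n are placeholders I made up for the data that the probability/descendant/discount member calls access; this is a sketch of the pattern, not working code:

```julia
# Sketch only: assumes the per-node data behind probability/descendant/
# discount have been pulled out into plain arrays (prob, desc, disc)
# that every worker can see, and that desc holds 1-based indices.
nprocs() == CPU_CORES || addprocs(CPU_CORES - 1)

newvalues = SharedArray(Float64, m)   # m plays the role of this->impl().size(i)

@sync @parallel for j = 1:m
    value = 0.0
    for l = 1:n
        value += prob[i, j, l] * values[desc[i, j, l]]
    end
    newvalues[j] = value * disc[i, j]   # discount the expected value
end
```

If the data really has to stay inside the objects, then I gather the objects themselves (and their types) would need to be defined on all workers, which is presumably where my chain of @everywhere wrapping came from.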
Any help would be appreciated.
Thanks!
Chris
On Thursday, August 20, 2015 at 4:52:52 AM UTC-4, Nils Gudat wrote:
>
> Sebastian, I'm not sure I understand you correctly, but point (1) in your
> list can usually be taken care of by wrapping all the necessary
> usings/requires/includes and definitions in a @everywhere begin ... end
> block.
>
> Julio, as for your original problem, I think Tim's advice about
> SharedArrays was perfectly reasonable. Without having looked at your
> problem in detail, I think you should be able to do something like this
> (and I also think this gets close enough to what Sebastian was talking
> about, and to Matlab's parfor, unless I'm completely misunderstanding your
> problem):
>
> nprocs()==CPU_CORES || addprocs(CPU_CORES-1)
> results = SharedArray(Float64, (m,n))
>
> @sync @parallel for i = 1:n
> results[:, i] = complicatedfunction(inputs[i])
> end
>