I have tried the construction below with no success. In v0.4.3 I end up 
getting a segmentation fault. In the latest v0.5.0, the run time is 3-4x 
that of the non-parallelized version, and the resulting array is vastly 
different from the one constructed by the non-parallelized code. Below is 
the C++ code that I am essentially trying to emulate:

template <class Impl>
void TreeLattice<Impl>::stepback(Size i, const Array& values,
                                 Array& newValues) const {
    // each iteration j writes only newValues[j], so the loop
    // parallelizes cleanly under OpenMP
    #pragma omp parallel for
    for (Size j=0; j<this->impl().size(i); j++) {
        Real value = 0.0;
        for (Size l=0; l<n_; l++) {
            value += this->impl().probability(i,j,l) *
                     values[this->impl().descendant(i,j,l)];
        }
        value *= this->impl().discount(i,j);
        newValues[j] = value;
    }
}
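
Roughly, the Julia version I have been attempting looks like the sketch 
below. The names here are stand-ins of my own: latticesize, probability, 
descendant, and discount are my Julia counterparts of the C++ calls, and 
n plays the role of n_:

nprocs() == CPU_CORES || addprocs(CPU_CORES - 1)

function stepback(lattice, i, values, n)
    # newValues must be a SharedArray so that writes from worker
    # processes are visible back on the master process
    newValues = SharedArray(Float64, latticesize(lattice, i))
    @sync @parallel for j = 1:latticesize(lattice, i)
        value = 0.0
        for l = 1:n
            value += probability(lattice, i, j, l) *
                     values[descendant(lattice, i, j, l)]
        end
        newValues[j] = value * discount(lattice, i, j)
    end
    return newValues
end

As I understand it, lattice and values get serialized to every worker by 
the @parallel loop, so the types and methods used in the loop body have 
to be defined on every process, which is exactly where my @everywhere 
trouble described next comes from.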

The calls to probability, descendant, and discount all end up accessing 
data in other objects, so I tried prepending those function and type 
definitions with @everywhere. However, that started a long chain of 
eventually having to wrap every file in my module in @everywhere, and 
there were still errors complaining about things not being defined. At 
this point I am really confused about how to construct what would appear 
to be a rather simple parallelized for loop that generates the same 
results as the non-parallelized code. I've pored over both this forum 
and other resources, and nothing has really worked.
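
For completeness, the loading pattern I have been trying (following 
earlier threads here) is to pull the definitions onto every worker up 
front; lattice_defs.jl below is a hypothetical stand-in for my module's 
files:

addprocs(CPU_CORES - 1)
# every worker needs the type and method definitions used inside
# the parallel loop body, so include the file on all processes
@everywhere include("lattice_defs.jl")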

Any help would be appreciated.

Thanks!

Chris


On Thursday, August 20, 2015 at 4:52:52 AM UTC-4, Nils Gudat wrote:
>
> Sebastian, I'm not sure I understand you correctly, but point (1) in your 
> list can usually be taken care of by wrapping all the necessary 
> usings/requires/includes and definitions in a @everywhere begin ... end 
> block.
>
> Julio, as for your original problem, I think Tim's advice about 
> SharedArrays was perfectly reasonable. Without having looked at your 
> problem in detail, I think you should be able to do something like this 
> (and I also think this gets close enough to what Sebastian was talking 
> about, and to Matlab's parfor, unless I'm completely misunderstanding your 
> problem):
>
> nprocs()==CPU_CORES || addprocs(CPU_CORES-1)
> results = SharedArray(Float64, (m,n))
>
> @sync @parallel for i = 1:n
>     results[:, i] = complicatedfunction(inputs[i])
> end
>
