Hi Hartmut,
Thanks for the quick reply. It appears that I was not completely clear in my original question. Specifically, I see the same problems regardless of whether or not I'm using MKL. The separate matrix multiplication test code that I wrote was intended to determine whether MKL was the cause of these issues. Based on CPU usage and on the timing of each of the three cases, I'm still finding that:

1) CPU usage never exceeds 100%, and
2) the sequential version of the multiplication function runs faster than the parallel and vectorized versions.

As mentioned, changing the --hpx:threads argument only adds overhead and makes the code run much slower.

Thanks

From: Hartmut Kaiser
Sent: Monday, May 1, 7:40 AM
Subject: Re: [hpx-users] Troubleshooting (lack of) parallel execution
To: [email protected]

Shmuel,

> I'm looking for some help in understanding why my code does not appear to
> be executing in parallel with the HPX system.

The only reason I could think of for the strange behavior you're seeing would be that you're using the parallel version of MKL. MKL is parallelized using OpenMP and there is no way (AFAIK) to tell it to use just part of the machine. So it will try to use all of the cores of the node you're running on. That in turn interferes with HPX's way of binding its worker threads to the cores itself.

We have had good results when using MKL with HPX, but only if you link with the sequential (non-parallel) version of MKL and leave all the parallelization to HPX (by scheduling more than one MKL task at the same time, if necessary). I have no experience with VML, but I'd assume it's the same issue.
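Very roughly, something along these lines is what I mean (just an untested sketch; the problem size, the number of tasks, and the choice of cblas_sgemm are placeholders, and it assumes your build links the sequential MKL layer, e.g. mkl_sequential rather than mkl_intel_thread):

// Untested sketch: several independent, single-threaded MKL GEMMs scheduled
// as concurrent HPX tasks. Requires linking the sequential MKL layer so that
// each cblas_sgemm call stays on one core and HPX keeps control of the cores.
#include <hpx/hpx_main.hpp>
#include <hpx/include/async.hpp>
#include <hpx/include/lcos.hpp>
#include <mkl.h>
#include <vector>

int main()
{
    const int n = 512;                // placeholder problem size
    const std::size_t num_tasks = 8;  // placeholder number of independent products

    std::vector<std::vector<float>> A(num_tasks, std::vector<float>(n * n, 1.0f));
    std::vector<std::vector<float>> B(num_tasks, std::vector<float>(n * n, 2.0f));
    std::vector<std::vector<float>> C(num_tasks, std::vector<float>(n * n, 0.0f));

    std::vector<hpx::future<void>> tasks;
    for (std::size_t i = 0; i != num_tasks; ++i)
    {
        tasks.push_back(hpx::async([&, i]() {
            // One plain GEMM per task: C[i] = A[i] * B[i] (row-major, n x n)
            cblas_sgemm(CblasRowMajor, CblasNoTrans, CblasNoTrans, n, n, n,
                        1.0f, A[i].data(), n, B[i].data(), n,
                        0.0f, C[i].data(), n);
        }));
    }
    hpx::wait_all(tasks);  // all products have finished at this point
    return 0;
}

Since each of those calls is single-threaded, the number of GEMMs running at once is bounded by --hpx:threads, and the HPX scheduler keeps its worker threads pinned to their cores.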
HTH
Regards Hartmut
---------------
http://boost-spirit.com
http://stellar.cct.lsu.edu

> I first noticed the issue while working on my main codebase, in which I've been trying to implement a genetic-algorithm-based optimizer for non-linear systems. Since that code (at the present time) uses Intel MKL (BLAS level 3 library functions) and VML (vector math library), in conjunction with HPX futures, dataflow, etc., I wasn't sure if there was some problem caused by OpenMP or something similar, which might have prevented the code from running in parallel.
>
> I then wrote a simpler test program using only HPX parallel algorithms to implement basic matrix-matrix multiplication. I found the exact same result in both cases -- my program does not appear to be running any of the concurrent code, neither in the case of my original program using futures, continuations, and dataflow LCOs, nor in the simplified matrix code.
>
> I've tried using different options for --hpx:threads, but when this number is greater than 1, I've found that the overhead of thread creation and scheduling is exceedingly high and slows down the entire program execution. I'm not sure if that is typical behaviour -- I have tried to ensure that the amount of computation within a given asynchronous function call is fairly substantial, so that the real work far exceeds any overhead (although I may have under-estimated). Typically, in the case of my code, the concurrency is at the genetic-algorithm 'population' level. For example, the following code snippet is where I generate random numbers for the crossover step of differential evolution. fitter_state_ is a boost::shared_ptr. (The random number generator engines are set up elsewhere in the code, and there is one for each trial vector, to ensure that the code is thread-safe.)
>
> I realize that the code below does not need to use dataflow, although I'm skeptical that this would be the cause of the code not running in parallel.
>
> size_t trial_idx = 0;
> CR_population_type &CR_vector_current =
>     fitter_state_->crossover_vector_set_[fitter_state_->Current_Index()];
>
> for (future_type &crossover_vector : CR_vector_current)
> {
>     crossover_vector = hpx::dataflow(hpx::launch::async, [=]() {
>         auto &rng = fitter_state_->cr_RNGs[trial_idx];
>         // cr_vector_ is a std::vector
>         modeling::model_fitter_aliases::CR_vector_type cr_vector_;
>         cr_vector_.reserve(total_number_of_parameters_);
>
>         std::uniform_int_distribution<int> CR_dist(
>             0, fitter_state_->crossover_range);
>
>         for (int param_idx = 0; param_idx < total_number_of_parameters_;
>              ++param_idx) {
>             cr_vector_.push_back(CR_dist(rng));
>         }
>         return cr_vector_;
>     });
>
>     trial_idx++;
> }
>
> From what I can tell, the above code never runs in parallel (among other things, the CPU usage drops from 500% while running MKL functions down to 100%). Likewise, the simplistic matrix multiplication code using parallel algorithms also only uses 100% CPU.
>
> core::Matrix times_parunseq(core::Matrix &lhs, core::Matrix &rhs) {
>
>     if (lhs.Cols() != rhs.Rows())
>         throw std::runtime_error("Incompatible Matrix dimensions");
>
>     core::Matrix m{lhs.Rows(), rhs.Cols()};
>     Col_Iterator out_iter(&m);
>
>     // Outermost loop -- columns of rhs and of the output
>     hpx::parallel::for_loop_n_strided(
>         hpx::parallel::seq, 0, rhs.Cols(), rhs.Rows(), [&](auto out_col_idx) {
>
>             // Inner loop -- rows of lhs and of the output
>             hpx::parallel::for_loop_n(
>                 hpx::parallel::seq, 0, lhs.Rows(), [&](auto out_row_idx) {
>
>                     // Dot product of one row of lhs with one column of rhs
>                     m(out_row_idx, out_col_idx) = hpx::parallel::transform_reduce(
>                         hpx::parallel::par_vec,
>                         Row_Iterator(&lhs, {out_row_idx, 0}),
>                         Row_Iterator(&lhs, {out_row_idx, lhs.Cols()}),
>                         Col_Iterator(&rhs, {0, out_col_idx}), 0.0f,
>                         std::plus(),
>                         [&](const float &a, const float &b) { return a * b; });
>                 });
>         });
>     return m;
> }
>
> I've tried using seq, par, and par_unseq for the two outer loops, but that did not make any difference in performance. I understand that using parallel::execution::par and parallel::execution::par_unseq just means that the code *can* be run in parallel and/or vectorized. However, I cannot understand why the code does not actually run in parallel or use vectorization.
>
> The complete code I've been using is at the link below:
> https://github.com/ShmuelLevine/hpx_matrix/blob/master/matrix/matrix.cc
>
> Some insights would be greatly appreciated... this is a matter of considerable frustration to me...
>
> Thanks and best regards,
> Shmuel
_______________________________________________
hpx-users mailing list
[email protected]
https://mail.cct.lsu.edu/mailman/listinfo/hpx-users
