Hi,
I've been playing around a bit with HPX and am impressed. That said, I'm
having a bit of trouble wrapping my head around this model of thinking and
hoping someone can help to shed some light for me on. Firstly, I'd
appreciate feedback on the general architecture / approach that I am looking
to use - there is going to be, no doubt, a number of ways to implement what
I have in mind, and I'd appreciate some guidance from the HPX community as
to the preferred approach. I also have some specific implementation
questions which I will post to the mailing list separately to allow them to
be properly indexed.
As a pet project for experimenting with HPX, I've been looking to work on
genetic-algorithm-based optimization. In particular, I've been using
Differential Evolution as the specific genetic algorithm, although I would,
ultimately, like to ensure that whatever code I end up writing is as
flexible as possible.
Having read (and repeatedly re-read) the futurization example in the
documentation, I think I have a rough idea of the structure of such a
program, but, as I said before, I'm not sure whether that idea will work or
if there is a better way to accomplish this.
Here's what I've got so far:
- Model is an abstract base class providing an API for a model to
parameterize
- Model_Driver is an HPX component with a 1:1 mapping with
localities (i.e. each locality has a driver). The driver is constructed
with the Model as a parameter. The main idea behind having this
Model_Driver object is to allow static data to be locally cached on a
locality so that it can be used for subsequent iterations without having to
recalculate it. The example I had in mind here was a transformation of time
series data into a matrix of inputs (such that each column represents a time
step), although there is likely many other ways to achieve the same thing.
The Model_Driver constructor will take the following parameters: concrete
Model type, a cost function object, a vector of input data).
- Model_Fitter controls the optimization process. It will contain
a vector of parameters - each element of the vector represents one possible
solution from the population of solutions.
In very broad strokes, the idea is like this:
Class Model_Fitter {
using parameter_type = std::vector<double>;
using population_type = std::vector<parameter_type>;
using population_set_type = std::vector<hpx::shared_future<population_type>
>;
std::vector<Model_Driver> drivers;
static parameter_type Fit_Model(Model const&, Cost_Function const&,
std::vector const& input_data, int population_size, /*some struct for
optimizer options */) {
population_set_type population_set_;
population_set_.reserve(2);
std::vector < hpx::shared_future < std::vector < double > > >
parameter_costs_;
parameter_costs_.reserve(2);
for (auto& population : population_set_){
population.reserve(population_size);
random_initialize(population);
}
Initialize_drivers();
// also initialize the parameter costs to max double
while ( !converged() ) {
Prepare_Next_Generation(++iteration_idx);
auto& current_parameters = parameter_set [ iteration_idx % 2];
auto& parent_parameters = parameter_set [(iteration_idx + 1) % 2];
auto& current_costs = parameter_costs_ [ iteration_idx % 2];
auto& parent_costs = parameter_costs_ [(iteration_idx + 1) % 2];
int locality_idx {0}; // for initial load-balancing
for (int pop_idx = 0; pop_idx < population_size; ++pop_idx){
auto Evaluate_Model_Op = htx::util::bind (
&Model_Driver::Evaluate_Model_With_Parameters, drivers[locality_idx],
hpx::util::placeholders::_1); // call static function on a particular driver
object. Assign calculation to different drivers to balance load.
current_parameters[pop_idx] = hpx::dataflow
( hpx::launch::async, [&]( ) -> parameter_type {
// Get the fitness / cost of
a given set of parameters
hpx::shared_future <double >
fitness = Evaluate_Model_Op ( current_parameters [ pop_idx ] ) ;
// If this current set of
parameters is more fit than parent, keep the child; otherwise, keep the
parent
if ( fitness < parent_costs
) {
current_costs = fitness;
return
current_parameters;
}
else {
current_costs = parent_costs;
return
parent_parameters;
}
}
++locality_idx;
locality_idx %= num_localities;
}
}
I haven't tried compiling the above code, so there's undoubtedly various
errors - some small and some large. But I hope that it generally conveys my
thought process for handling this. Essentially:
- Have 2 copies of the population of parameters. This would be
analogous to std::vector <space> in stepper in 1d_stencil example code, and
as I understand, is necessary for the dataflow to work.
- At this time, the parameters are all managed on the main
locality. A new set of possible solutions is generated on the main locality
and passed to the driver object on each individual locality, where that set
of parameters is used to evaluate the fitness of the model and return some
fitness value. The 'fitter' of the parent and child survives to the next
generation. Obviously, a different type of genetic algorithm would have a
different strategy.
- The function Evaluate_Model_Op is, potentially, a fairly
expensive operation. Although the parameters are distributed uniformly at
the beginning of each iteration, I believe that hpx uses work stealing to
allow a faster locality to take on a larger percentage of the evaluations.
- As I see it, parameter_type does not necessarily need to be an
HPX Component (the code above is, obviously not a component), as the
distribution of work is accomplished with the driver objects. That being
said, however, I can see uses for this code where there may be a large
number of parameters - might it be possible to hide network i/o latency by
having the parameters as a component with a Get_Data() action returning a
future to a set of parameters?
Hopefully my intention with the above code snippets and commentary is fairly
clear. As mentioned before, I would truly appreciate any help / input that
anyone can offer. I'd especially appreciate feedback as to whether I am
approaching the problem in a convoluted or completely wrong manner.
Thanks and best regards,
Shmuel
_______________________________________________
hpx-users mailing list
[email protected]
https://mail.cct.lsu.edu/mailman/listinfo/hpx-users