Hi All,
I've finally found a bit of time once again to work on my hobby project
with HPX. The long break gave me a fresh perspective on my own code,
and it occurred to me that it has some serious issues with memory
management. I'm hoping someone can help provide me with some better
insight into how best to handle memory management while working in a
distributed app. In particular, I would greatly appreciate some
specific guidance on how to address the issue in my own code, since
I'm at a bit of a loss here.
At a high level (and obviously simplified), my code can be summarized as
follows:
- I'm trying to implement a genetic-algorithm-based optimization system,
intended to optimize parameters for large numerical models. The
matrices will likely contain millions of elements.
- The matrix class itself is implemented using
hpx::serialization::serialize_buffer<T> as the backing storage type.
- Right now, I've written an HPX component which I've called
Model_Driver. My intent is to instantiate Model_Driver once on each
locality, with a [type-erased] Model class passed as an argument for
dependency injection.
- I intend to manage the parameter matrices on the main locality.
Model_Driver has an action called Evaluate_Model_With_Parameters(Matrix
m), which returns a future<float> representing the fitness of a given
set of parameters. The result is returned to the main locality, which
performs selection to determine the 'winning' parameters and 'breeds'
the next generation.
- I've been using dataflow to handle the optimization process, including
selection, generating the next generation, etc. (A rough sketch of
these pieces follows this list.)
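
For concreteness, here is roughly the shape of the pieces described
above. The names Matrix, Model_Driver, and
Evaluate_Model_With_Parameters come from my description; everything
else (the Model interface, constructor shapes, header choices, which
may vary between HPX versions) is hypothetical, and the component
registration macros are omitted:

#include <hpx/include/components.hpp>
#include <hpx/include/serialization.hpp>

#include <cstddef>
#include <memory>

// Matrix backed by serialize_buffer<float>, so the payload can be
// shipped between localities without an extra copy.
class Matrix {
public:
    Matrix() = default;
    Matrix(std::size_t rows, std::size_t cols)
      : rows_(rows), cols_(cols), data_(rows * cols) {}

    float* data() { return data_.data(); }

    template <typename Archive>
    void serialize(Archive& ar, unsigned) { ar & rows_ & cols_ & data_; }

private:
    std::size_t rows_ = 0, cols_ = 0;
    hpx::serialization::serialize_buffer<float> data_;
};

// Hypothetical type-erased model interface.
struct Model {
    virtual ~Model() = default;
    virtual float evaluate(Matrix const& m) = 0;
};

// One instance per locality; the action evaluates one parameter set.
struct Model_Driver : hpx::components::component_base<Model_Driver> {
    explicit Model_Driver(std::shared_ptr<Model> model = nullptr)
      : model_(std::move(model)) {}

    float Evaluate_Model_With_Parameters(Matrix m)
    {
        return model_->evaluate(m);
    }
    HPX_DEFINE_COMPONENT_ACTION(Model_Driver,
        Evaluate_Model_With_Parameters);

    std::shared_ptr<Model> model_;  // type-erased model, injected
};

The future<float> results of this action are what get chained through
dataflow on the main locality for selection and breeding.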
In general, a large number of Matrix objects are created and destroyed
-- enough that a custom allocator to manage the allocation/deallocation
of memory is essentially a necessity in this program.
My first, naive attempt (and currently all that I've done) is a
Matrix_Allocator class which manages a memory pool. [1] The free_list
is a static object in the allocator class, and the allocate and
deallocate functions are static functions. Similarly, the mutex is also
a static member of the allocator class.
The obvious problem with this is that, although it should work fine on
a single locality, it is clearly going to cause segmentation faults in
a distributed app. From my understanding of the serialization code in
HPX, transferring a Matrix from the main locality to a remote locality
to calculate the model fitness does not go through the Matrix allocator
-- that allocation is handled by the serialization code -- but all
other constructions/destructions of Matrix objects will be a problem.
The most obvious workaround that comes to mind would be changing the
free_list (and mutex) into a std::map<std::uint32_t, free_list_type>
(and std::map<std::uint32_t, mutex_type>), so that each locality has
its own free_list and mutex. Something about this seems wrong to me,
though -- it tightly couples the allocator to the HPX runtime, since
the allocator must call hpx::get_locality_id() to index the appropriate
free_list.
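
Concretely, that workaround would look something like the following
sketch (the allocation logic itself is elided, and note that the maps
would also need guarding on first access, since operator[] mutates
them; the umbrella header is a stand-in, as header locations vary
between HPX versions):

#include <hpx/hpx.hpp>

#include <cstdint>
#include <map>
#include <mutex>
#include <stack>

class Matrix_Allocator {
public:
    using mutex_type = hpx::lcos::local::spinlock;
    using free_list_type = std::map<int64_t, std::stack<float*>>;

    static float* allocate(int64_t n)
    {
        // Index per-locality state by the id of the current locality.
        std::uint32_t const loc = hpx::get_locality_id();
        std::lock_guard<mutex_type> lock(mutexes_[loc]);
        free_list_type& fl = free_lists_[loc];
        // ... pop a block from fl[n] if available, else allocate ...
        (void) fl;
        return nullptr;  // placeholder: real allocation elided
    }

private:
    static std::map<std::uint32_t, mutex_type> mutexes_;
    static std::map<std::uint32_t, free_list_type> free_lists_;
};

std::map<std::uint32_t, Matrix_Allocator::mutex_type>
    Matrix_Allocator::mutexes_;
std::map<std::uint32_t, Matrix_Allocator::free_list_type>
    Matrix_Allocator::free_lists_;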
Similarly, the Model class (injected into the Model_Driver component)
-- which is where a large proportion of the Matrix allocations occur --
is presently not coupled to the HPX runtime at all. Conceivably,
Model_Driver could provide a locality_id to the Model class (to then
pass along to a Matrix?). My first inclination is that a Matrix class
should not have knowledge of the [distributed] architecture on which it
runs, but perhaps, when dealing with a distributed program
architecture, it is necessary to create explicitly distributed types --
i.e. something like class Distributed_Matrix : public Matrix {...};
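
To make that concrete, the sort of thing I mean (purely speculative,
building on the Matrix sketch above):

#include <cstdint>

// A distribution-aware subclass, so that plain Matrix stays
// runtime-agnostic while the derived type carries locality info.
class Distributed_Matrix : public Matrix {
public:
    explicit Distributed_Matrix(std::uint32_t locality_id)
      : locality_id_(locality_id) {}

private:
    std::uint32_t locality_id_;  // e.g. from hpx::get_locality_id()
};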
Having said that, those are merely some speculations which came to mind
while trying to organize my thoughts and present this question. It all
still remains unclear to me, however. Something tells me that there
must be a better way to deal with this. Hopefully, people with more
brains and experience can provide me with some insight and guidance.
I would greatly appreciate any suggestions you can offer. If you
require further details of my code, please let me know and I'd be more
than happy to elaborate. I think, however, that the problem itself is
fairly generic and relevant to most code written for a distributed
environment -- especially where the parallelism isn't handled
explicitly in the code (as opposed to an MPI program, for example,
where this is far more straightforward).
Thanks and best regards,
Shmuel Levine
[1] The actual code is slightly more complicated than the above
description, although I don't think that changes the nature of the
question or the appropriate solution significantly. In particular, each
set of parameters is typically a std::vector<Matrix>, where each Matrix
is a different size. In other words, the code uses multiple matrix
sizes, although the number of distinct sizes is bounded by the
dimension of the parameter vector above. The actual allocator
definition is as follows:
// Header locations vary between HPX versions.
#include <hpx/lcos/local/spinlock.hpp>

#include <cstdint>
#include <map>
#include <stack>

class Matrix_Allocator {
public:
    using T = float;
    using data_type = T;
    static const int64_t alignment = 64;

private:
    using mutex_type = hpx::lcos::local::spinlock;
    // Per size, a stack of blocks available for reuse.
    using free_list_type = std::map<int64_t, std::stack<T*>>;
    // Per live pointer, the size it was allocated with.
    using allocation_list_type = std::map<T*, int64_t>;

public:
    Matrix_Allocator() {}
    ~Matrix_Allocator();

    Matrix_Allocator(Matrix_Allocator const&) = delete;
    Matrix_Allocator(Matrix_Allocator&&) = delete;

    static T* allocate(int64_t n);
    static void deallocate(T* p);

private:
    static mutex_type mtx_;
    static free_list_type free_list_;
    static allocation_list_type allocation_list_;
}; // class Matrix_Allocator
The allocation_list_ is used to track the allocated size of a given
pointer, to determine which free_list the pointer should be returned to
when a matrix is destroyed.
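
To make the intended bookkeeping concrete, the definitions look roughly
like this (a simplified sketch: C++17 aligned operator new stands in
for whatever aligned allocation the real code uses, error handling is
omitted, and the static member definitions are not shown):

#include <mutex>
#include <new>

float* Matrix_Allocator::allocate(int64_t n)
{
    std::lock_guard<mutex_type> lock(mtx_);
    std::stack<float*>& pool = free_list_[n];
    float* p = nullptr;
    if (!pool.empty()) {
        p = pool.top();  // reuse a previously freed block of this size
        pool.pop();
    } else {
        p = static_cast<float*>(::operator new(
            n * sizeof(float), std::align_val_t(alignment)));
    }
    allocation_list_[p] = n;  // remember the size for deallocate()
    return p;
}

void Matrix_Allocator::deallocate(float* p)
{
    std::lock_guard<mutex_type> lock(mtx_);
    // Look up the size recorded at allocate(); the entry stays in
    // place, since the pooled block keeps the same size.
    int64_t const n = allocation_list_.at(p);
    free_list_[n].push(p);  // return the block to the matching pool
}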