Hi,
On 08/28/2016 06:06 PM, Shmuel Levine wrote:
> Hi All,
>
> I've finally found a bit of time once again to work on my hobby project
> with HPX... The long break actually gave me a fresh perspective on my
> own code, and it occurred to me that my code has some serious issues
> with memory management, and I'm hoping that someone can help to provide
> me with some better insight into how to best handle memory management
> while working in a distributed app. In particular, I would greatly
> appreciate some specific guidance on how to address the issue in my own
> code, since I'm at a bit of a loss here
Let me try to answer your question. I am not sure I understood
everything correctly though...
>
>
> At a high level (and obviously simplified), my code can be summarized as
> follows:
> - I'm trying to implement a genetic-algorithm-based optimization system,
> which is intended to be used to optimize parameters for large numerical
> models. Matrices will likely contain millions of elements.
> - The matrix class itself is implemented using
> hpx::serialization::serialize_buffer<T> as the backing storage type.
Sounds good so far.
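For reference, a rough sketch of what such a matrix could look like --
the names here are placeholders of mine, not taken from your code:

#include <hpx/include/serialization.hpp>
#include <cstddef>
#include <cstdint>

class Matrix
{
    using buffer_type = hpx::serialization::serialize_buffer<float>;

public:
    Matrix() = default;

    Matrix(std::int64_t rows, std::int64_t cols)
      : rows_(rows), cols_(cols)
      , data_(static_cast<std::size_t>(rows * cols))  // allocates storage
    {}

    float& operator()(std::int64_t r, std::int64_t c)
    {
        return data_[r * cols_ + c];
    }

private:
    friend class hpx::serialization::access;

    // serialize_buffer knows how to serialize itself, so the matrix
    // just forwards its members to the archive
    template <typename Archive>
    void serialize(Archive& ar, unsigned)
    {
        ar & rows_ & cols_ & data_;
    }

    std::int64_t rows_ = 0;
    std::int64_t cols_ = 0;
    buffer_type data_;
};

Since serialize_buffer has reference semantics, copying such a matrix is
cheap -- the copies share the underlying storage.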
> - Right now, I've written an HPX component which I've called
> Model_Driver. My intent was to instantiate one instance of Model_Driver
> on each locality with a [type-erased] Model class passed as an argument
> for dependency injection.
Nothing wrong there, I guess.
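Just to make sure we mean the same thing, roughly this structure (only
Model_Driver and the action name are from your mail; everything else is
made up, and for remote construction the injected Model would itself
have to be serializable):

#include <hpx/include/components.hpp>
#include <hpx/include/actions.hpp>
#include <utility>

struct Model_Driver
  : hpx::components::component_base<Model_Driver>
{
    // dependency injection: the type-erased model is passed in
    // on construction
    explicit Model_Driver(Model model)
      : model_(std::move(model))
    {}

    float evaluate(Matrix m)
    {
        return model_.evaluate(std::move(m));
    }

    HPX_DEFINE_COMPONENT_ACTION(
        Model_Driver, evaluate, Evaluate_Model_With_Parameters);

private:
    Model model_;
};

HPX_REGISTER_COMPONENT(
    hpx::components::component<Model_Driver>, Model_Driver);
HPX_REGISTER_ACTION(
    Model_Driver::Evaluate_Model_With_Parameters,
    model_driver_evaluate_action);

On the main locality you would then create one instance per locality via
hpx::new_<Model_Driver>(locality, model).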
> - I intend to manage the parameter matrices on the main locality.
> Model_Driver has an action called Evaluate_Model_With_Parameters(Matrix
> m), which returns a future<float> representing the fitness of a given
> set of parameters. This is returned back to the primary locality which
> performs selection to determine the 'winning' parameters, and 'breeds'
> the next generation.
> - I've been using dataflow to handle the optimization process, including
> selection, generating the next generation, etc.
Sounds good as well.
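For what it's worth, one generation of such a loop could look roughly
like the following sketch (select_and_breed and the variable names are
placeholders; drivers holds the ids of the Model_Driver instances):

#include <hpx/include/async.hpp>
#include <hpx/include/lcos.hpp>
#include <cstddef>
#include <utility>
#include <vector>

hpx::future<std::vector<Matrix>> ga_generation(
    std::vector<Matrix> population,            // current parameter sets
    std::vector<hpx::id_type> const& drivers)  // one driver per locality
{
    std::vector<hpx::future<float>> fitness;
    fitness.reserve(population.size());

    for (std::size_t i = 0; i != population.size(); ++i)
    {
        // distribute the individuals round-robin over the drivers
        fitness.push_back(hpx::async(
            Model_Driver::Evaluate_Model_With_Parameters(),
            drivers[i % drivers.size()], population[i]));
    }

    // selection/breeding runs as soon as all fitness values arrived;
    // copying 'population' into the continuation is cheap because the
    // matrices share their storage (serialize_buffer)
    return hpx::dataflow(
        [population](std::vector<hpx::future<float>> f)
        {
            return select_and_breed(population, std::move(f));
        },
        std::move(fitness));
}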
>
> In general, there are a large number of Matrix objects created and
> destructed - there is, essentially, a necessity to use a custom
> allocator to manage the allocation/deallocation of memory in the program.
Alright.
>
> The first and naive attempt that I made (currently, it's all that I've
> done) is a Matrix_Data_Allocator class, which manages a memory pool.
> [1] The free_list is a static object in the allocator class, and the
> allocate and deallocate functions are static functions. Similarly, the
> mutex is also a static member of the allocator class.
Ok. A possible optimization would be to use either thread-local free
lists or lock-free/wait-free ones, so that concurrent allocations don't
all serialize on the single mutex.
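A minimal sketch of the thread-local variant, keeping your type names
but dropping the mutex:

#include <cstdint>
#include <map>
#include <stack>

class Matrix_Allocator
{
public:
    using T = float;

    static T* allocate(std::int64_t n);
    static void deallocate(T* p);

private:
    using free_list_type = std::map<std::int64_t, std::stack<T*>>;

    // one free list per worker thread: no locking on the fast path
    static thread_local free_list_type free_list_;
};

thread_local Matrix_Allocator::free_list_type Matrix_Allocator::free_list_;

Two caveats: thread_local binds to the underlying OS worker thread, not
to the HPX task, so allocate/deallocate must not suspend while touching
the list; and a buffer freed on a different worker than it was allocated
on simply migrates into that worker's pool, which is harmless for a free
list but changes its shape over time.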
>
> The obvious problem with this is that although it should work fine with
> a single locality, it is clearly going to cause segmentation faults in
> a distributed app. From my understanding of the serialization
> code in HPX, the transfer of a Matrix from the main locality to a
> remote locality to calculate the model fitness does not use the Matrix
> allocator -- allocation is handled by the serialization code -- but all
> other constructors/destructors will be a problem.
Well, what happens during serialization is that the data is copied over
the network; in the case of a container with dynamic size, the receiving
side allocates the memory and then copies the received data (from inside
the archive) into the newly created objects. I don't think that creates
any problems for you. The allocator you described above only carries
global state (through its static variables), so serializing the
allocator itself would essentially do nothing -- think of it as a tag
saying which allocator to use. When a serialize_buffer is received and
deserialized, you just allocate the memory from the locality-local free
list (and the same happens when the memory is deallocated).
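To make that concrete, a stateless std-style wrapper around your pool
could look like this (pool_allocator is a made-up name; this assumes
your Matrix_Allocator is visible):

#include <hpx/include/serialization.hpp>
#include <cstddef>
#include <cstdint>

// no per-instance state: 'transferring' one of these carries nothing,
// it merely tells the receiving side which allocator to use
struct pool_allocator
{
    using value_type = float;

    float* allocate(std::size_t n)
    {
        // draws from the free list of whatever locality we run on
        return Matrix_Allocator::allocate(static_cast<std::int64_t>(n));
    }

    void deallocate(float* p, std::size_t)
    {
        // returns the block to this locality's free list
        Matrix_Allocator::deallocate(p);
    }
};

// all instances are interchangeable
inline bool operator==(pool_allocator const&, pool_allocator const&)
{
    return true;
}
inline bool operator!=(pool_allocator const&, pool_allocator const&)
{
    return false;
}

// the matrix backing storage then simply becomes:
using buffer_type =
    hpx::serialization::serialize_buffer<float, pool_allocator>;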
>
> The most obvious way to work around the problem that comes to my mind
> would be changing the free_list (and mutex) into a
> std::map<std::uint32_t, free_list_type> (and
> std::map<std::uint32_t,mutex>) so that each locality has a separate
> mutex, but something about this seems to me to be wrong -- it requires
> the allocator to be tightly-coupled with the HPX runtime, so that the
> allocator can call hpx::get_locality_id() to index the appropriate
> free_list.
I don't think that is needed at all. Static variables are not part of
AGAS; they are local to your process, so every locality already gets its
own free_list and mutex for free.
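To illustrate, you can run the very same static allocator through a
plain action on any locality and it transparently uses that process'
own statics (pool_demo is made up):

#include <hpx/hpx.hpp>
#include <cstdint>

// runs on whichever locality it is sent to; the statics it touches
// belong to that process alone -- no locality id lookup anywhere
void pool_demo(std::int64_t n)
{
    float* p = Matrix_Allocator::allocate(n);  // this locality's pool
    Matrix_Allocator::deallocate(p);           // back into the same pool
}
HPX_PLAIN_ACTION(pool_demo, pool_demo_action);

void run_everywhere()
{
    // exercises one independent pool per process
    for (hpx::id_type const& loc : hpx::find_all_localities())
        hpx::async(pool_demo_action(), loc, std::int64_t(1024)).get();
}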
>
> Similarly, the Model class (injected into the Model_Driver component) --
> which is where a large proportion of the Matrix allocations occur --
> is presently not coupled to the HPX runtime at all, although,
> conceivably, Model_Driver could provide a locality_id to the Model class
> (to then pass along to a Matrix?). My first inclination is that
> a Matrix class should not have knowledge of the [distributed]
> architecture on which it runs, but perhaps when dealing with a
> distributed program architecture it is necessary to explicitly create
> distributed-type classes -- i.e. something like
> class Distributed_Matrix : public Matrix {..};
> Having said that, those are merely some speculations which came to mind
> while trying to organize my thoughts and present this question. It
> still remains unclear in my mind, however. Something tells me that there
> must be a better way to deal with this. Hopefully, people with more
> brains and experience can provide me with some insight and guidance.
I hope the description above sheds some light on it. The matrix class
doesn't need any locality information, unless you want to create a truly
distributed data structure (as opposed to just a regular container that
is sent over the wire).
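If you ever do want the truly distributed variant, the pattern to look
at is something like HPX's partitioned_vector: one logical container
owning segments on several localities. A minimal sketch:

#include <hpx/hpx.hpp>
#include <hpx/include/partitioned_vector.hpp>

// the element type has to be registered once per program
HPX_REGISTER_PARTITIONED_VECTOR(float);

void make_distributed_vector()
{
    // one logical vector of 10 million floats, its segments spread
    // over all localities
    hpx::partitioned_vector<float> v(
        10000000, hpx::container_layout(hpx::find_all_localities()));
}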
>
> I would greatly appreciate any suggestions that you can offer. If you
> require further details of my code, please let me know and I'd be more
> than happy to elaborate further. However, I think that the problem
> itself is fairly generic and is relevant to most code which is written
> for a distributed environment - especially where the parallelism isn't
> handled explicitly in the code (as opposed to an MPI program, for
> example, where this is far more straightforward).
>
> Thanks and best regards,
> Shmuel Levine
>
>
> [1] The actual code is slightly more complicated than the above
> description, although I don't think that it changes the nature of the
> question or the appropriate solution significantly. In particular, each
> set of parameters is typically a std::vector<Matrix>, where each Matrix
> is a different size. In other words, the code uses multiple matrix
> sizes, although the number of different sizes is constrained to the
> dimension of the parameter vector above. The actual allocator
> definition is as follows:
>
> class Matrix_Allocator {
> public:
>     using T = float;
>     using data_type = T;
>     static const int64_t alignment = 64;
>
> private:
>     using mutex_type = hpx::lcos::local::spinlock;
>     using free_list_type = std::map<int64_t, std::stack<T *>>;
>     using allocation_list_type = std::map<T *, int64_t>;
>
> public:
>     Matrix_Allocator() {}
>     ~Matrix_Allocator();
>     Matrix_Allocator(Matrix_Allocator const &) = delete;
>     Matrix_Allocator(Matrix_Allocator &&) = delete;
>
>     static T *allocate(int64_t n);
>     static void deallocate(T *p);
>
> private:
>     static mutex_type mtx_;
>     static free_list_type free_list_;
>     static allocation_list_type allocation_list_;
> }; // class Matrix_Allocator
>
> The allocation_list_ is used to track the allocated size of a given
> pointer, to determine to which free_list should the pointer be added
> upon destruction of a matrix.
>
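For completeness, I would expect the two static functions from your
footnote to look roughly like this (a sketch against your declaration
above; the posix_memalign call and the bad_alloc handling are
assumptions of mine):

#include <cstdlib>
#include <mutex>
#include <new>

Matrix_Allocator::mutex_type Matrix_Allocator::mtx_;
Matrix_Allocator::free_list_type Matrix_Allocator::free_list_;
Matrix_Allocator::allocation_list_type Matrix_Allocator::allocation_list_;

Matrix_Allocator::T* Matrix_Allocator::allocate(int64_t n)
{
    std::lock_guard<mutex_type> l(mtx_);

    std::stack<T*>& stack = free_list_[n];
    T* p = nullptr;
    if (!stack.empty())
    {
        p = stack.top();  // reuse a cached block of this exact size
        stack.pop();
    }
    else
    {
        // no cached block of this size: fresh 64-byte aligned memory
        void* raw = nullptr;
        if (::posix_memalign(&raw, alignment, n * sizeof(T)) != 0)
            throw std::bad_alloc();
        p = static_cast<T*>(raw);
    }
    allocation_list_[p] = n;  // remember the size for deallocate()
    return p;
}

void Matrix_Allocator::deallocate(T* p)
{
    std::lock_guard<mutex_type> l(mtx_);

    auto it = allocation_list_.find(p);
    int64_t n = it->second;  // which size class does p belong to?
    allocation_list_.erase(it);
    free_list_[n].push(p);   // cache the block instead of freeing it
}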
--
Thomas Heller
Friedrich-Alexander-Universität Erlangen-Nürnberg
Department Informatik - Lehrstuhl Rechnerarchitektur
Martensstr. 3
91058 Erlangen
Tel.: 09131/85-27018
Fax: 09131/85-27912
Email: [email protected]
_______________________________________________
hpx-users mailing list
[email protected]
https://mail.cct.lsu.edu/mailman/listinfo/hpx-users