On Mon, Dec 23, 2013 at 12:22 PM, Rajiv Kurian <[email protected]> wrote:
> This seems like a common pattern in multi-threaded applications:
>
> 1) A single thread reads off the network, allocating buffers as necessary.
> 2) It sends these buffers, along with other info, to worker threads that
> do the real work. A worker thread processes the event entry and then
> frees the buffers from step (1). Typically a ring buffer or some other kind
> of queue is used, with an entry that resembles:
>
> struct event_entry {
>     char* buffer;
>     int length;
>     // Other stuff.
> };
>
> The problem with this pattern is that memory allocated on one thread is
> freed on another. Besides the constant allocations and frees, this also
> suffers from the allocation and free happening on different threads,
> causing contention.
>
> I need to use a pointer to a buffer instead of an inlined array since the
> size of the data for each entry is variable and unknown. Currently this is
> what I am planning to do to handle this:
>
> 1) The producer maintains a pool of buffers of different sizes. I could also
> just use a memory allocator like jemalloc, which does pooling behind the
> scenes.
> 2) The producer picks an appropriately sized buffer from the pool for an
> incoming request. It's easy to pick a size, especially if the protocol is
> length-prefixed.
> 3) Once there are enough bytes to form a complete entry (based on our
> protocol), the producer puts a pointer to this buffer on a ring buffer entry
> and publishes it.
> 4) The consumer picks an entry off the ring buffer and synchronously
> processes it. If it needs the buffer beyond the point of initial
> processing, it copies it (this is rare for me). It then marks the ring
> buffer entry as processed.
> 5) This ties to step (2). Whenever the producer doesn't have a right-sized
> buffer in its pool, it checks all the ring buffer entries already marked
> processed by the consumer in this cycle of the ring buffer. It does so by
> checking the sequence number of the consumer/worker. It reclaims all of
> these buffers and puts them back in its pool. This logic needs to be
> run at least once per cycle of the ring buffer (it could be triggered early
> because of a shortage of buffers); otherwise we would end up reusing
> buffers that are still being processed. If after a reclamation it still
> cannot find a right-sized buffer, it just allocates one and adds it to the
> pool (this should be rare in steady state).
>
> Any comments or obvious faults with my logic? What do you guys use to
> exchange buffers of dynamic sizes between two threads? I am trying to avoid
> cross-thread allocates and frees. In fact, I am trying to avoid allocates
> and frees in steady state altogether.


Hi!

Your scheme sounds reasonable.


As far as I understand, this is related to your other question about the SPMC
queue. So if you use a queue like this:
http://www.1024cores.net/home/lock-free-algorithms/queues/bounded-mpmc-queue
and you have an upper bound on buffer size and are ready to waste some memory,
then you can embed the buffer directly into each element of the queue.
The queue is actually bi-directional: currently consumers transfer
"empty" elements back to the producer, but there is no reason they cannot
transfer useful payload back instead. So in the extreme case you
can have just:

#define MAX_MSG_SIZE (64<<10)
#define QUEUE_SIZE (1<<10)

struct element {
  char buf[MAX_MSG_SIZE];
  // ... other data
};

queue<QUEUE_SIZE, element> q;

This consumes 64 MB of memory persistently (1024 elements of 64 KB each), but
it is as simple and as fast as you can get.

You will need to "split" the enqueue and dequeue functions into two parts:
the first waits until the next element becomes ready for
production/consumption, and the second "commits" the
production/consumption.
Producer workflow:
- wait till the next element becomes ready for production (it is most
likely already ready, because the queue is FIFO)
- fill in the element (the buffer is already there, right in the element)
- commit production (make the element ready for consumption)
- repeat
The consumer workflow is symmetric:
- wait till the next element becomes ready for consumption
- process the element (no need to copy the buffer; read it right in place)
- commit consumption (make the element ready for subsequent production)

-- 

--- 
You received this message because you are subscribed to the Google Groups
"Scalable Synchronization Algorithms" group.