< The runtime also is designed to minimize L1 cache misses, more on that if
there is interest. >
Seconded!
________________________________
From: Daniel Gackle <[email protected]>
To: Fundamentals of New Computing <[email protected]>
Sent: Friday, July 13, 2012 4:21 PM
Subject: Re: [fonc] Any thoughts on "Disruptor" pattern for high-throughput,
low latency concurrency?
< The runtime also is designed to minimize L1 cache misses, more on that if
there is interest. >
Interested!
On Fri, Jul 13, 2012 at 4:32 PM, Dan Amelang <[email protected]> wrote:
Hi Josh, good to hear from you again. Hope Google is treating you well :)
>
>"The Disruptor" looks to me like it could be an interesting approach
>to optimizing inter-thread communication. Their technical paper is
>very readable. For their problem domain, it makes a lot of sense.
>
>Regarding the Nile C runtime, inter-thread communication is currently
>not a bottleneck. So even if you created a C-based disruptor for Nile,
>or generated Java code from Nile (in order to use theirs, which is
>Java-based), I'm pretty sure you wouldn't see a performance
>improvement.
>
>The Nile runtime currently does several things to keep inter-thread
>communication/contention down. It uses queues of fixed-sized buffers
>for batched stream data. Once a Nile process gets a hold of a buffer
>for reading/writing, it has exclusive access to it, so no
>synchronization is needed during reading/writing. Once the process is
>done with the buffer, then it might contend with another when queueing
>the buffer, but this is unlikely. Queuing is a relatively infrequent
>occurrence (compared to computation and stream data read/writing),
>plus the queues are kept on a per-process basis (there are many more
>Nile processes than OS-level threads running) which scales well.
>
>On top of that, the buffers (and other managed memory objects) are
>laid out carefully to avoid false sharing (of cache lines).
>
>The runtime also is designed to minimize L1 cache misses, more on that
>if there is interest.
>
>I apologize that as usual with my stuff, none of the Nile C runtime is
>documented (and there aren't even program comments). But, I'm
>currently writing all this up for my dissertation, which will be
>available in case these details are of interest to anyone.
>
>Dan
>
>
>On Fri, Jul 13, 2012 at 8:46 AM, Josh Gargus <[email protected]> wrote:
>> I hope this may be of general interest, but I'm personally interested on
>> Dan's thoughts on whether Disruptors might be a suitable "compilation
>> target" for Nile. My intuition is that it may be the right way for Nile to
>> run efficiently on commonly available multicore processors (i.e. by
>> minimizing branch-misprediction and memory contention, and optimizing cache
>> behavior).
>>
>> I'm referring to:
>>
>> http://code.google.com/p/disruptor/
>>
>>
>> Google will turn up plenty more references, but the StackOverflow topic is
>> worthwhile:
>> http://stackoverflow.com/questions/6559308/how-does-lmaxs-disruptor-pattern-work
>>
>> One limitation that I saw was it appears to work only for simple linear
>> dataflows:, each pipeline stage consumes and produces a single value. This
>> is a consequence of each entry in the ring-buffer having a fixed size (each
>> entry is a data structure containing the "scratch space" necessary to record
>> the input/output of each dataflow stage.
>>
>> This seems to be a serious limitation, since Nile allows you to easily
>> express dataflows with more complicated topologies. For example, to draw a
>> filled shape bounded by a sequence of bezier curves, you might write a Nile
>> program that recursively splits curves until they are "small enough" or
>> "straight enough" to be approximated by linear segments, and then to
>> rasterize these to produce pixels to be shaded by downstream Nile elements.
>> The problem is that you don't know how many outputs will be produced by each
>> pipeline stage.
>>
>> A solution that occurred to me this morning is to use multiple ring buffers.
>> Linear subgraphs of the dataflow (i.e. chained sequences of
>> one-input-to-one-output elements) can fit into a single ring buffer, but
>> elements that produce a varying number of outputs would output to a
>> different ring buffer (or multiple ring buffers if it can produce multiple
>> types of output). This would be extremely cumbersome to program manually,
>> but not if you compile down to it from Nile.
>>
>> I don't understand the Nile C runtime very well, so it's possible that it's
>> already doing something analogous to this (or even smarter).
>>
>> Thoughts?
>>
>> Cheers,
>> Josh
>_______________________________________________
>fonc mailing list
>[email protected]
>http://vpri.org/mailman/listinfo/fonc
>
_______________________________________________
fonc mailing list
[email protected]
http://vpri.org/mailman/listinfo/fonc
_______________________________________________
fonc mailing list
[email protected]
http://vpri.org/mailman/listinfo/fonc