Martin,

You recommend message-passing approaches like Erlang's as generally
superior, but I'm curious whether you have more specific thoughts on the
relative tradeoffs of shared-memory-by-default vs. message-passing, i.e.,
where the first relies on hardware-level copies (cache coherence) for
coordination, while the second relies on software-level data copies for
coordination.

Is it just a more conservative way to allocate resources, since
coordinating processes requires more manual work, or is there more to it?
I could see that making development cost (time and effort) proportional to
the amount of coordination might yield systems that avoid these
bottlenecks where possible.  With shared memory the relationship is
inverted, i.e., it takes more work to decouple systems and remove extra
synchronization than it does to build them badly.
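For concreteness, the shared-memory-by-default style I mean might be
sketched as follows (a minimal Java sketch; the class and method names are
illustrative, not from this thread):

```java
import java.util.HashMap;
import java.util.Map;

// Shared-memory coordination: several threads mutate one map, so correctness
// depends on a lock, and the hardware's cache-coherence protocol moves the
// modified cache lines between cores. Illustrative sketch only.
public class SharedCounter {
    private final Map<String, Integer> counts = new HashMap<>();

    // All writers contend on the same monitor.
    public synchronized void add(String word) {
        counts.merge(word, 1, Integer::sum);
    }

    public synchronized int get(String word) {
        return counts.getOrDefault(word, 0);
    }

    public static int demo() {
        SharedCounter c = new SharedCounter();
        Runnable writer = () -> { for (int i = 0; i < 1000; i++) c.add("x"); };
        Thread a = new Thread(writer), b = new Thread(writer);
        a.start(); b.start();
        try {
            a.join(); b.join();
        } catch (InterruptedException e) {
            throw new RuntimeException(e);
        }
        return c.get("x");  // 2000, but only because every access is synchronized
    }
}
```

The synchronization is what I mean by the default cost: every access pays
for coordination whether or not the design needed to share the map at all.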


On Tue, Mar 18, 2014 at 7:03 AM, Martin Thompson <mjpt...@gmail.com> wrote:

> In my personal experience I cannot get within 10X the throughput, or
>>> latency, of mutable data models when using persistent data models.
>>>
>>
>> Hi Martin,
>> Thanks for finding this thread :-). Let me ask a reversed question. Given
>> that you come from a persistent data model, where code remains reasonably
>> simple: how much effort does it really take to make an imperative model
>> work well with a relatively low number of defects? How deep do you go to
>> make sure that data structures fit the CPU architecture, in terms of
>> topology as well as cache sizes? And how does it scale in terms of writing
>> the code itself (I mean, are code alterations easy and straightforward, or
>> do you have to rewrite from scratch)?
>>
>>
>
> I've never heard of "imperative model". I'm aware of imperative
> programming. Can you expand on what you mean?
>
> I think in the context of this thread two points are being conflated.
> We started discussing the point about how "immutable data (actually
> persistent data structures) solve the parallel problem".  Parallel in the
> sense of increasing thread counts as core counts increase in CPUs, but
> actually concurrent in the access and mutation of data structures in the
> micro and of domain models in the macro. This is being conflated with the
> performance of data structures in general.
>
> I believe that concurrent access to data structures should not be the
> default design approach. The greatest complexity in any system often comes
> from concurrent access to state, be it persistent or not. It is better to
> have private data structures within processing contexts (threads,
> processes, services, etc.), which communicate via messages. Within these
> private processing contexts concurrent access is not an issue and thus
> non-concurrent data structures can be employed. With non-concurrent access
> it is easy to employ rich data structures like basic arrays,
> open-addressing hash maps, B+ and B* trees, bitsets, bloom filters, etc.,
> without which many applications would be unusable due to performance and
> memory constraints.
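A minimal sketch of that style, assuming Java (WordCounter and the
poison-pill sentinel are illustrative names, not from this thread): a
single worker thread owns a plain, non-concurrent HashMap, and producers
reach it only by message.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Message-passing coordination: the map is private to one thread, so no locks
// guard it; coordination happens via software-level copies through the queue.
public class WordCounter {
    private static final String POISON = "\u0000stop";  // sentinel ending the stream

    public static Map<String, Integer> count(List<String> words) {
        BlockingQueue<String> inbox = new ArrayBlockingQueue<>(1024);
        Map<String, Integer> counts = new HashMap<>();  // owned by the worker alone
        Thread worker = new Thread(() -> {
            try {
                for (String w = inbox.take(); !w.equals(POISON); w = inbox.take()) {
                    counts.merge(w, 1, Integer::sum);  // single writer, no contention
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        worker.start();
        try {
            for (String w : words) inbox.put(w);  // producers only ever send messages
            inbox.put(POISON);
            worker.join();  // join() publishes the worker's writes to this thread
        } catch (InterruptedException e) {
            throw new RuntimeException(e);
        }
        return counts;
    }
}
```

Here `WordCounter.count(List.of("a", "b", "a"))` yields `{a=2, b=1}`; the
map itself never needs to be thread-safe.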
>
> When working in the context of single threaded access to data structures I
> see using a blend of functional, set theory, OO, and imperative programming
> techniques as the best way to go. Right tool for the job. I see the effort
> levels as very similar when choosing the appropriate technique that fits a
> problem. I have many times seen a code mess and pain result from
> inappropriate techniques applied blindly like religion to problems that
> just are not a good fit.
>
> So to more directly answer your question: if I need concurrent
> access to a shared data structure, I prefer it to be persistent from a
> usability perspective, given that the performance constraints of my
> application are satisfied. If I need greater performance I find
> non-persistent data structures can give better performance for a marginal
> increase in complexity. I've never quantified it, but it feels like
> percentage points rather than factors. I frame this in the context that
> the large step in complexity here is choosing to have concurrent access to
> the data. It is the elephant in the room. Those who are not good at
> concurrent programming should just not be in this space in the first
> place, because if they do it will get ugly either way. I'd recommend
> reading the findings of the "double hump" paper from Brunel University on
> people's natural ability in programming.
>
>
> http://blog.codinghorror.com/separating-programming-sheep-from-non-programming-goats/
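A minimal sketch of the persistent option above, assuming Java
(PersistentStack is an illustrative name, and a simple immutable cons list
stands in for a richer persistent structure): readers walk a consistent
snapshot with no locks, and writers publish a new version with a
compare-and-set.

```java
import java.util.concurrent.atomic.AtomicReference;

// Persistent data shared across threads: each push builds a new immutable
// version and publishes it atomically; readers see consistent snapshots
// without locking. Illustrative sketch only.
public class PersistentStack {
    private static final class Node {       // immutable after construction
        final int head;
        final Node tail;
        Node(int head, Node tail) { this.head = head; this.tail = tail; }
    }

    private final AtomicReference<Node> top = new AtomicReference<>(null);

    public void push(int value) {
        Node cur, next;
        do {
            cur = top.get();
            next = new Node(value, cur);          // old version is left untouched
        } while (!top.compareAndSet(cur, next));  // retry if another thread raced us
    }

    public int sum() {
        int s = 0;
        for (Node n = top.get(); n != null; n = n.tail) s += n.head;  // lock-free read
        return s;
    }
}
```

The CAS retry loop is the "marginal increase in complexity" side of the
tradeoff: the structure stays simple, but writers must tolerate losing a
race and re-publishing.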
>
> Hope this helps clarify.
>
> Martin...
>
>
>  --
> You received this message because you are subscribed to the Google
> Groups "Clojure" group.
> To post to this group, send email to clojure@googlegroups.com
> Note that posts from new members are moderated - please be patient with
> your first post.
> To unsubscribe from this group, send email to
> clojure+unsubscr...@googlegroups.com
> For more options, visit this group at
> http://groups.google.com/group/clojure?hl=en
> ---
> You received this message because you are subscribed to the Google Groups
> "Clojure" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to clojure+unsubscr...@googlegroups.com.
> For more options, visit https://groups.google.com/d/optout.
>
