Back to the basics (of research) - Concurrent data-structures in 2.0 ?

Bosinski Sat, 25 Jun 2022 12:30:17 -0700

Hello evbd,

afaik Martin Odersky once described a app as _a flow of typesafe 
transformations_. These days one might like to add _efficient multiprogram type 
transformations in a shared-memory environment_.


Recently i've spent some time to look at a couple of research-papers (and some 
lecture videos) concerned with concurrent a.k.a. _" lock-free"_ 
data-structures. The earliest papers were published around the year 2000 
describing a concept named 'Read-Copy-Update' (RCU) which had been implemented 
into the Linux-kernel around ~2004 and was released as user-space-lib under the 
name 'Userspace-RCU' four years after that. RCU is a brainchild of Paul E. 
McKenney, who is currently concerned with the memory-model of the Linux-kernel. 
Later Hazard-Pointers (HP) were invented and coined by Maged Michael. In 2017 
both concepts have been accepted by the C++-Comittee into a 
Technical-Specification (TS) to be worked out in the coming years. Early 
adoption can be seen & tested in Facebooks "Folly"-library. Another interesting 
'flavour' of RCU has been published by DPDK and was presented by Nathan Brown 
in february at FOSDEM-2022 [ 
<https://fosdem.org/2022/schedule/event/comparing_dpdk_rcu_and_user_space_rcu_library/>
 ]. The above mentioned concepts deal with the problem of 
safe-memory-reclamation. Authors group these concepts in two families - 
epoch-based=RCU, Pointer-based=Hazard-Pointers - and hybrids of both (e.g. 
Hazard-Eras). I leave out (Software)-Transactional-Memory (STM) here - as long 
as it remains a hackers-wet-dream - instead trying to focus on the 
CPP-Comittees accepted concepts, that will prbl. get finalized into the 
C++-Standard in 2026. Looking at Nims `std/lib` and nimble-packages i wonder 
why there are so few attempts to introduce concurrent data-structures ? Many 
have expressed the wish for better concurrency support, since `threads:on` will 
become the default in 2.0. The lock-based List/Table-implementations from 
`std/shared` are nice-to-have, but prbl. not too important for future projects. 
Don't get me wrong here - locks will remain important in the future, but please 
not in concurrent data-structures that see reads&updates from many threads. 
Having collected a series of papers describing breakthru inventions & 
implementations of (b)lock-free data-structures e.g. Harris Linked-Lists 2002, 
Shavit/Shalev Lock-free extensible Hash-Tables (~2006) based on 
'split-ordered-lists'. Those have been tested and the (amazing) 
gains-in-performance have been reported and confirmed in many later publication 
by different authors. I regard these as building-blocks for any concurrent-DS 
to come. So following along a quote by **G. van-Rossums**

_" When you approach a new language, first look and measure the quality of the 
provided data-structures"_.

I've done exacly this and learned that all of Nims DS are well-crafted and 
very-performant - esp. the `std/tables` is a performance-beast :) BUT as i type 
along on my Intel i5 (2014) with 4-HW-threads, i'm sure my next machine will be 
either a M-something, RISC-V or a modern Intel/AMD-thingy and both will provide 
20-cores or more. And in that respect i'd like to see Nim 2.0 beeing able to 
provide data-structures that can be safely and efficiently used in a lock-free 
manner by multiple threads. So the pole-position on my wish-list for 2.0 is a 
new lib/module, filled with concurrent-DS for `Seq/Table/Set/List`, that are 
(as-much as possible) API-compatible with the current 
`std-Seq/Set/Table`-implementations. So that current projects might only need 
to exchange one import-line and start to benefit from better thruput or can 
exchange `std/shared`-types. New projects would naturally base their code on 
the concurrent-DS-types, without worrying about memory-reclamation-strategies. 
I'm sure this will provoke mentions on HN, along the lines _Look Mama 'Nim 
adopts lock-free DS.'_ I'd like to publish my tiny collection of 
'milestone'-papers and make them available to evbd. interested. Maybe at the 
Nim-Wiki, or linked from my github or nice&shiny within [ <http://Markdown.com> 
] or elsewhere - i'm open for suggestions. Many times i came across warnings 
saying :

_" In mutlithreading and esp. threadsafe DS, even the most experienced hackers 
have very-good chances to get it wrong."_.

Besides the fact-of-life _" Failure is always a option"_ yes - that might be 
true for the crafting of **multithreaded lock-based** DS like trees. Looking at 
Harris Linked-List or Shavit/Shalevs new (amazingly-brilliant) concepts i'd say 
these are stand-out because of their simple design. Beeing a rather mediocre 
hacker, i regard these as doable. The problem of 'safe-memory-reclamation' is 
still a hot - or at least warm - topic in research. More recent works like 
Interval-based-Reclamation (IBR/2018) or Variant-based-Reclamation (VBR/2021) 
are RCU-flavours with sophisticated additions. They seem to be very promising 
and my impression is, that the implementations are strictly NOT rocket-science. 
Again, things can be studied and tried publicly using FB 'Folly' or DPDK-RCU. 
ARAQ once mentioned in one of his musings that he believes either STM or 
Memory-Pools might be a good fit for Nim in the future - thus further taking 
away workloads from Nims-GC's. The VBR-paper from 2018 proposes exactly that - 
they test the use of _type-preserving_ Memory-Pools. Their performance results 
are quite good. Having that said - i don't expect that any form of 
language-support is needed here - rather the opposite. On the other hand, maybe 
some-day-in-the-future it turns out that the Nim-runtime has more oversight on 
a Apps usage of type-preserving-pools. Maybe one lets the VM decide or suggest 
upon efficient dimensions for such pools. Or in case the VM detects the use of 
a 'non-threadsafe-DS' during compilation and decides to abort or warn the user. 
The basic node-structures of e.g. Harris-Linked-Lists and the proposed 
HashMap/Sets by Shavit/Shalev are identical, and thus can be consumed and 
recycled from/to the same pool. Maybe the VM can handle/reason differently on 
concurrent-DS on the shared-heap ? For new users of Nim such a introduction 
might be a attractive feat and i find that a Version 2.0 deserves better than 
(retiring stuff and cleaning-up the std-lib), instead a feature that increases 
efficiency and will clearly go mainstream soon (within the next 2-4 yrs.) and 
that surprisingly 'the other crews' ( Odin/Zig/Hare/Go/V(sry, i had to)) do not 
provide yet - can't say if the debate it. As a side-note: The more pseudo-code 
i study from these papers, the more i assume one could easily provide 
real-nim-code that is closer-or-maybe-identical to the provided code-samples. 
So finally part of this is about style and ease-of-use. A good example is Nims 
`withLock`-template. It prevents errors (aquired but release forgotten), marks 
and indents the critical-section and looks good. In my RCU-wrap, you mark & 
indent the critical-section exactly so and understanding of what happens behind 
the scene on entering and leaving is not required. But just-in-case one 
attempts to call `synchronize_rcu` within a critical-section, that will 
not-compile and instead abort with a understandable error-explanation. I 
believe this reads well and provides for ease-of-use and safety.

I considered writing a RFC on this topic, but since it _does and shall not_ 
demand changes to the language i decided to ask for some feedback upfront.

So far i've tried :

  * wrapped Userspace-RCU for Nim ( *nix-only )
  * implemented a HAMT, that the authors claimed to have 80-percent the thruput 
of a Hash-Map. Learned than that such a complex-structure remains 20-% 
performance - maybe its all different in Clojure-land or maybe its just me.. 
Still believe a HAMT can be made lock-free and threadsafe.
  * found Folly ( C++, all platforms ), sry, too dumb, cannot wrap that.
  * tried to make a wrap out of DPDK-RCU ( *nix & win) - does not compile on 
osx.
  * studied and started implementing Harris Linked-List and Shavit/Shalev 
Set/HashMap
  * collected, read and picked papers following the topic 'safe memory 
reclamation'
  * tried my best to identify new ideas&concepts that might be the right 
choices for Nim 2.0



So i'd love to hear your ideas, critique or warnings maybe regarding possible 
conflicts with other planned-features alike CSP/Weave.

beatz & greetz, Andreas

Back to the basics (of research) - Concurrent data-structures in 2.0 ?

Reply via email to