Atomic ARC option?

cblake Sun, 27 Nov 2022 03:45:10 -0800

> Some applications, I am thinking of games specifically, have a lot of 
> internal state that is written once at startup - ... - and never again 
> modified.


I realize the conversation has moved on, but I did not see this question 
answered. This is just an informational post - I know all the senior sys 
programmers here like @Araq & @boia01 know it all already. What @Yepoleb 
describes would be "read-only after creation state", not what most people mean 
by "shared mutable state". For writable memory pages, state almost always 
starts as all zeros and must get populated _somehow_ , but if your application 
logic ensures it is not written to (say with Nim's `let`) then there is little 
need to "protect" it from multiple concurrent writes - only to defend against 
buggy logic. For simple enough memory page structures, things like 
[mprotect](https://man7.org/linux/man-pages/man2/mprotect.2.html) or whatever 
the Windows equivalent is can activate CPU MMU protection after the fact.
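
To make the "populate, then protect after the fact" idea concrete, here is a minimal POSIX-only sketch using `mmap`/`mprotect` from `std/posix`. The names `FrozenInts`, `freeze` and `release` are made up for illustration, and it assumes a Unix-like OS with an MMU.

```nim
import std/[posix, os]

type FrozenInts = object              # hypothetical helper type
  data: ptr UncheckedArray[int32]
  count: int                          # number of int32 elements
  bytes: int                          # mapping size, a whole number of pages

proc freeze(values: openArray[int32]): FrozenInts =
  ## Copy `values` into fresh anonymous pages, then drop write permission.
  let page = int(sysconf(SC_PAGESIZE))
  let need = max(values.len * sizeof(int32), 1)
  let bytes = (need + page - 1) div page * page   # round up to page size
  let p = mmap(nil, bytes, PROT_READ or PROT_WRITE,
               MAP_PRIVATE or MAP_ANONYMOUS, -1, 0)
  if p == MAP_FAILED: raiseOSError(osLastError())
  let arr = cast[ptr UncheckedArray[int32]](p)
  for i, v in values: arr[i] = v      # the one-time "startup" population
  # From here on, any store into these pages faults (SIGSEGV on Linux).
  if mprotect(p, bytes, PROT_READ) != 0: raiseOSError(osLastError())
  result = FrozenInts(data: arr, count: values.len, bytes: bytes)

proc release(t: FrozenInts) =
  discard munmap(t.data, t.bytes)
```

Reads like `t.data[i]` keep working as before; a stray write from buggy logic now crashes right at the offending store instead of silently corrupting state.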

But if you are willing to assume your CPU has an MMU (usually but not 
universally true - e.g. some embedded CPU manufacturers scrimp on that) then a 
nicer design for the "much static data" scenario can be to pre-compute binary file 
formats and use `std/memfiles` to just memory map them READ-ONLY. This enforces 
the non-modified invariant with hardware (as well as being essentially zero 
cost "parsing" at program start-up - really the CPU already understands and can 
"parse" the native format). In a sense, writing these binary formats is a bit like 
writing a "compiler for data" while re-parsing it all the time is "staying 
interpreted".
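
A rough sketch of that "data compiler" pattern, assuming a made-up fixed-size record `Entry` and helper procs `writeTable`/`sumScores` (none of these come from the projects linked in this post): one step dumps records in native binary layout, and the reader maps the file read-only with `std/memfiles` and indexes it directly.

```nim
import std/memfiles

type Entry = object                   # hypothetical record, 8 bytes, no padding
  id: int32
  score: float32

proc writeTable(path: string; entries: openArray[Entry]) =
  ## "Compile" step: write records in native layout (same-arch readers only).
  var buf = newString(entries.len * sizeof(Entry))
  if entries.len > 0:
    copyMem(addr buf[0], unsafeAddr entries[0], buf.len)
  writeFile(path, buf)

proc sumScores(path: string): float32 =
  ## "Load" step: no parsing, just map the file and treat it as an array.
  var mf = memfiles.open(path, mode = fmRead)   # read-only mapping
  defer: mf.close()
  let n = mf.size div sizeof(Entry)
  let tab = cast[ptr UncheckedArray[Entry]](mf.mem)
  for i in 0 ..< n:
    result += tab[i].score            # a store through `tab` would fault instead
```

Note that mapping an empty file fails, and that a native layout like this ties the file to the byte order and alignment of the machine that wrote it, so a real format probably wants a small header to check.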

A simple fully worked out example of this is 
[thes](https://github.com/c-blake/thes). A more complex one is 
[suggest](https://github.com/c-blake/suggest) where one pre-computes 100s of 
MiBs of data. A more "general purpose" one kind of targeted at "data analysis" 
of "FileArrays" is [nio](https://github.com/c-blake/nio). None of those is 
really "multi-threaded" in the sense of this conversation. There is also 
[ftab](https://github.com/c-blake/ftab) which is a single-writer multi-reader 
"persistent" data structure { **_not_** the same kind of "persistence" @boia01 
means - but "external file persistence" }. That last is more a WIP - still 
needs checksums so parallel readers can verify the single writer did not alter 
what they wanted under the covers (as well as being nice for crash recovery), 
but it may inspire/serve as food for thought { and has important bugfixes to 
`std/memfiles` on Windows I have been meaning to upstream as a Nim PR }.

A final example that does _not_ involve such a "data compiler", if that feels 
intimidating/complex, is 
[grl.nim](https://github.com/c-blake/cligen/blob/master/examples/grl.nim) where you 
have a Nim `seq[string]` pre-fork and then `procpool` forks a bunch of kids to 
do work. Each kid gets a copy-on-write version of the Nim `paths: seq[string]` 
and just accesses the memory "as usual". This `grl` program can be faster (or 
slower!) than a threaded device IO equivalent because `memfiles` IO can be 
faster than system-call IO [for 
reasons](https://sasha-f.medium.com/why-mmap-is-faster-than-system-calls-24718e75ab37), 
but maybe only faster for processes which have _private page tables_. Threads 
sharing page tables must synchronize _that_ shared mutable state of the file 
mappings in-OS-kernel. For many small files mostly in the buffer cache (highly 
dynamic file mappings), all that sync will definitely slow down a similar 
threads version.
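
For a feel of the pre-fork pattern without pulling in `cligen`, here is a toy sketch using raw `fork` from `std/posix` rather than `procpool` (so it is not how `grl.nim` itself is written). The proc `countInKids` and its exit-status result channel are purely illustrative; a real program would report results over pipes.

```nim
import std/[posix, os]

proc countInKids(paths: seq[string]; nKids: int): int =
  ## Fork `nKids` children. Kid k scans paths k, k+nKids, ... and reports how
  ## many were non-empty via its exit status (toy channel, so at most 255).
  var pids: seq[Pid]
  for k in 0 ..< nKids:
    let pid = fork()
    if pid == 0:                      # child: sees `paths` copy-on-write
      var n = 0
      var i = k
      while i < paths.len:
        if paths[i].len > 0: inc n    # plain reads, no locks, no copying
        i += nKids
      exitnow(n)                      # _exit: do not run the parent's cleanup
    elif pid < 0:
      raiseOSError(osLastError())
    pids.add pid
  for pid in pids:                    # parent: reap kids and add up results
    var status: cint
    discard waitpid(pid, status, 0)
    if WIFEXITED(status):
      result += int(WEXITSTATUS(status))
```

Each kid reads the parent's `seq[string]` with no synchronization at all; pages are only copied if somebody writes to them.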

In short, as always, also consider processes for parallelism/multi-core no 
matter how good Nim support for threads sharing general object graphs becomes. 
{ I sure hope it becomes second to none! :-) } Depending upon your application 
structure/needs, it may be simpler "in the large" with added bonuses of 
avoiding re-computation or (possibly) performance.
