@Monster to avoid globals, simply use a root structure/object and pass 
pointers to it around.
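
A minimal sketch of the root-structure idea in C. The struct, its fields and 
the function names (`AppState`, `app_init`, `app_tick`) are made up for 
illustration; the only point is that every function receives the state 
through a pointer instead of touching a global:

```c
/* Hypothetical root object: everything that would otherwise be a global
   lives in one struct. The field names are invented for this example. */
typedef struct {
    int    frame_count;
    double elapsed_seconds;
} AppState;

/* Each function gets the root object passed in explicitly. */
void app_init(AppState *app) {
    app->frame_count = 0;
    app->elapsed_seconds = 0.0;
}

void app_tick(AppState *app, double dt) {
    app->frame_count += 1;
    app->elapsed_seconds += dt;
}
```

The caller owns the `AppState` (on the stack or on the heap) and hands 
`&state` to whoever needs it, which also makes it obvious which code can 
modify what.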

To your second question:

How atomics are implemented depends completely on your architecture. That's 
the reason why you have the C standards: each program should behave correctly 
regardless of whether it's compiled for PowerPC, ADSP-XXXXX or MIPS, for 
instance. Take an 8-bit chip (for instance the 6502; I recommend starting with 
that if you'd like to do some basic research): each instruction can write 8 
bits to memory, so a 16-bit write needs at least two asm instructions. You are 
single-core here, but the two instructions could be interrupted in between. 
And if you have a DMA controller on the same bus, things could get worse, 
because even a single write could be delayed. That's the reason why some 
vendors provide special atomic instructions, such as TAS (test-and-set) or CAS 
(compare-and-swap), which behave atomically on the data bus.
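
To make TAS and CAS a bit more concrete, here is a small sketch using the 
portable C11 `<stdatomic.h>` interface rather than any particular vendor 
instruction. The helper names (`increment_with_cas`, `SpinLock`, `spin_lock`, 
`spin_unlock`) are my own; what the compiler actually emits for them depends 
on the target:

```c
#include <stdatomic.h>

/* Increment built on compare-and-swap: retry until no other writer
   changed the value between our read and our write. */
int increment_with_cas(atomic_int *value) {
    int expected = atomic_load(value);
    /* On failure the call stores the current value into `expected`,
       so the next attempt uses fresh data. */
    while (!atomic_compare_exchange_weak(value, &expected, expected + 1)) {
        /* retry */
    }
    return expected + 1;
}

/* Spinlock built on test-and-set: atomic_flag is the one type the
   C standard guarantees to be lock-free. */
typedef struct {
    atomic_flag flag;
} SpinLock;

void spin_lock(SpinLock *lock) {
    /* test_and_set returns the previous state; spin while it was set. */
    while (atomic_flag_test_and_set(&lock->flag)) {
        /* busy-wait */
    }
}

void spin_unlock(SpinLock *lock) {
    atomic_flag_clear(&lock->flag);
}
```

A `SpinLock` would be initialised with `SpinLock l = { ATOMIC_FLAG_INIT };`. 
Busy-waiting like this is only reasonable for very short critical sections.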

Because external memory is clocked much more slowly than the CPU, today there 
are memory caches. With multicore and caches, things get complicated. Each 
core has its own cache but still needs to read from and write to external 
memory. That access needs to be synchronized/serialized, but I don't know how 
it's done on the x86 architecture. Most of the technology is not open source 
(hidden, undocumented instructions are possible). Also, on modern SoCs (GPUs 
or baseband chips) there is often a proprietary RTOS running behind the 
scenes. For at least two physical chips the HDL source is open: the P8X32A 
from Parallax (crazy, non-mainstream) and the RISC-V core 
([https://www.sifive.com/products/hifive1](https://www.sifive.com/products/hifive1)).
The TI Sitara family is not open, but all datasheets are public (BeagleBone, 
for instance). They are Linux-ready, and Nim should also run on them (not 
tested yet).

It's always the same: a tradeoff between I/O bandwidth and CPU, and dealing 
with contention, regardless of whether your system is single-core, multicore, 
a database, or something distributed.

Even if you have multicore, if your architectural design is a bit lacking, it 
is possible that you go parallel and your code ends up much slower (or at 
least no faster) than the single-threaded version.

So if you have a 32-bit architecture and you'd like to do atomics on 64-bit 
values, you have to look into your C API. If it's there, you are lucky; if 
not, you have to implement it yourself (with locks).
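
As a rough sketch of the "implement it yourself with locks" case: a 64-bit 
counter guarded by a test-and-set spinlock, plus a check of what the C 
library reports. `Locked64`, `locked64_add` and `has_native_atomic64` are 
hypothetical names; `ATOMIC_LLONG_LOCK_FREE` and `atomic_flag` are standard 
C11. Note that a C11 implementation may already fall back to locks 
internally for `_Atomic` 64-bit types, so check that first:

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical lock-protected 64-bit value for targets that cannot
   read or write 64 bits in a single atomic step. */
typedef struct {
    atomic_flag lock;
    uint64_t    value;
} Locked64;

/* Adds delta under the lock and returns the new value. */
uint64_t locked64_add(Locked64 *c, uint64_t delta) {
    while (atomic_flag_test_and_set_explicit(&c->lock, memory_order_acquire)) {
        /* spin until the lock is free */
    }
    c->value += delta;
    uint64_t result = c->value;
    atomic_flag_clear_explicit(&c->lock, memory_order_release);
    return result;
}

/* True if the implementation reports that atomic long long
   (at least 64 bits) is always lock-free on this target. */
bool has_native_atomic64(void) {
    return ATOMIC_LLONG_LOCK_FREE == 2;
}
```

Every reader of `value` has to take the same lock as well, otherwise it can 
still see a half-written 64-bit value.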
