A couple of notes:

  * Your stack size of 10_000_000 is very big. That only works for a global 
array, because globals get static storage with a fixed size when the program 
starts. If the array is declared inside a function, you will almost certainly 
overflow the stack: a stack of 10_000_000 int takes 80_000_000 bytes (80 MB) on 
a 64-bit machine, while the default stack limit is typically 8 MB on Linux (and 
1 MB on Windows). If this is used within a function, you will have to allocate 
it on the heap with a ref array, or just use a seq.
  * In terms of speed, if the seq's data and the array are both hot in cache, 
there is no difference; the seq only adds one indirection to load its data 
pointer. Data on the stack of the current function is almost always hot in 
cache.
  * The `base_addr_offset + index * size` computation used to address each 
element of a seq/array doesn't matter in general, for 2 reasons:
    
    * x86 can compute offset + index * scale in a single instruction, without 
a multiplication, via SIB (Scale-Index-Base) addressing for scale factors of 
1, 2, 4 or 8: 
[https://wiki.osdev.org/X86-64_Instruction_Encoding#SIB](https://wiki.osdev.org/X86-64_Instruction_Encoding#SIB) 
All primitive types have such power-of-2 sizes. A padded object's size is only 
guaranteed to be a multiple of its alignment, but other sizes cost just one 
extra cheap instruction (e.g. a `lea` or `imul`).
    * It is very likely that the bottleneck is either memory or branch 
prediction, and that your CPU has plenty of spare cycles to do those 
computations while it waits for data from the L1/L2 cache or main memory.
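
To illustrate the stack-size note, here is a minimal sketch, assuming a 64-bit target where `int` is 8 bytes. The names `StackCap`, `Buf`, `globalBuf` and `useHeap` are hypothetical. It keeps the 80 MB buffer as a global, then allocates the same amount inside a proc with `new` (giving a `ref array`) and with `newSeq`, both of which live on the heap:

```nim
# Sketch assuming a 64-bit target (sizeof(int) == 8).
# StackCap, Buf, globalBuf and useHeap are hypothetical names.
const StackCap = 10_000_000

type Buf = array[StackCap, int]

# 10_000_000 * 8 bytes = 80_000_000 bytes (~80 MB).
const BufBytes = sizeof(Buf)

# A module-level variable is a C global in static storage, not on the stack.
var globalBuf: Buf

proc useHeap(): int =
  # `var local: Buf` here would need ~80 MB of stack and would almost
  # certainly overflow a typical 1-8 MB stack limit.
  let heapBuf = new(Buf)          # ref Buf: zero-initialized heap block
  heapBuf[][StackCap - 1] = 42    # dereference with [] then index
  # A seq also keeps its elements on the heap.
  var s = newSeq[int](StackCap)
  s[StackCap - 1] = 1
  result = heapBuf[][StackCap - 1] + s[StackCap - 1]

globalBuf[StackCap - 1] = 7
echo BufBytes, " ", useHeap(), " ", globalBuf[StackCap - 1]
```

Of the two heap options, the seq is usually the simpler one, since it also tracks its own length.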

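For the speed note, one way to see that a seq and an array are indexed the same way once the data pointer is loaded: a proc taking an `openArray` (which Nim implements as a data pointer plus a length) accepts both. The helper `total` is a hypothetical name; this is a sketch of the access pattern, not a benchmark:

```nim
# Hypothetical helper: sums elements through an openArray, which Nim
# passes as a (data pointer, length) pair for both arrays and seqs.
proc total(xs: openArray[int]): int =
  for x in xs:
    result += x

let arr = [1, 2, 3, 4]    # array[4, int]: elements stored inline
let sq = @[1, 2, 3, 4]    # seq[int]: elements behind a heap data pointer
echo total(arr), " ", total(sq)
```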

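The `base_addr_offset + index * size` rule can also be checked directly. This minimal sketch (with hypothetical names `byteOffset` and `Rgb`) measures the byte distance between element `i` and element 0 for 8-byte, 1-byte and 3-byte elements. The first two strides are scales that SIB addressing can encode; the 3-byte `Rgb` stride is not, which is where a compiler typically adds one extra instruction:

```nim
# byteOffset and Rgb are hypothetical names for this sketch.
type Rgb = array[3, uint8]   # 3-byte element: not a power of 2

proc byteOffset[T](xs: var openArray[T], i: int): int =
  # Distance in bytes between element i and element 0.
  cast[int](addr xs[i]) - cast[int](addr xs[0])

var ints = @[10, 20, 30, 40]     # 8-byte elements on 64-bit
var bytes = [1'u8, 2, 3, 4]      # 1-byte elements
var pixels = newSeq[Rgb](4)      # 3-byte elements

for i in 0 ..< 4:
  # Element i always sits at base + i * sizeof(T).
  doAssert byteOffset(ints, i) == i * sizeof(int)
  doAssert byteOffset(bytes, i) == i * sizeof(uint8)
  doAssert byteOffset(pixels, i) == i * sizeof(Rgb)

echo sizeof(int), " ", sizeof(uint8), " ", sizeof(Rgb)
```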