Use `openArray[T]` then.

> Also I want to have max performance, and one extra proc call is not ideal.

If your public proc is tagged `inline` there is no extra cost.

The tradeoffs are:

  * if you use `openArray`, both static and dynamic will use the same code
  * if you don't, you generate twice as much code
  * if you don't use `openArray`, the compiler has an easier time with loop unrolling
  * if you do, it will only be able to unroll loops if you tag everything `inline`



Furthermore, all the procedures for bit vectors are very small, so tagging them
all `inline` makes sense.
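A minimal sketch of that layout, assuming a bit vector backed by `uint64` words. The type and proc names (`StaticBitVec`, `DynBitVec`, `getBitImpl`, `setBitImpl`) are made up for illustration and are not from your code:

```nim
type
  StaticBitVec[N: static[int]] = object
    data: array[N, uint64]   # size known at compile time
  DynBitVec = object
    data: seq[uint64]        # size chosen at run time

# Shared implementation: works on any contiguous buffer of words.
proc getBitImpl(buf: openArray[uint64], i: int): bool {.inline.} =
  ((buf[i shr 6] shr (i and 63)) and 1'u64) != 0'u64

proc setBitImpl(buf: var openArray[uint64], i: int) {.inline.} =
  buf[i shr 6] = buf[i shr 6] or (1'u64 shl (i and 63))

# Thin public wrappers; tagged inline so the forwarding call can be elided.
proc getBit*[N: static[int]](v: StaticBitVec[N], i: int): bool {.inline.} =
  getBitImpl(v.data, i)

proc getBit*(v: DynBitVec, i: int): bool {.inline.} =
  getBitImpl(v.data, i)

proc setBit*[N: static[int]](v: var StaticBitVec[N], i: int) {.inline.} =
  setBitImpl(v.data, i)

proc setBit*(v: var DynBitVec, i: int) {.inline.} =
  setBitImpl(v.data, i)
```

Both vector kinds go through the same `openArray` procs, so the bit logic exists only once in the source.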

Lastly, a division or modulo operation can take about 55 cycles, whereas an
addition, shift or `and` instruction takes 1 cycle at most. Since everything you
divide or modulo by is a power of 2, use `shr log2(n)` for division and
`and (n-1)` for modulo.
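As a rough sketch, assuming 64-bit storage words and non-negative bit indices (the constant and proc names here are hypothetical):

```nim
const
  WordBits = 64             # bits per storage word; must be a power of 2
  Log2WordBits = 6          # log2(WordBits)
  WordMask = WordBits - 1   # 63, the low six bits set

proc wordIndex(bit: int): int {.inline.} =
  ## Equivalent to `bit div WordBits` for bit >= 0.
  bit shr Log2WordBits

proc bitOffset(bit: int): int {.inline.} =
  ## Equivalent to `bit mod WordBits` for bit >= 0.
  bit and WordMask

doAssert wordIndex(130) == 130 div WordBits
doAssert bitOffset(130) == 130 mod WordBits
```

For negative inputs the shift and the mask no longer match `div` and `mod`, so this only applies to indices that are known to be non-negative.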

As mentioned on Discord, use ceiling division to size your bit vector:
    
    
    proc ceilDiv(a, b: int): int =
      ## Integer division rounded up; assumes a >= 0 and b > 0.
      (a + b - 1) div b
    
    

You can adapt it to the case where b is a power of 2.
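One possible adaptation, assuming you pass the exponent `log2b` rather than `b` itself and that `a` is non-negative (`ceilDivPow2` is a hypothetical name):

```nim
proc ceilDivPow2(a, log2b: int): int {.inline.} =
  ## Ceiling of a / 2^log2b using only add and shift; assumes a >= 0.
  (a + (1 shl log2b) - 1) shr log2b

# Number of 64-bit words needed to hold n bits (log2(64) = 6).
doAssert ceilDivPow2(0, 6) == 0
doAssert ceilDivPow2(1, 6) == 1
doAssert ceilDivPow2(64, 6) == 1
doAssert ceilDivPow2(65, 6) == 2
```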
