Of course not; there is a lot of allocation work involved. But about 500 ns (well over a thousand CPU cycles) is indeed more than I expected. I discovered this when tuning my chess engine: using just one more tiny seq noticeably reduced performance.

Of course seqs are convenient because of the existing add() operation. A plain array allocated on the stack is faster, and a fixed size is OK, because I only have to store the moves or captures for the current position; but then I would have to track the current add position myself. Chess search is recursive, so I cannot use one single global seq. My current idea is to use a global seq as a first buffer for collecting captures, and to swap() that buffer with a newly allocated seq only when it turns out that there are any captures available at all. (I guess swap will not copy the data, only pointers, so it should be fast.) Or I may use my own container allocated on the stack.
In that case I would have to define my own add() and sort() operations. I don't think I will run into trouble with the stack size...
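The swap() idea can be sketched as follows. This is a minimal sketch under assumptions: moves are encoded as plain ints, and `scratch`, `genCaptures` and `collectCaptures` are hypothetical names, with a fake generator standing in for real capture generation. Nim's `swap` exchanges the two seq values themselves (length and data pointer), so the elements are not copied.

```nim
# Hypothetical sketch: moves are plain ints; genCaptures is a stand-in
# generator, not real chess logic.

var scratch = newSeqOfCap[int](128)  # global buffer, reused for every position

proc genCaptures(pos: int; buf: var seq[int]) =
  ## Placeholder: pretend positions divisible by 3 have four captures.
  if pos mod 3 == 0:
    for m in 1 .. 4:
      buf.add(pos * 10 + m)

proc collectCaptures(pos: int): seq[int] =
  scratch.setLen(0)                # drop old contents, keep the storage
  genCaptures(pos, scratch)
  if scratch.len > 0:
    result = newSeqOfCap[int](128) # the only allocation, and only when needed
    swap(scratch, result)          # result gets the captures, scratch the fresh buffer
  # with no captures, result stays an empty seq (no heap memory)

when isMainModule:
  for pos in 1 .. 6:
    echo pos, ": ", collectCaptures(pos)
```

Because each call hands its captures over in its own seq before returning, the global buffer is free again before the search recurses into the next ply.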
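A stack-allocated container with its own add() and sort() could look roughly like this. It is a sketch, not a tuned implementation: `MoveList`, `MaxMoves` and `sortDescending` are hypothetical names, the capacity of 128 mirrors the `MyBuffer` tuple in the benchmark, and insertion sort is chosen only because move lists are short.

```nim
# Hypothetical fixed-capacity move list; the whole object lives wherever the
# variable lives, so a local `var moves: MoveList` needs no heap allocation.

const MaxMoves = 128

type
  MoveList = object
    d: array[MaxMoves, int]  # storage is part of the object itself
    len: int                 # current add position = number of valid entries

proc add(ml: var MoveList; m: int) {.inline.} =
  # overflow check, compiled out when assertions are disabled
  assert ml.len < MaxMoves, "MoveList overflow"
  ml.d[ml.len] = m
  inc ml.len

proc sortDescending(ml: var MoveList) =
  ## Insertion sort on the used part of the array, largest value first.
  for i in 1 ..< ml.len:
    let x = ml.d[i]
    var j = i - 1
    while j >= 0 and ml.d[j] < x:
      ml.d[j + 1] = ml.d[j]  # shift smaller entries one slot to the right
      dec j
    ml.d[j + 1] = x

iterator items(ml: MoveList): int =
  ## Yields only the entries that were added.
  for i in 0 ..< ml.len:
    yield ml.d[i]

when isMainModule:
  var moves: MoveList        # zero-initialized, lives in this stack frame
  for m in [5, 1, 9, 3]:
    moves.add(m)
  moves.sortDescending()
  for m in moves:
    echo m
```

In a recursive search each ply would declare its own `MoveList` local, so recursion depth times roughly 1 KB is the stack cost to keep in mind.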
```nim
import times, random

type
  MyBuffer = tuple
    d: array[128, int]
    len: int

proc test(): int =
  # heap: allocate a seq with capacity for 128 ints
  var s = newSeqOfCap[int](128)
  s.len

proc tx(): int =
  # stack: fixed-size buffer plus a length field, no heap allocation
  var s: MyBuffer
  s.len

proc t0(): int =
  # baseline: a cheap call without any buffer (rand replaces the older random(7))
  rand(7)

proc main =
  var t: float # cpuTime()
  var i: int
  t = cpuTime()
  for _ in 1 .. 10:
    i = test()
  echo "10 * test: ", cpuTime() - t
  t = cpuTime()
  for _ in 1 .. 10:
    i = tx()
  echo "10 * tx: ", cpuTime() - t
  t = cpuTime()
  for _ in 1 .. 10:
    i = t0()
  echo "10 * t0: ", cpuTime() - t

main()
# nim c -d:release t.nim
# 10 * test: 5.000000000000013e-06
# 10 * tx: 0.0
# 10 * t0: 0.0
```
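Ten calls take only a few microseconds, which is close to the resolution of cpuTime on many systems, and since the results of tx() and t0() are never used the optimizer may drop that work entirely, so the 0.0 readings mostly mean "too small to measure". A sketch of a longer-running variant follows, assuming `std/monotimes` is available; `N` and the `bench` template are hypothetical additions, and accumulating the results only makes it less likely (not guaranteed) that the C compiler removes the calls or the allocation.

```nim
import std/[monotimes, times]

const N = 1_000_000  # hypothetical iteration count

proc test(): int =
  var s = newSeqOfCap[int](128)  # heap allocation under test
  s.len

type
  MyBuffer = tuple
    d: array[128, int]
    len: int

proc tx(): int =
  var s: MyBuffer                # stack buffer, zero-initialized
  s.len

template bench(label: string; body: untyped) =
  var acc = 0                    # use the results so the calls are not dead code
  let start = getMonoTime()
  for _ in 1 .. N:
    acc += body
  let ns = inNanoseconds(getMonoTime() - start)
  echo label, ": ", ns.float / N.float, " ns/call (acc = ", acc, ")"

when isMainModule:
  bench("test", test())
  bench("tx", tx())
```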