Along the lines of what @mratsim suggests, you could also speed up iteration
over sparsely populated bitsets with something called an [Aggregate Bit
Vector](http://cseweb.ucsd.edu/~varghese/PAPERS/icnp2002.pdf). The core idea is
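that for each block of 64 bits in the lower-level bit vector, if any bit is 1,
you set the corresponding bit in the index to 1. A single zero index word then
lets you skip 64 empty blocks at once, so empty regions go by up to 64x faster.
You can repeat the design with a second index level to skip up to 4096x faster.
More than 2 index levels would not make sense for Nim's `set[T]`, which is
limited to 65536 slots: that is 1024 words, so the first index is 16 words and
the second fits in a single word. Each incl/excl would then touch 2..3 words
(the bit itself plus one per index level), though.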
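
To make the idea concrete, here is a rough two-level sketch. It is written in Python rather than Nim just to keep it short, and the names (`AggregateBitSet`, `idx1`, `idx2`, `_set_bits`) are made up for illustration; nothing here comes from the paper or from Nim's `set[T]` implementation.

```python
WORD = 64  # bits per block, matching a 64-bit machine word


def _set_bits(x):
    """Yield the positions of the 1 bits in x, lowest first."""
    while x:
        low = x & -x                 # isolate the lowest set bit
        yield low.bit_length() - 1
        x ^= low                     # clear it and continue


class AggregateBitSet:
    """Hypothetical two-level aggregate bit vector (illustration only)."""

    def __init__(self, nbits=65536):
        nwords = (nbits + WORD - 1) // WORD
        self.words = [0] * nwords                        # level 0: the real bits
        self.idx1 = [0] * ((nwords + WORD - 1) // WORD)  # bit w set iff words[w] != 0
        self.idx2 = 0                                    # bit j set iff idx1[j] != 0

    def incl(self, i):
        w = i >> 6
        self.words[w] |= 1 << (i & 63)      # write 1: the bit itself
        self.idx1[w >> 6] |= 1 << (w & 63)  # write 2: first index level
        self.idx2 |= 1 << (w >> 6)          # write 3: second index level

    def excl(self, i):
        w = i >> 6
        self.words[w] &= ~(1 << (i & 63))
        if self.words[w] == 0:              # block became empty: clear its index bit
            self.idx1[w >> 6] &= ~(1 << (w & 63))
            if self.idx1[w >> 6] == 0:      # index word became empty as well
                self.idx2 &= ~(1 << (w >> 6))

    def __contains__(self, i):
        return (self.words[i >> 6] >> (i & 63)) & 1 == 1

    def __iter__(self):
        # Only descend into index words and blocks that are known to be non-zero.
        for j in _set_bits(self.idx2):
            for k in _set_bits(self.idx1[j]):
                w = j * WORD + k
                for b in _set_bits(self.words[w]):
                    yield w * WORD + b
```

With the default 65536 slots, `idx1` is 16 words and `idx2` needs only 16 bits, so iterating an almost empty set looks at one top-level word plus the few blocks that actually hold members. The cost is visible in `incl`/`excl`: each one updates the index levels as well as the bit itself.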