Honestly, GC problems are overrated. The current default GC works really well, and I get very good performance with it.
I'd say the only time you need to worry about the GC is when working on embedded systems with hard real-time requirements. As @Vindaar pointed out, there is a blocking PR to allow ARC and ORC to work with Arraymancer, and to use a Tensor built from a NumPy array through Nimpy without overhead. That said, if you work with fixed dimensions, you can pre-allocate your Tensor once and reuse it every time. It will "only" cost a memcpy to convert a PyBuffer to the Tensor.
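
To make the pre-allocate-and-reuse idea concrete, here is a rough sketch of the pattern using only the Python standard library. It is not Arraymancer or Nimpy code: `ReusableBuffer` and `load` are hypothetical names, and a plain `array` stands in for the fixed-size Tensor. The point is only that the destination is allocated once and each call costs one flat copy out of a buffer-protocol object.

```python
from array import array


class ReusableBuffer:
    """Hypothetical stand-in for a pre-allocated, fixed-size tensor."""

    def __init__(self, n_items: int) -> None:
        # Allocate the destination storage once, up front.
        self.storage = array("d", [0.0]) * n_items
        # Byte-level view over that storage, used as the copy target.
        self.view = memoryview(self.storage).cast("B")

    def load(self, src) -> array:
        # Any object exposing the buffer protocol can be the source.
        src_bytes = memoryview(src).cast("B")
        if src_bytes.nbytes != self.view.nbytes:
            raise ValueError("source size does not match the pre-allocated buffer")
        # Slice assignment between equally sized byte views is a flat copy
        # into the existing storage; the destination is not reallocated.
        self.view[:] = src_bytes
        return self.storage


if __name__ == "__main__":
    buf = ReusableBuffer(3)
    first = buf.load(array("d", [1.0, 2.0, 3.0]))
    second = buf.load(array("d", [4.0, 5.0, 6.0]))
    # Both calls hand back the same underlying object.
    print(first is second, list(second))
```

The trade-off is the same as in the Nim case: you pay one copy per call, but there is no per-call allocation for the collector to track, which is usually fine when the dimensions never change.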
