I think you're currently a bit of a trailblazer on this front (multithreaded + async has not been done much, as far as I have seen). It's a great project and can really pay off for the effort, but you're facing the pain of taking a path that hasn't been cleared by others yet.
I suggest breaking this into parts and winning battle by battle. I'm only guessing, but you may be fighting on a few too many fronts at once, and that probably feels overwhelming. Start with a small, focused server that uses only the async APIs and just sends a static response back: similar to what you have, but with no framework, just the very basics. Then benchmark it in one thread vs. many threads, with refc vs. arc vs. orc as easy flags. This benchmarks the performance "under you", meaning whatever is invariant to your own code. That may be enough to identify interesting performance behavior or stability issues that can then be addressed.
Speaking from experience with fixing bugs: an issue that can only be reproduced in someone else's large project really blurs "what is my problem vs. what is their problem", and when the Nim team has tons of people reporting stuff all the time, it becomes easy to ignore such issues because they take too much time. A focused reproduction can go a long, long way.
Regarding the VTune / uProf stuff: it is not super complex, and you'll get the hang of it quite quickly after some initial annoyance with the bad UI and the settings. The payoff can be huge, though. I suggest going for this again, with simple programs first, to get a sense of what you are looking at (don't forget the --debugger:native flag, you need it to get useful results). These seem like good next steps to me, but I obviously do not know the entire state of what you're working on. Unfortunately, you can't make much progress beyond where you are until these underlying issues get answers, so I suggest just embracing that this is a blocker and identifying the problems asap.
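To make the "small static-response server" idea concrete, here is a rough single-threaded sketch using only the standard library's asynchttpserver and asyncdispatch. The file name `server.nim`, the port 8080, the response body, and the names `serve` and `onRequest` are all placeholders of mine, not anything from your project. The compile lines in the comments assume a Nim version that accepts `--mm:` (older compilers spell it `--gc:`).

```nim
# Build variants to compare (file name is a placeholder):
#   nim c -d:release --mm:refc server.nim
#   nim c -d:release --mm:arc  server.nim
#   nim c -d:release --mm:orc  server.nim
# Add --debugger:native when building for VTune / uProf.
import std/[asynchttpserver, asyncdispatch]

const responseBody = "Hello, World!"  # static payload, assumed for illustration

proc serve(port: int) {.async.} =
  let server = newAsyncHttpServer()

  # Every request gets the same static response; no routing, no framework.
  proc onRequest(req: Request) {.async.} =
    await req.respond(Http200, responseBody)

  server.listen(Port(port))
  while true:
    if server.shouldAcceptRequest():
      await server.acceptRequest(onRequest)
    else:
      # Too many open file descriptors: back off briefly before accepting again.
      await sleepAsync(500)

when isMainModule:
  waitFor serve(8080)
```

This only covers the one-thread baseline. The many-threads variant would run one such loop per thread, which is exactly the part worth isolating and measuring separately.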