Isn't tuple assignment done by copy? That's probably where the difference comes from.
The optimization that Nim can make is to pass the function _arguments_ by reference. Maybe try an example which passes the result of `p1()` directly to a function that takes a tuple. I suspect there would be no runtime difference in that case.
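
Something like this is what I have in mind. It's only a sketch: the original `p1()` isn't shown here, so the `Big` type, the body of `p1`, and the `sum` proc are all made-up stand-ins, and whether the parameter is actually passed by reference is up to the compiler (as far as I know it depends on the size of the type).

```nim
# Hypothetical stand-ins: the real p1() from the thread is not shown here.
type Big = tuple[a, b, c, d: int]

proc p1(): Big =
  # Builds a tuple and returns it by value.
  (a: 1, b: 2, c: 3, d: 4)

proc sum(t: Big): int =
  # `t` is an immutable parameter, so the compiler is free to pass it
  # by reference instead of copying it (assumption: it does so here).
  t.a + t.b + t.c + t.d

# Variant 1: assign to a variable first; the assignment may copy the tuple.
let x = p1()
echo sum(x)

# Variant 2: pass the result of p1() straight to the proc.
echo sum(p1())
```

Timing the two variants in a loop should show whether the assignment is really what costs you.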
