If the timings depend on the order in which the tests are placed within the block, I guess that reveals they are heavily impacted by branch prediction.
Looking at these new timings, though, it seems that gcdSub, while not always the fastest, is always a good option and performs much better than the current GCD. GMP does multi-precision arithmetic, so different algorithms are involved there.
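One way to check the order-dependence guess would be to rerun the candidates in a different order each round and record where each one ran. Below is a minimal sketch in Python, not the benchmark used for the timings above. The names `gcd_sub`, `gcd_mod`, and `bench_order` are hypothetical; in particular `gcd_sub` is only a subtraction-based stand-in guessed from the name, not the real gcdSub.

```python
import random
import time


def gcd_sub(a, b):
    # Hypothetical stand-in: plain subtraction-based GCD (an assumption,
    # not the actual gcdSub implementation discussed above).
    while a and b:
        if a >= b:
            a -= b
        else:
            b -= a
    return a or b


def gcd_mod(a, b):
    # Classic Euclid with modulo, used here as a second candidate.
    while b:
        a, b = b, a % b
    return a


def time_once(fn, pairs):
    # Wall-clock time for one pass of fn over all input pairs.
    start = time.perf_counter()
    for a, b in pairs:
        fn(a, b)
    return time.perf_counter() - start


def bench_order(candidates, pairs, rounds=5, seed=0):
    # Shuffle the run order every round and keep (position, seconds)
    # for each candidate, so position effects can be inspected.
    rng = random.Random(seed)
    names = list(candidates)
    results = {name: [] for name in names}
    for _ in range(rounds):
        rng.shuffle(names)
        for position, name in enumerate(names):
            elapsed = time_once(candidates[name], pairs)
            results[name].append((position, elapsed))
    return results


if __name__ == "__main__":
    rng = random.Random(42)
    pairs = [(rng.randint(1, 1000), rng.randint(1, 1000)) for _ in range(200)]
    timings = bench_order({"gcd_sub": gcd_sub, "gcd_mod": gcd_mod}, pairs)
    for name, runs in timings.items():
        for position, seconds in runs:
            print(f"{name} at position {position}: {seconds:.6f}s")
```

If a candidate's time changes noticeably with its position rather than with the candidate itself, that would support the idea that order matters, though it would not by itself show that branch prediction is the cause.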