You know, for the vast majority of these cases, I just use MVCL, or sometimes a simple macro that wraps one and takes care of the register setup and padding.
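For anyone who has not coded one lately, here is a minimal sketch of the kind of register setup such a macro would hide. The field names PRTLINE and MSGTEXT are hypothetical, and the sketch assumes the usual R0-R15 register equates, addressability to the fields and the literal, and lengths small enough for LHI. It is an illustration, not any particular macro's expansion.

```hlasm
*        Hypothetical names: PRTLINE (target), MSGTEXT (source).
*        Assumes R0-R15 equates and both lengths under 32K for LHI.
         LA    R0,PRTLINE          target address
         LHI   R1,L'PRTLINE        target length
         LA    R14,MSGTEXT         source address
         LHI   R15,L'MSGTEXT       source length (low 24 bits)
         ICM   R15,B'1000',=C' '   pad byte in high-order byte
         MVCL  R0,R14              move, blank-padding the remainder
*
*        Data areas (elsewhere in the program):
PRTLINE  DS    CL133               print line
MSGTEXT  DC    C'SAMPLE TEXT'      data to place in the line
```

Because the source is shorter than the target, MVCL fills the rest of PRTLINE with the pad byte, so one instruction both moves the text and blanks the line. Note that all four registers are changed by the instruction, and condition code 3 means a destructive overlap was detected and nothing was moved.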
Yes, of course MVCL takes a while to get itself cranked up, and where the crossover lies relative to a loop of MVCs, unrolled or not, varies by machine type. But in the crunch it is so rare that I am writing time-critical code that the power, simplicity, and elegance of MVCL trumps all. It is even rarer that I need to worry about moves whose length is unknown at compile time and might exceed 16MB. When building a print or WTO line from data components, using MVCL instead of MVC, even for a few short pieces, is very unlikely to be a big contributor to a product's overall CPU consumption.

Tony H.
