(5) Do we agree that all such CPUs use a byte-granular modification mask?
Now, as for (0), I might agree to disregard the original Alpha, but as the embedded world moves to SMP, I'm not sure we can disregard non-cache-coherent NUMA setups, or even CPUs without a byte store.
As per (5), it doesn't matter whether the CPU lacks a byte store, since the cache has a byte-granular modification mask.
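To make the term in (5) concrete, here is a minimal sketch in C of what a byte-granular modification mask buys on write-back. It is a toy model, not a description of any real cache: the 8-byte line size and the names `cache_line`, `line_fill`, `store_byte`, `writeback_masked`, `writeback_whole` and `adjacent_stores_survive` are all made up for illustration. It models only the write-back merge, and assumes a store marks exactly the byte it changed.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define LINE_SIZE 8  /* hypothetical tiny cache line, for illustration only */

/* Hypothetical model of one cached copy of a line, with a per-byte dirty mask. */
struct cache_line {
    uint8_t data[LINE_SIZE];
    uint8_t dirty;  /* bit i set => data[i] was modified in this cache */
};

/* Fill the line from memory; nothing is dirty yet. */
static void line_fill(struct cache_line *cl, const uint8_t *mem)
{
    memcpy(cl->data, mem, LINE_SIZE);
    cl->dirty = 0;
}

/* Modify one byte in the cached copy and mark only that byte dirty. */
static void store_byte(struct cache_line *cl, unsigned idx, uint8_t val)
{
    assert(idx < LINE_SIZE);
    cl->data[idx] = val;
    cl->dirty |= (uint8_t)(1u << idx);
}

/* Byte-granular write-back: only bytes marked dirty reach memory. */
static void writeback_masked(const struct cache_line *cl, uint8_t *mem)
{
    for (unsigned i = 0; i < LINE_SIZE; i++)
        if (cl->dirty & (1u << i))
            mem[i] = cl->data[i];
}

/* Whole-line write-back: stale clean bytes overwrite memory as well. */
static void writeback_whole(const struct cache_line *cl, uint8_t *mem)
{
    memcpy(mem, cl->data, LINE_SIZE);
}

/*
 * Two non-coherent caches each modify a different byte of the same line,
 * then write back in turn. Returns 1 if both stores are visible in memory.
 */
int adjacent_stores_survive(int byte_granular)
{
    uint8_t mem[LINE_SIZE] = {0};
    struct cache_line a, b;

    line_fill(&a, mem);
    line_fill(&b, mem);
    store_byte(&a, 0, 0xAA);  /* "CPU A" writes byte 0 */
    store_byte(&b, 1, 0xBB);  /* "CPU B" writes byte 1 */

    if (byte_granular) {
        writeback_masked(&a, mem);
        writeback_masked(&b, mem);
    } else {
        writeback_whole(&a, mem);
        writeback_whole(&b, mem);  /* B's stale byte 0 clobbers A's store */
    }
    return mem[0] == 0xAA && mem[1] == 0xBB;
}
```

In this model, `adjacent_stores_survive(1)` returns 1 because each write-back touches only the byte its cache modified, while `adjacent_stores_survive(0)` returns 0 because the second whole-line write-back overwrites the first cache's byte with stale data.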