> I think Tony's in the right direction. We already do dst "sizing" like
> that for the compiler in clwb().
The clwb() case does look like what we want for movdir64b(). But is it right for clwb() itself? That instruction doesn't modify anything; it just pushes data from cache to memory. So why is it using "+m"?

-Tony
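For reference, here is a minimal sketch of the two constraint styles being compared. It assumes GCC or Clang extended asm. The asm templates are deliberately empty so it builds and runs on any CPU. The names cacheline_t, flush_read_only() and flush_read_write() are hypothetical, not kernel APIs. With "m" the compiler only has to make sure the 64 bytes are in memory before the asm. With "+m" it must also assume they changed and reload them afterwards.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical 64-byte operand type, mirroring the struct cast that
 * sizes the memory operand in clwb(). */
typedef struct { char x[64]; } cacheline_t;

_Static_assert(sizeof(cacheline_t) == 64, "operand must cover one cache line");

/* "m" input only: the asm is assumed to read the 64 bytes at p, so any
 * pending stores to them must reach memory first, but the compiler may
 * keep using values it already holds for *p afterwards. */
static inline void flush_read_only(const void *p)
{
	const cacheline_t *cl = p;

	/* Empty template: a real version would emit the flush opcode here. */
	__asm__ __volatile__("" : : "m" (*cl));
}

/* "+m" input/output: the asm is assumed to read and write the 64 bytes,
 * so the compiler must also reload *p after it. This is the form clwb()
 * uses today. */
static inline void flush_read_write(void *p)
{
	cacheline_t *cl = p;

	/* Empty template: nothing is actually modified at run time. */
	__asm__ __volatile__("" : "+m" (*cl));
}
```

For movdir64b(), which really does write its 64-byte destination, the same sizing trick would presumably put dst in an output operand, while src stays a plain "m" input.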

