--- Comment #5 from Alexander Nesterovskiy <alexander.nesterovskiy at intel dot com> ---
Yes, it looks like the problem is with unaligned access (the reproducer does
not fail when the loop starts at i=0).
Your patch seems to work: there are no run failures for the reproducer or for
445, 521, 527, 554 (tested on the SPEC train workload).
I'll report back once the other benchmarks have finished.

