I have been chasing memory offlining not making progress recently. On the way I have noticed few weird decisions in the code. The migration itself is restricted without a reasonable justification and the retry loop around the migration is quite messy. This is addressed by patch 1 and patch 2.
Patch 3 is targeting on the faultaround code which has been a hot candidate for the initial issue reported upstream  and that I am debugging internally. It turned out to be not the main contributor in the end but I believe we should address it regardless. See the patch description for more details.  http://lkml.kernel.org/r/20181114070909.GB2653@MiWiFi-R3L-srv