Re: [Qemu-devel] [RFC 00/13] Live memory snapshot based on userfaultfd
On 2017/3/1 0:14, Andrea Arcangeli wrote:
> Hello,
>
> On Tue, Feb 28, 2017 at 09:48:26AM +0800, Hailiang Zhang wrote:
> > Yes, the current implementation of live snapshot supports tcg but
> > not kvm mode, for the reason I mentioned above. If you try to
> > implement it, I think you need to start from userfaultfd supporting
> > KVM. There is a scenario for it, but I'm blocked by other things
> > these days. I'd be glad to discuss it with you if you plan to do it.
>
> Yes, there were other urgent userfaultfd features needed by QEMU and
> CRIU queued for merging (hugetlbfs/shmem/non-cooperative support) and
> they're all included upstream now. Now that such work is finished,
> fixing the WP support to work with KVM and to provide full accuracy
> will be the next thing to do.

Great, looking forward to it. Thanks.

> Thanks,
> Andrea
Re: [Qemu-devel] [RFC 00/13] Live memory snapshot based on userfaultfd
Thanks a lot Hailiang

On 28/02/2017 02:48, Hailiang Zhang wrote:
> Hi,
>
> On 2017/2/27 23:37, Christian Pinto wrote:
> > Hello Hailiang,
> >
> > are there any updates on this patch series? Are you planning to
> > release a new version?
>
> No, userfaultfd still does not support write-protect for KVM.
> You can see the newest discussion about it here:
> https://lists.gnu.org/archive/html/qemu-devel/2016-12/msg01127.html

Yes, I have read that part of the discussion and quickly managed to
reproduce the "Bad address" error on ARMv8.

> > You say there are some issues with the current snapshot-v2 version;
> > which issues were you referring to? On my side the only problem I
> > have seen was that the live snapshot was not working on ARMv8, but
> > I have fixed that and managed to successfully snapshot and restore
> > a QEMU ARMv8 tcg machine on an ARMv8 host. I will gladly contribute
> > these fixes once you release a new version of the patches.
>
> Yes, the current implementation of live snapshot supports tcg but not
> kvm mode, for the reason I mentioned above. If you try to implement
> it, I think you need to start from userfaultfd supporting KVM. There
> is a scenario for it, but I'm blocked by other things these days. I'd
> be glad to discuss it with you if you plan to do it.

I will have a deeper look at why userfault is not yet working with KVM
and get back on this thread for feedback/suggestions.

Thanks,

Christian
Re: [Qemu-devel] [RFC 00/13] Live memory snapshot based on userfaultfd
Hi Andrea,

I noticed that you call the change_protection() helper from mprotect to
realize the write-protect capability for userfault. But I doubt
mprotect can work properly with KVM: if the shadow page table (spte)
used by the VM is already established in EPT, change_protection() does
not remove its write permission but only invalidates the host page
table and the shadow page table (KVM registers
invalidate_page/invalidate_range_start).

I investigated KSM. Since it can merge pages that are used by a VM, it
needs to remove the write permission of those pages too, and its
process is not the same as mprotect's. It has a helper,
write_protect_page(), which finally calls the change_pte hook in KVM,
and that removes the page's write permission in the EPT page table.
The code path is:

write_protect_page
 -> set_pte_at_notify
  -> mmu_notifier_change_pte
   -> mn->ops->change_pte
    -> kvm_mmu_notifier_change_pte

(If I'm wrong, please let me know :) )

So IMHO we can make userfault support KVM by referring to KSM. I will
investigate it deeply and try to implement it, but I'm not very
familiar with the memory subsystem in the kernel, so it will take me
some time to study it first... I'd like to know if you have any plan
about supporting KVM for userfault?

Thanks,
Hailiang

On 2016/9/18 10:14, Hailiang Zhang wrote:
> Hi Andrea,
>
> Any comments? Thanks.
>
> On 2016/9/6 11:39, Hailiang Zhang wrote:
> > Hi Andrea,
> >
> > I tested the new live memory snapshot with --enable-kvm; it doesn't
> > work. To make things simple, I simplified the code, leaving only
> > the parts that test the write-protect capability. You can find the
> > code at
> > https://github.com/coloft/qemu/tree/test-userfault-write-protect
> > and reproduce the problem easily with it.
Re: [Qemu-devel] [RFC 00/13] Live memory snapshot based on userfaultfd
Hi Andrea,

I tested the new live memory snapshot with --enable-kvm; it doesn't
work. To make things simple, I simplified the code, leaving only the
parts that test the write-protect capability. You can find the code at
https://github.com/coloft/qemu/tree/test-userfault-write-protect
and reproduce the problem easily with it.

The test result is as follows:

[root@localhost qemu]# x86_64-softmmu/qemu-system-x86_64 --enable-kvm \
  -drive file=/mnt/sdb/win7/win7.qcow2,if=none,id=drive-ide0-0-1,format=qcow2,cache=none \
  -device ide-hd,bus=ide.0,unit=1,drive=drive-ide0-0-1,id=ide0-0-1 \
  -vnc :7 -m 8192 -smp 1 -netdev tap,id=bn0 \
  -device virtio-net-pci,id=net-pci0,netdev=bn0 --monitor stdio
QEMU 2.6.95 monitor - type 'help' for more information
(qemu) migrate file:/home/xxx
qemu-system-x86_64: postcopy_ram_fault_thread: 7f07fb92a000 fault and remove write protect!
[... the same line repeats many times for the same address ...]
error: kvm run failed Bad address
EAX=0004 EBX= ECX=83b2ac20 EDX=c022
ESI=85fe33f4 EDI=c020 EBP=83b2abcc ESP=83b2abc0
EIP=8bd2ff0c EFL=00010293 [--S-A-C] CPL=0 II=0 A20=1 SMM=0 HLT=0
ES =0023 00c0f300 DPL=3 DS [-WA]
CS =0008 00c09b00 DPL=0 CS32 [-RA]
SS =0010 00c09300 DPL=0 DS [-WA]
DS =0023 00c0f300 DPL=3 DS [-WA]
FS =0030 83b2dc00 3748 00409300 DPL=0 DS [-WA]
GS = LDT=
TR =0028 801e2000 20ab 8b00 DPL=0 TSS32-busy
GDT= 80b95000 03ff
IDT= 80b95400 07ff
CR0=8001003b CR2=030b5000 CR3=00185000 CR4=06f8
DR0= DR1= DR2= DR3=
DR6=0ff0 DR7=0400
EFER=0800
Code=8b ff 55 8b ec 53 56 8b 75 08 57 8b 7e 34 56 e8 30 f7 ff ff <6a> 00 57 8a d8 e8 96 14 00 00 6a 04 83 c7 02 57 e8 8b 14 00 00 5f c6 46 5b 00 5e 8a c3 5b

I investigated the kvm and userfault code. We use the MMU notifier to
integrate KVM with the Linux memory management. Here, for userfault
write-protect, the function call path is:

userfaultfd_ioctl
 -> userfaultfd_writeprotect
  -> mwriteprotect_range
   -> change_protection (directly calls the mprotect helper here)
    -> change_protection_range
     -> change_pud_range
      -> change_pmd_range
       -> mmu_notifier_invalidate_range_start(mm, mni_start, end)
        -> kvm_mmu_notifier_invalidate_range_start (KVM module)

OK, here we remove the entry from the spte (if we use EPT hardware, we
remove the page table entry for it). That's why we get fault
notifications for the VM. And it seems that we can't fix up the
userfault (remove the page's write protection) through this call path.

My question is: for the userfault write-protect capability, why do we
remove the page table entry instead of marking it read-only? Actually,
for KVM we have an MMU notifier (kvm_mmu_notifier_change_pte) to do
this. We can use it to remove write permission from the KVM page
table, just like KVM dirty-log tracking does; see the function
__rmap_write_protect() in KVM.

Another question: does mprotect() work normally with KVM? (I didn't
test it.) I think KSM and swap can work with KVM properly.

Besides, there seems to be a bug in userfault write-protect: we use
UFFDIO_COPY_MODE_DONTWAKE in userfaultfd_writeprotect; should it be
UFFDIO_WRITEPROTECT_MODE_DONTWAKE there?

static int userfaultfd_writeprotect(struct userfaultfd_ctx *ctx,
Re: [Qemu-devel] [RFC 00/13] Live memory snapshot based on userfaultfd
Hi,

I updated this series but didn't post it, because there were some
problems while I tested the snapshot function. I didn't know whether
it was a userfaultfd issue or not, and I don't have time to
investigate it this month. I have put the patches on github:

https://github.com/coloft/qemu/tree/snapshot-v2

Anyone who wants to test and modify them is welcome!

Besides, will you join LinuxCon or KVM Forum in Canada?
I hope to see you there if you attend ;)

Thanks,
Hailiang
Re: [Qemu-devel] [RFC 00/13] Live memory snapshot based on userfaultfd
Hello everyone,

I've an aa.git tree up to date on the master & userfault branches
(master includes other pending VM stuff; the userfault branch only
contains userfault enhancements):

https://git.kernel.org/cgit/linux/kernel/git/andrea/aa.git/log/?h=userfault

I didn't have time to test KVM live memory snapshot on it yet as I'm
still working to improve it. Did anybody test it? However, I'd be
happy to take any bug reports and quickly solve anything that isn't
working right with the shadow MMU.

I already got a positive report for another usage of the uffd WP
support:

https://medium.com/@MartinCracauer/generational-garbage-collection-write-barriers-write-protection-and-userfaultfd-2-8b0e796b8f7f

The last few things I'm working on to finish the WP support are:

1) a pte_swp_mkuffd_wp equivalent of pte_swp_mksoft_dirty, to mark, in
a vma with VM_UFFD_WP set in vma->vm_flags, which swap entries were
generated while the pte was write-protected.

2) to avoid all false positives, an equivalent of pte_mksoft_dirty is
needed too... and that requires spare software bits in the pte, which
are available on x86. I also considered taking over the soft_dirty
bit, but then you couldn't do checkpoint/restore of a JIT/to-native
compiler that uses the uffd WP support, so it wasn't ideal. Perhaps it
would be ok, as an incremental patch, to make the two options mutually
exclusive, to defer the arch changes that pte_mkuffd_wp would require
until later.

3) prevent UFFDIO_ZEROPAGE if registering WP|MISSING, or trigger a CoW
in userfaultfd_writeprotect.

4) a WP selftest.

In theory things should work ok already if the userland code is
tolerant of false positives through swap and after fork() and KSM. For
a usage like snapshotting, false positives shouldn't be an issue (in
the worst case it'll just run slower if you swap), and point 3) above
also isn't an issue because it's going to register into uffd with WP
only.

The current status includes:

1) WP support for anon (with false positives... work in progress)
2) MISSING support for tmpfs and hugetlbfs
3) non-cooperative support

Thanks,
Andrea
Re: [Qemu-devel] [RFC 00/13] Live memory snapshot based on userfaultfd
On 2016/7/14 19:43, Dr. David Alan Gilbert wrote: * Hailiang Zhang (zhang.zhanghaili...@huawei.com) wrote: On 2016/7/14 2:02, Dr. David Alan Gilbert wrote: * zhanghailiang (zhang.zhanghaili...@huawei.com) wrote: For now, we still didn't support live memory snapshot, we have discussed a scheme which based on userfaultfd long time ago. You can find the discussion by the follow link: https://lists.nongnu.org/archive/html/qemu-devel/2014-11/msg01779.html The scheme is based on userfaultfd's write-protect capability. The userfaultfd write protection feature is available here: http://www.spinics.net/lists/linux-mm/msg97422.html I've (finally!) had a brief look through this, I like the idea. I've not bothered with minor cleanup like comments on them; I'm sure those will happen later; some larger scale things to think about are: a) I wonder if it's really best to put that much code into the postcopy function; it might be but I can see other userfault uses as well. Yes, it is better to extract common codes into public functions. b) I worry a bit about the size of the copies you create during setup and I don't really understand why you can't start sending those pages Because we save device state and ram in the same snapshot_thread, if the process of saving device is blocked by writing pages, we can remove the write-protect in 'postcopy/fault' thread, but can't send it immediately. Don't you write the devices to a buffer? If so then you perhaps you could split writing into that buffer into a separate thread. Hmm, it may work in this way. immediately - but then I worry aobut the relative order of when pages data should be sent compared to the state of devices view of RAM. c) Have you considered also using userfault for loading the snapshot - I know there was someone on #qemu a while ago who was talking about using it as a way to quickly reload from a migration image. I didn't notice such talking before, maybe i missed it. Could you please send me the link ? 
I don't think there are any public docs about it; this was a conversation with Christoph Seifert on #qemu about May last year. Got it. But I do consider the scenario of quick snapshot restoring. The difficulty here is how we can quickly find the position of a particular page. That is, while the VM is accessing one page, we need to find its position in the snapshot file and read it into memory. Considering compatibility, we hope we can still re-use all migration capabilities.
My rough idea about the scenario is:
1. Use an array to record the beginning position of all VM's pages. Use the offset as the index for the array, just like the migration bitmaps.
2. Save the data of the array into another file in a special format.
3. Also record the position of the device state data in the snapshot file. (Or we can put the device state data at the head of the snapshot file.)
4. While restoring the snapshot, reload the array first, and then read the device state.
5. Set all pages to MISS status.
6. Resume the VM.
7. The next process is like how postcopy incoming does it.
I'm not sure if this scenario is practicable or not. We need further discussion. :) Yes; I can think of a few different ways to do (2):
a) We could just store it at the end of the snapshot file (and know that it's at the end - I think the json format description did a similar trick). Yes, this is a better idea.
b) We wouldn't need the 4 byte headers on the pages we currently send.
c) Juan's idea of having multiple fd's for migration streams might also fit, with the RAM data in the separate file.
d) But if we know it's a file (not a network stream) then should we treat it specially and just use a sparse file of the same size as RAM, and just pwrite() the data into the right offset? Yes, this is the simplest way to save the snapshot file; the disadvantage is that we can't directly reuse the current migration incoming path to restore the VM (no quick restore). We need to modify the current restore process. I'm not sure which way is better.
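Dave's option (d) above - a sparse file the same size as RAM, with each page pwrite()n at its natural offset - can be sketched in a few lines. This is an illustrative model only (Python rather than QEMU's C, a hypothetical /tmp/snap.img file, and an assumed 4 KiB page size), not the patchset's actual snapshot format:

```python
import os

PAGE_SIZE = 4096  # assumed guest page size

def save_page(snap_fd, page_index, data):
    # The file offset equals page_index * PAGE_SIZE, so no separate
    # index array is needed and never-written pages remain file holes.
    assert len(data) == PAGE_SIZE
    os.pwrite(snap_fd, data, page_index * PAGE_SIZE)

def load_page(snap_fd, page_index):
    # Restore side: seek straight to the page, no stream parsing.
    return os.pread(snap_fd, PAGE_SIZE, page_index * PAGE_SIZE)

# Demo: snapshot a tiny 4-page "RAM" into a sparse file.
ram = {0: b"A" * PAGE_SIZE, 3: b"B" * PAGE_SIZE}  # pages 1 and 2 never dirtied
fd = os.open("/tmp/snap.img", os.O_RDWR | os.O_CREAT | os.O_TRUNC, 0o600)
os.ftruncate(fd, 4 * PAGE_SIZE)  # size the file to the whole RAM, all holes
for idx, page in ram.items():
    save_page(fd, idx, page)
assert load_page(fd, 3) == b"B" * PAGE_SIZE
assert load_page(fd, 1) == b"\0" * PAGE_SIZE  # hole reads back as zeros
os.close(fd)
```

The offset-equals-position property is what would make the postcopy-style quick restore possible: on a MISS fault for page N, the loader can pread() exactly that page without scanning a stream.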
But it's worth a try. Hailiang Dave Hailiang Dave
The process of this live memory scheme is like below:
1. Pause VM
2. Enable write-protect fault notification by using userfaultfd to mark VM's memory as write-protected (readonly).
3. Save VM's static state (here, the device state) to the snapshot file
4. Resume VM; the VM is going to run.
5. The snapshot thread begins to save VM's live state (here, RAM) into the snapshot file.
6. During this time, all attempts to write VM's memory will be blocked by the kernel, and the kernel will wake up the fault treating thread in qemu to process this write-protect fault. The fault treating thread will deliver this page's address to the snapshot thread.
7. The snapshot thread gets this address, saves this page into the snapshot file, and then removes the write-protect by using the userfaultfd API; after that, the actions of writing will be
Re: [Qemu-devel] [RFC 00/13] Live memory snapshot based on userfaultfd
* Hailiang Zhang (zhang.zhanghaili...@huawei.com) wrote: > On 2016/7/14 2:02, Dr. David Alan Gilbert wrote: > > * zhanghailiang (zhang.zhanghaili...@huawei.com) wrote: > > > For now, we still didn't support live memory snapshot, we have discussed > > > a scheme which is based on userfaultfd long time ago. > > > You can find the discussion by the following link: > > > https://lists.nongnu.org/archive/html/qemu-devel/2014-11/msg01779.html > > > > > > The scheme is based on userfaultfd's write-protect capability. > > > The userfaultfd write protection feature is available here: > > > http://www.spinics.net/lists/linux-mm/msg97422.html > > > > I've (finally!) had a brief look through this, I like the idea. > > I've not bothered with minor cleanup like comments on them; > > I'm sure those will happen later; some larger scale things to think > > about are: > >a) I wonder if it's really best to put that much code into the postcopy > > function; it might be but I can see other userfault uses as well. > > Yes, it is better to extract common code into public functions. > > >b) I worry a bit about the size of the copies you create during setup > > and I don't really understand why you can't start sending those pages > > Because we save device state and ram in the same snapshot_thread, if the > process > of saving device is blocked by writing pages, we can remove the write-protect > in > 'postcopy/fault' thread, but can't send it immediately. Don't you write the devices to a buffer? If so, then perhaps you could split writing into that buffer into a separate thread. > > immediately - but then I worry about the relative order of when pages > > data should be sent compared to the state of devices view of RAM. > >c) Have you considered also using userfault for loading the snapshot - I > > know there was someone on #qemu a while ago who was talking about using > > it as a way to quickly reload from a migration image. > > > > I didn't notice such talking before, maybe I missed it.
> Could you please send me the link? I don't think there are any public docs about it; this was a conversation with Christoph Seifert on #qemu about May last year. > But I do consider the scenario of quickly snapshot restoring. > And the difficulty here is how can we quickly find the position > of the special page. That is, while VM is accessing one page, we > need to find its position in snapshot file and read it into memory. > Consider the compatibility, we hope we can still re-use all migration > capabilities. > > My rough idea about the scenario is: > 1. Use an array to record the beginning position of all VM's pages. > Use the offset as the index for the array, just like migration bitmaps. > 2. Save the data of the array into another file in a special format. > 3. Also record the position of device state data in snapshot file. > (Or we can put the device state data at the head of snapshot file) > 4. While restoring the snapshot, reload the array first, and then read > the device state. > 5. Set all pages to MISS status. > 6. Resume VM to run > 7. The next process is like how postcopy incoming does. > > I'm not sure if this scenario is practicable or not. We need further > discussion. :) Yes; I can think of a few different ways to do (2): a) We could just store it at the end of the snapshot file (and know that it's at the end - I think the json format description did a similar trick). b) We wouldn't need the 4 byte headers on the pages we currently send. c) Juan's idea of having multiple fd's for migration streams might also fit, with the RAM data in the separate file. d) But if we know it's a file (not a network stream) then should we treat it specially and just use a sparse file of the same size as RAM, and just pwrite() the data into the right offset? Dave > > Hailiang > > > Dave > > > > > > > > The process of this live memory scheme is like below: > > > 1. Pause VM > > > 2.
Enable write-protect fault notification by using userfaultfd to > > > mark VM's memory to write-protect (readonly). > > > 3. Save VM's static state (here is device state) to snapshot file > > > 4. Resume VM, VM is going to run. > > > 5. Snapshot thread begins to save VM's live state (here is RAM) into > > > snapshot file. > > > 6. During this time, all the actions of writing VM's memory will be > > > blocked > > >by kernel, and kernel will wake up the fault treating thread in qemu to > > >process this write-protect fault. The fault treating thread will > > > deliver this > > >page's address to snapshot thread. > > > 7. snapshot thread gets this address, save this page into snapshot file, > > > and then remove the write-protect by using userfaultfd API, after > > > that, > > > the actions of writing will be recovered. > > > 8. Repeat step 5~7 until all VM's memory is saved to snapshot file > > > > > > Compared with the feature of 'migrate VM's state to
Re: [Qemu-devel] [RFC 00/13] Live memory snapshot based on userfaultfd
On 2016/7/14 2:02, Dr. David Alan Gilbert wrote: * zhanghailiang (zhang.zhanghaili...@huawei.com) wrote: For now, we still don't support live memory snapshot; we discussed a scheme based on userfaultfd a long time ago. You can find the discussion at the following link: https://lists.nongnu.org/archive/html/qemu-devel/2014-11/msg01779.html The scheme is based on userfaultfd's write-protect capability. The userfaultfd write protection feature is available here: http://www.spinics.net/lists/linux-mm/msg97422.html I've (finally!) had a brief look through this, I like the idea. I've not bothered with minor cleanup like comments on them; I'm sure those will happen later; some larger scale things to think about are: a) I wonder if it's really best to put that much code into the postcopy function; it might be, but I can see other userfault uses as well. Yes, it is better to extract common code into public functions. b) I worry a bit about the size of the copies you create during setup and I don't really understand why you can't start sending those pages Because we save device state and RAM in the same snapshot_thread: if the process of saving devices is blocked by writing pages, we can remove the write-protect in the 'postcopy/fault' thread, but can't send the page immediately. immediately - but then I worry about the relative order of when page data should be sent compared to the devices' view of RAM. c) Have you considered also using userfault for loading the snapshot - I know there was someone on #qemu a while ago who was talking about using it as a way to quickly reload from a migration image. I didn't notice that discussion before; maybe I missed it. Could you please send me the link? But I do consider the scenario of quick snapshot restoring. The difficulty here is how we can quickly find the position of a particular page. That is, while the VM is accessing one page, we need to find its position in the snapshot file and read it into memory.
Considering compatibility, we hope we can still re-use all migration capabilities.
My rough idea about the scenario is:
1. Use an array to record the beginning position of all VM's pages. Use the offset as the index for the array, just like the migration bitmaps.
2. Save the data of the array into another file in a special format.
3. Also record the position of the device state data in the snapshot file. (Or we can put the device state data at the head of the snapshot file.)
4. While restoring the snapshot, reload the array first, and then read the device state.
5. Set all pages to MISS status.
6. Resume the VM.
7. The next process is like how postcopy incoming does it.
I'm not sure if this scenario is practicable or not. We need further discussion. :) Hailiang Dave
The process of this live memory scheme is like below:
1. Pause VM
2. Enable write-protect fault notification by using userfaultfd to mark VM's memory as write-protected (readonly).
3. Save VM's static state (here, the device state) to the snapshot file
4. Resume VM; the VM is going to run.
5. The snapshot thread begins to save VM's live state (here, RAM) into the snapshot file.
6. During this time, all attempts to write VM's memory will be blocked by the kernel, and the kernel will wake up the fault treating thread in qemu to process this write-protect fault. The fault treating thread will deliver this page's address to the snapshot thread.
7. The snapshot thread gets this address, saves this page into the snapshot file, and then removes the write-protect by using the userfaultfd API; after that, writing is allowed again.
8. Repeat steps 5~7 until all VM's memory is saved to the snapshot file.
Compared with the feature of 'migrate VM's state to file', the main difference for live memory snapshot is that it has little time delay in catching VM's state. It captures the VM's state as of the moment the user's snapshot command is issued, just like taking a photo of the VM's state.
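The ordering guarantee behind steps 5-7 - a faulting write must cause the page to be saved before the protection is dropped, so the snapshot always reflects the pause-time contents - can be shown with a small single-threaded simulation. This is a toy model of the control flow only (no real userfaultfd, threads, or kernel involvement; all names are invented for illustration):

```python
# 'wp' is the set of still-write-protected page indices; a page leaves
# 'wp' exactly when its pause-time content has been saved. A guest write
# to a protected page first forces that page into the snapshot (step 7),
# then clears the protection, so later writes no longer fault (step 6).

class SnapshotSim:
    def __init__(self, ram):
        self.ram = list(ram)             # live guest pages (mutable)
        self.wp = set(range(len(ram)))   # step 2: write-protect everything
        self.snapshot = [None] * len(ram)

    def guest_write(self, idx, value):
        if idx in self.wp:               # step 6: write faults on protected page
            self._save(idx)              # step 7: save first, then unprotect
        self.ram[idx] = value            # write proceeds after unprotect

    def background_pass(self):
        for idx in sorted(self.wp.copy()):  # steps 5/8: walk remaining pages
            self._save(idx)

    def _save(self, idx):
        self.snapshot[idx] = self.ram[idx]
        self.wp.discard(idx)

sim = SnapshotSim(["p0", "p1", "p2"])
sim.guest_write(1, "p1'")   # guest dirties page 1 before the walker reaches it
sim.background_pass()
assert sim.snapshot == ["p0", "p1", "p2"]   # pause-time content, not "p1'"
assert sim.ram == ["p0", "p1'", "p2"]       # guest's write still landed
```

If the save and the unprotect in step 7 were reversed, the guest write could land before the copy and the snapshot would contain post-pause data - which is exactly the consistency bug the write-protect ordering prevents.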
For now, we only support the tcg accelerator, since userfaultfd does not support tracking write faults for KVM.
Usage:
1. Take a snapshot
#x86_64-softmmu/qemu-system-x86_64 -machine pc-i440fx-2.5,accel=tcg,usb=off -drive file=/mnt/windows/win7_install.qcow2.bak,if=none,id=drive-ide0-0-1,format=qcow2,cache=none -device ide-hd,bus=ide.0,unit=1,drive=drive-ide0-0-1,id=ide0-0-1 -vnc :7 -m 8192 -smp 1 -netdev tap,id=bn0 -device virtio-net-pci,id=net-pci0,netdev=bn0 --monitor stdio
Issue the snapshot command:
(qemu)migrate -d file:/home/Snapshot
2. Revert to the snapshot
#x86_64-softmmu/qemu-system-x86_64 -machine pc-i440fx-2.5,accel=tcg,usb=off -drive file=/mnt/windows/win7_install.qcow2.bak,if=none,id=drive-ide0-0-1,format=qcow2,cache=none -device ide-hd,bus=ide.0,unit=1,drive=drive-ide0-0-1,id=ide0-0-1 -vnc :7 -m 8192 -smp 1 -netdev tap,id=bn0 -device virtio-net-pci,id=net-pci0,netdev=bn0 --monitor stdio -incoming file:/home/Snapshot
NOTE: The userfaultfd write protection feature does not support THP for now,
Re: [Qemu-devel] [RFC 00/13] Live memory snapshot based on userfaultfd
* zhanghailiang (zhang.zhanghaili...@huawei.com) wrote: > For now, we still didn't support live memory snapshot, we have discussed > a scheme which is based on userfaultfd long time ago. > You can find the discussion by the following link: > https://lists.nongnu.org/archive/html/qemu-devel/2014-11/msg01779.html > > The scheme is based on userfaultfd's write-protect capability. > The userfaultfd write protection feature is available here: > http://www.spinics.net/lists/linux-mm/msg97422.html I've (finally!) had a brief look through this, I like the idea. I've not bothered with minor cleanup like comments on them; I'm sure those will happen later; some larger scale things to think about are: a) I wonder if it's really best to put that much code into the postcopy function; it might be but I can see other userfault uses as well. b) I worry a bit about the size of the copies you create during setup and I don't really understand why you can't start sending those pages immediately - but then I worry about the relative order of when pages data should be sent compared to the state of devices view of RAM. c) Have you considered also using userfault for loading the snapshot - I know there was someone on #qemu a while ago who was talking about using it as a way to quickly reload from a migration image. Dave > > The process of this live memory scheme is like below: > 1. Pause VM > 2. Enable write-protect fault notification by using userfaultfd to >mark VM's memory to write-protect (readonly). > 3. Save VM's static state (here is device state) to snapshot file > 4. Resume VM, VM is going to run. > 5. Snapshot thread begins to save VM's live state (here is RAM) into >snapshot file. > 6. During this time, all the actions of writing VM's memory will be blocked > by kernel, and kernel will wake up the fault treating thread in qemu to > process this write-protect fault. The fault treating thread will deliver > this > page's address to snapshot thread. > 7.
snapshot thread gets this address, save this page into snapshot file, >and then remove the write-protect by using userfaultfd API, after that, >the actions of writing will be recovered. > 8. Repeat step 5~7 until all VM's memory is saved to snapshot file > > Compared with the feature of 'migrate VM's state to file', > the main difference for live memory snapshot is it has little time delay for > catching VM's state. It just captures the VM's state while got users snapshot > command, just like take a photo of VM's state. > > For now, we only support tcg accelerator, since userfaultfd is not supporting > tracking write faults for KVM. > > Usage: > 1. Take a snapshot > #x86_64-softmmu/qemu-system-x86_64 -machine pc-i440fx-2.5,accel=tcg,usb=off > -drive > file=/mnt/windows/win7_install.qcow2.bak,if=none,id=drive-ide0-0-1,format=qcow2,cache=none > -device ide-hd,bus=ide.0,unit=1,drive=drive-ide0-0-1,id=ide0-0-1 -vnc :7 -m > 8192 -smp 1 -netdev tap,id=bn0 -device virtio-net-pci,id=net-pci0,netdev=bn0 > --monitor stdio > Issue snapshot command: > (qemu)migrate -d file:/home/Snapshot > 2.
Revert to the snapshot > #x86_64-softmmu/qemu-system-x86_64 -machine pc-i440fx-2.5,accel=tcg,usb=off > -drive > file=/mnt/windows/win7_install.qcow2.bak,if=none,id=drive-ide0-0-1,format=qcow2,cache=none > -device ide-hd,bus=ide.0,unit=1,drive=drive-ide0-0-1,id=ide0-0-1 -vnc :7 -m > 8192 -smp 1 -netdev tap,id=bn0 -device virtio-net-pci,id=net-pci0,netdev=bn0 > --monitor stdio -incoming file:/home/Snapshot > > NOTE: > The userfaultfd write protection feature does not support THP for now, > Before taking snapshot, please disable THP by: > echo never > /sys/kernel/mm/transparent_hugepage/enabled > > TODO: > - Reduce the influence for VM while taking snapshot > > zhanghailiang (13): > postcopy/migration: Split fault related state into struct > UserfaultState > migration: Allow the migrate command to work on file: urls > migration: Allow -incoming to work on file: urls > migration: Create a snapshot thread to realize saving memory snapshot > migration: implement initialization work for snapshot > QEMUSizedBuffer: Introduce two help functions for qsb > savevm: Split qemu_savevm_state_complete_precopy() into two helper > functions > snapshot: Save VM's device state into snapshot file > migration/postcopy-ram: fix some helper functions to support > userfaultfd write-protect > snapshot: Enable the write-protect notification capability for VM's > RAM > snapshot/migration: Save VM's RAM into snapshot file > migration/ram: Fix some helper functions' parameter to use > PageSearchStatus > snapshot: Remove page's write-protect and copy the content during > setup stage > > include/migration/migration.h | 41 +-- > include/migration/postcopy-ram.h | 9 +- > include/migration/qemu-file.h | 3 +- > include/qemu/typedefs.h | 1 + > include/sysemu/sysemu.h
Re: [Qemu-devel] [RFC 00/13] Live memory snapshot based on userfaultfd
Hello, On Tue, Jul 05, 2016 at 11:57:31AM +0200, Baptiste Reynal wrote: > Ok, if it is not on Andrea schedule I am willing to take the action, > at least for ARM/ARM64 support. A few days ago I released this update: https://git.kernel.org/cgit/linux/kernel/git/andrea/aa.git/
git clone -b master --reference linux git://git.kernel.org/pub/scm/linux/kernel/git/andrea/aa.git
cd aa
git fetch
git reset --hard origin/master
The branch will be constantly rebased, so you will need to rebase or reset on origin/master after a fetch to get the updates. Features added:
1) WP support for anon (Shaohua, hugetlbfs has a FIXME)
2) non cooperative support (Pavel & Mike Rapoport)
3) hugetlbfs missing faults tracking (Mike Kravetz)
WP support and hugetlbfs required a couple of fixes; the non-cooperative support is as submitted, but I wonder if we should have a single non cooperative feature flag. I didn't advertise it yet because it's not well tested, and in fact I don't expect the WP mode to work fully as it should. However the kernel should run stable; I fixed enough bugs that it should not be possible to DoS or exploit the kernel with this patchset applied (unlike the original code submissions, which had race conditions and potentially kernel crashing bugs). The next thing I plan to work on is a bitflag in the swap entry for the WP tracking, so that WP tracking works correctly through swapins without false positives. It'll work like soft-dirty. It's possible that other things are still uncovered in the WP support. THP should be covered now (the callback was missing in the original submit but I fixed that). With KVM it's not entirely clear why it didn't work before, but it may require changes to the KVM code if this is not enough. KVM should not use gup(write=1) for read faults on shadow pagetables, so it has at least a chance to work. I'm also considering using a reserved bitflag in the mapped/present pte/trans_huge_pmds to track which virtual addresses have been wrprotected.
Without a reserved bitflag, fork() would inevitably lead to WP userfault false positives. I'm not sure if it's required or if it should be left up to userland to enforce that the pagetables don't become wrprotected (i.e. use MADV_DONTFORK like of course KVM already does). First we have to solve the false positives through swap anyway; the two should be orthogonal improvements. If you could test the live snapshotting patchset on my kernel master branch and report any issue or incremental fix against my branch, it'd be great. On my side I think I'll focus on testing by extending the testsuite inside the kernel to exercise WP tracking too. There are several other active users of the new userfaultfd features, including JIT garbage collection (which previously used mprotect and trapped SIGSEGV), distributed shared memory, SQL database robustness in hugetlbfs holes, and postcopy live migration of containers (a process using userfaultfd of its own being live migrated inside a container with the non-cooperative model isn't solved yet, though). Thanks, Andrea
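Why a bitflag in the swap entry matters can be illustrated with a toy page-table model: if the wrprotect bit lives only in the present pte, swap-out destroys it, so on swap-in the page must conservatively be treated as wrprotected again - a false positive for the WP tracking, exactly analogous to how soft-dirty keeps its bit across swap. This is a deliberately simplified model with invented names, not kernel code:

```python
# Model: each pte is {'present': bool, 'wp': bool}. With
# keep_bit_in_swap_entry=False, swap-out loses the wp bit, so the page
# must be remapped write-protected "just in case" -> spurious WP fault.
# With keep_bit_in_swap_entry=True, the real tracking state survives.

class PageTable:
    def __init__(self, keep_bit_in_swap_entry):
        self.keep = keep_bit_in_swap_entry
        self.pte = {}   # idx -> {'present': bool, 'wp': bool}

    def map(self, idx, wp):
        self.pte[idx] = {'present': True, 'wp': wp}

    def swap_out(self, idx):
        e = self.pte[idx]
        e['present'] = False
        if not self.keep:
            e['wp'] = True   # bit lost: must assume still write-protected

    def swap_in(self, idx):
        self.pte[idx]['present'] = True

    def write_faults(self, idx):
        # Would a guest write to this page raise a WP userfault?
        return self.pte[idx]['wp']

# Page 0 was already saved and un-wrprotected before being swapped out.
for keep, faults_after_swap in ((False, True), (True, False)):
    pt = PageTable(keep)
    pt.map(0, wp=False)
    pt.swap_out(0)
    pt.swap_in(0)
    assert pt.write_faults(0) == faults_after_swap
```

In the naive variant every swapped-out page re-faults as if the guest had written it, which is the false-positive problem the swap-entry bitflag is meant to eliminate.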
Re: [Qemu-devel] [RFC 00/13] Live memory snapshot based on userfaultfd
On 2016/7/5 17:57, Baptiste Reynal wrote: On Tue, Jul 5, 2016 at 3:49 AM, Hailiang Zhang wrote: On 2016/7/4 20:22, Baptiste Reynal wrote: On Thu, Jan 7, 2016 at 1:19 PM, zhanghailiang wrote: For now, we still don't support live memory snapshot; we discussed a scheme based on userfaultfd a long time ago. You can find the discussion at the following link: https://lists.nongnu.org/archive/html/qemu-devel/2014-11/msg01779.html The scheme is based on userfaultfd's write-protect capability. The userfaultfd write protection feature is available here: http://www.spinics.net/lists/linux-mm/msg97422.html
The process of this live memory scheme is like below:
1. Pause VM
2. Enable write-protect fault notification by using userfaultfd to mark VM's memory as write-protected (readonly).
3. Save VM's static state (here, the device state) to the snapshot file
4. Resume VM; the VM is going to run.
5. The snapshot thread begins to save VM's live state (here, RAM) into the snapshot file.
6. During this time, all attempts to write VM's memory will be blocked by the kernel, and the kernel will wake up the fault treating thread in qemu to process this write-protect fault. The fault treating thread will deliver this page's address to the snapshot thread.
7. The snapshot thread gets this address, saves this page into the snapshot file, and then removes the write-protect by using the userfaultfd API; after that, writing is allowed again.
8. Repeat steps 5~7 until all VM's memory is saved to the snapshot file.
Compared with the feature of 'migrate VM's state to file', the main difference for live memory snapshot is that it has little time delay in catching VM's state. It captures the VM's state as of the moment the user's snapshot command is issued, just like taking a photo of the VM's state. For now, we only support the tcg accelerator, since userfaultfd does not support tracking write faults for KVM. Usage: 1.
Take a snapshot #x86_64-softmmu/qemu-system-x86_64 -machine pc-i440fx-2.5,accel=tcg,usb=off -drive file=/mnt/windows/win7_install.qcow2.bak,if=none,id=drive-ide0-0-1,format=qcow2,cache=none -device ide-hd,bus=ide.0,unit=1,drive=drive-ide0-0-1,id=ide0-0-1 -vnc :7 -m 8192 -smp 1 -netdev tap,id=bn0 -device virtio-net-pci,id=net-pci0,netdev=bn0 --monitor stdio Issue snapshot command: (qemu)migrate -d file:/home/Snapshot 2. Revert to the snapshot #x86_64-softmmu/qemu-system-x86_64 -machine pc-i440fx-2.5,accel=tcg,usb=off -drive file=/mnt/windows/win7_install.qcow2.bak,if=none,id=drive-ide0-0-1,format=qcow2,cache=none -device ide-hd,bus=ide.0,unit=1,drive=drive-ide0-0-1,id=ide0-0-1 -vnc :7 -m 8192 -smp 1 -netdev tap,id=bn0 -device virtio-net-pci,id=net-pci0,netdev=bn0 --monitor stdio -incoming file:/home/Snapshot NOTE: The userfaultfd write protection feature does not support THP for now, Before taking snapshot, please disable THP by: echo never > /sys/kernel/mm/transparent_hugepage/enabled TODO: - Reduce the influence for VM while taking snapshot zhanghailiang (13): postcopy/migration: Split fault related state into struct UserfaultState migration: Allow the migrate command to work on file: urls migration: Allow -incoming to work on file: urls migration: Create a snapshot thread to realize saving memory snapshot migration: implement initialization work for snapshot QEMUSizedBuffer: Introduce two help functions for qsb savevm: Split qemu_savevm_state_complete_precopy() into two helper functions snapshot: Save VM's device state into snapshot file migration/postcopy-ram: fix some helper functions to support userfaultfd write-protect snapshot: Enable the write-protect notification capability for VM's RAM snapshot/migration: Save VM's RAM into snapshot file migration/ram: Fix some helper functions' parameter to use PageSearchStatus snapshot: Remove page's write-protect and copy the content during setup stage include/migration/migration.h | 41 +-- 
include/migration/postcopy-ram.h | 9 +- include/migration/qemu-file.h | 3 +- include/qemu/typedefs.h | 1 + include/sysemu/sysemu.h | 3 + linux-headers/linux/userfaultfd.h | 21 +++- migration/fd.c| 51 - migration/migration.c | 101 - migration/postcopy-ram.c | 229 -- migration/qemu-file-buf.c | 61 ++ migration/ram.c | 104 - migration/savevm.c| 90 --- trace-events | 1 + 13 files changed, 587 insertions(+), 128 deletions(-) -- 1.8.3.1 Hi, Hi Hailiang, Can I get the status of this patch series ? I cannot find a v2. Yes, I haven't updated it for long time, it is based on userfault-wp API in kernel, and Andrea didn't update the related patches until
Re: [Qemu-devel] [RFC 00/13] Live memory snapshot based on userfaultfd
On Tue, Jul 5, 2016 at 3:49 AM, Hailiang Zhang wrote: > On 2016/7/4 20:22, Baptiste Reynal wrote: >> >> On Thu, Jan 7, 2016 at 1:19 PM, zhanghailiang >> wrote: >>> >>> For now, we still didn't support live memory snapshot, we have discussed >>> a scheme which is based on userfaultfd long time ago. >>> You can find the discussion by the following link: >>> https://lists.nongnu.org/archive/html/qemu-devel/2014-11/msg01779.html >>> >>> The scheme is based on userfaultfd's write-protect capability. >>> The userfaultfd write protection feature is available here: >>> http://www.spinics.net/lists/linux-mm/msg97422.html >>> >>> The process of this live memory scheme is like below: >>> 1. Pause VM >>> 2. Enable write-protect fault notification by using userfaultfd to >>> mark VM's memory to write-protect (readonly). >>> 3. Save VM's static state (here is device state) to snapshot file >>> 4. Resume VM, VM is going to run. >>> 5. Snapshot thread begins to save VM's live state (here is RAM) into >>> snapshot file. >>> 6. During this time, all the actions of writing VM's memory will be >>> blocked >>>by kernel, and kernel will wake up the fault treating thread in qemu to >>>process this write-protect fault. The fault treating thread will >>> deliver this >>>page's address to snapshot thread. >>> 7. snapshot thread gets this address, save this page into snapshot file, >>> and then remove the write-protect by using userfaultfd API, after >>> that, >>> the actions of writing will be recovered. >>> 8. Repeat step 5~7 until all VM's memory is saved to snapshot file >>> >>> Compared with the feature of 'migrate VM's state to file', >>> the main difference for live memory snapshot is it has little time delay >>> for >>> catching VM's state. It just captures the VM's state while got users >>> snapshot >>> command, just like take a photo of VM's state. >>> >>> For now, we only support tcg accelerator, since userfaultfd is not >>> supporting >>> tracking write faults for KVM. 
>>> >>> Usage: >>> 1. Take a snapshot >>> #x86_64-softmmu/qemu-system-x86_64 -machine >>> pc-i440fx-2.5,accel=tcg,usb=off -drive >>> file=/mnt/windows/win7_install.qcow2.bak,if=none,id=drive-ide0-0-1,format=qcow2,cache=none >>> -device ide-hd,bus=ide.0,unit=1,drive=drive-ide0-0-1,id=ide0-0-1 -vnc :7 -m >>> 8192 -smp 1 -netdev tap,id=bn0 -device virtio-net-pci,id=net-pci0,netdev=bn0 >>> --monitor stdio >>> Issue snapshot command: >>> (qemu)migrate -d file:/home/Snapshot >>> 2. Revert to the snapshot >>> #x86_64-softmmu/qemu-system-x86_64 -machine >>> pc-i440fx-2.5,accel=tcg,usb=off -drive >>> file=/mnt/windows/win7_install.qcow2.bak,if=none,id=drive-ide0-0-1,format=qcow2,cache=none >>> -device ide-hd,bus=ide.0,unit=1,drive=drive-ide0-0-1,id=ide0-0-1 -vnc :7 -m >>> 8192 -smp 1 -netdev tap,id=bn0 -device virtio-net-pci,id=net-pci0,netdev=bn0 >>> --monitor stdio -incoming file:/home/Snapshot >>> >>> NOTE: >>> The userfaultfd write protection feature does not support THP for now, >>> Before taking snapshot, please disable THP by: >>> echo never > /sys/kernel/mm/transparent_hugepage/enabled >>> >>> TODO: >>> - Reduce the influence for VM while taking snapshot >>> >>> zhanghailiang (13): >>>postcopy/migration: Split fault related state into struct >>> UserfaultState >>>migration: Allow the migrate command to work on file: urls >>>migration: Allow -incoming to work on file: urls >>>migration: Create a snapshot thread to realize saving memory snapshot >>>migration: implement initialization work for snapshot >>>QEMUSizedBuffer: Introduce two help functions for qsb >>>savevm: Split qemu_savevm_state_complete_precopy() into two helper >>> functions >>>snapshot: Save VM's device state into snapshot file >>>migration/postcopy-ram: fix some helper functions to support >>> userfaultfd write-protect >>>snapshot: Enable the write-protect notification capability for VM's >>> RAM >>>snapshot/migration: Save VM's RAM into snapshot file >>>migration/ram: Fix some helper functions' 
parameter to use >>> PageSearchStatus >>>snapshot: Remove page's write-protect and copy the content during >>> setup stage >>> >>> include/migration/migration.h | 41 +-- >>> include/migration/postcopy-ram.h | 9 +- >>> include/migration/qemu-file.h | 3 +- >>> include/qemu/typedefs.h | 1 + >>> include/sysemu/sysemu.h | 3 + >>> linux-headers/linux/userfaultfd.h | 21 +++- >>> migration/fd.c| 51 - >>> migration/migration.c | 101 - >>> migration/postcopy-ram.c | 229 >>> -- >>> migration/qemu-file-buf.c | 61 ++ >>> migration/ram.c | 104 - >>> migration/savevm.c| 90 --- >>> trace-events
Re: [Qemu-devel] [RFC 00/13] Live memory snapshot based on userfaultfd
On 2016/7/4 20:22, Baptiste Reynal wrote: On Thu, Jan 7, 2016 at 1:19 PM, zhanghailiang wrote: For now, we still don't support live memory snapshot; we discussed a scheme based on userfaultfd a long time ago. You can find the discussion at the following link: https://lists.nongnu.org/archive/html/qemu-devel/2014-11/msg01779.html The scheme is based on userfaultfd's write-protect capability. The userfaultfd write protection feature is available here: http://www.spinics.net/lists/linux-mm/msg97422.html
The process of this live memory scheme is like below:
1. Pause VM
2. Enable write-protect fault notification by using userfaultfd to mark VM's memory as write-protected (readonly).
3. Save VM's static state (here, the device state) to the snapshot file
4. Resume VM; the VM is going to run.
5. The snapshot thread begins to save VM's live state (here, RAM) into the snapshot file.
6. During this time, all attempts to write VM's memory will be blocked by the kernel, and the kernel will wake up the fault treating thread in qemu to process this write-protect fault. The fault treating thread will deliver this page's address to the snapshot thread.
7. The snapshot thread gets this address, saves this page into the snapshot file, and then removes the write-protect by using the userfaultfd API; after that, writing is allowed again.
8. Repeat steps 5~7 until all VM's memory is saved to the snapshot file.
Compared with the feature of 'migrate VM's state to file', the main difference for live memory snapshot is that it has little time delay in catching VM's state. It captures the VM's state as of the moment the user's snapshot command is issued, just like taking a photo of the VM's state. For now, we only support the tcg accelerator, since userfaultfd does not support tracking write faults for KVM. Usage: 1.
Take a snapshot #x86_64-softmmu/qemu-system-x86_64 -machine pc-i440fx-2.5,accel=tcg,usb=off -drive file=/mnt/windows/win7_install.qcow2.bak,if=none,id=drive-ide0-0-1,format=qcow2,cache=none -device ide-hd,bus=ide.0,unit=1,drive=drive-ide0-0-1,id=ide0-0-1 -vnc :7 -m 8192 -smp 1 -netdev tap,id=bn0 -device virtio-net-pci,id=net-pci0,netdev=bn0 --monitor stdio Issue snapshot command: (qemu)migrate -d file:/home/Snapshot 2. Revert to the snapshot #x86_64-softmmu/qemu-system-x86_64 -machine pc-i440fx-2.5,accel=tcg,usb=off -drive file=/mnt/windows/win7_install.qcow2.bak,if=none,id=drive-ide0-0-1,format=qcow2,cache=none -device ide-hd,bus=ide.0,unit=1,drive=drive-ide0-0-1,id=ide0-0-1 -vnc :7 -m 8192 -smp 1 -netdev tap,id=bn0 -device virtio-net-pci,id=net-pci0,netdev=bn0 --monitor stdio -incoming file:/home/Snapshot NOTE: The userfaultfd write protection feature does not support THP for now, Before taking snapshot, please disable THP by: echo never > /sys/kernel/mm/transparent_hugepage/enabled TODO: - Reduce the influence for VM while taking snapshot zhanghailiang (13): postcopy/migration: Split fault related state into struct UserfaultState migration: Allow the migrate command to work on file: urls migration: Allow -incoming to work on file: urls migration: Create a snapshot thread to realize saving memory snapshot migration: implement initialization work for snapshot QEMUSizedBuffer: Introduce two help functions for qsb savevm: Split qemu_savevm_state_complete_precopy() into two helper functions snapshot: Save VM's device state into snapshot file migration/postcopy-ram: fix some helper functions to support userfaultfd write-protect snapshot: Enable the write-protect notification capability for VM's RAM snapshot/migration: Save VM's RAM into snapshot file migration/ram: Fix some helper functions' parameter to use PageSearchStatus snapshot: Remove page's write-protect and copy the content during setup stage include/migration/migration.h | 41 +-- 
include/migration/postcopy-ram.h | 9 +- include/migration/qemu-file.h | 3 +- include/qemu/typedefs.h | 1 + include/sysemu/sysemu.h | 3 + linux-headers/linux/userfaultfd.h | 21 +++- migration/fd.c| 51 - migration/migration.c | 101 - migration/postcopy-ram.c | 229 -- migration/qemu-file-buf.c | 61 ++ migration/ram.c | 104 - migration/savevm.c| 90 --- trace-events | 1 + 13 files changed, 587 insertions(+), 128 deletions(-) -- 1.8.3.1 Hi, Hi Hailiang, Can I get the status of this patch series ? I cannot find a v2. Yes, I haven't updated it for long time, it is based on userfault-wp API in kernel, and Andrea didn't update the related patches until recent days. I will update this series in the next one or two weeks. But it will only support TCG until userfault-wp API supports KVM. About TCG limitation, is
Re: [Qemu-devel] [RFC 00/13] Live memory snapshot based on userfaultfd
On Thu, Jan 7, 2016 at 1:19 PM, zhanghailiang wrote:
> For now, we still don't support live memory snapshot; we discussed
> a scheme based on userfaultfd some time ago.
> You can find the discussion at the following link:
> https://lists.nongnu.org/archive/html/qemu-devel/2014-11/msg01779.html
>
> The scheme is based on userfaultfd's write-protect capability.
> The userfaultfd write protection feature is available here:
> http://www.spinics.net/lists/linux-mm/msg97422.html
>
> The process of this live memory scheme is as follows:
> 1. Pause the VM.
> 2. Enable write-protect fault notification by using userfaultfd to
>    mark the VM's memory write-protected (read-only).
> 3. Save the VM's static state (here, the device state) to the snapshot file.
> 4. Resume the VM; it continues to run.
> 5. The snapshot thread begins to save the VM's live state (here, the RAM)
>    into the snapshot file.
> 6. During this time, any write to the VM's memory is blocked by the kernel,
>    and the kernel wakes up the fault-handling thread in QEMU to process the
>    write-protect fault. The fault-handling thread delivers the page's
>    address to the snapshot thread.
> 7. The snapshot thread takes this address, saves the page into the snapshot
>    file, and then removes the write protection using the userfaultfd API;
>    after that, the blocked write can proceed.
> 8. Repeat steps 5-7 until all of the VM's memory has been saved to the
>    snapshot file.
>
> Compared with the existing feature of migrating the VM's state to a file,
> the main difference of a live memory snapshot is that there is very little
> delay in capturing the VM's state: the state is captured at the moment the
> user issues the snapshot command, like taking a photo of the VM.
>
> For now, we only support the tcg accelerator, since userfaultfd does not
> support tracking write faults for KVM.
>
> Usage:
> 1. Take a snapshot
> #x86_64-softmmu/qemu-system-x86_64 -machine pc-i440fx-2.5,accel=tcg,usb=off
> -drive
> file=/mnt/windows/win7_install.qcow2.bak,if=none,id=drive-ide0-0-1,format=qcow2,cache=none
> -device ide-hd,bus=ide.0,unit=1,drive=drive-ide0-0-1,id=ide0-0-1 -vnc :7 -m
> 8192 -smp 1 -netdev tap,id=bn0 -device virtio-net-pci,id=net-pci0,netdev=bn0
> --monitor stdio
> Issue the snapshot command:
> (qemu) migrate -d file:/home/Snapshot
> 2. Revert to the snapshot
> #x86_64-softmmu/qemu-system-x86_64 -machine pc-i440fx-2.5,accel=tcg,usb=off
> -drive
> file=/mnt/windows/win7_install.qcow2.bak,if=none,id=drive-ide0-0-1,format=qcow2,cache=none
> -device ide-hd,bus=ide.0,unit=1,drive=drive-ide0-0-1,id=ide0-0-1 -vnc :7 -m
> 8192 -smp 1 -netdev tap,id=bn0 -device virtio-net-pci,id=net-pci0,netdev=bn0
> --monitor stdio -incoming file:/home/Snapshot
>
> NOTE:
> The userfaultfd write protection feature does not support THP for now.
> Before taking a snapshot, please disable THP with:
> echo never > /sys/kernel/mm/transparent_hugepage/enabled
>
> TODO:
> - Reduce the impact on the VM while taking a snapshot
>
> zhanghailiang (13):
>   postcopy/migration: Split fault related state into struct UserfaultState
>   migration: Allow the migrate command to work on file: urls
>   migration: Allow -incoming to work on file: urls
>   migration: Create a snapshot thread to realize saving memory snapshot
>   migration: implement initialization work for snapshot
>   QEMUSizedBuffer: Introduce two help functions for qsb
>   savevm: Split qemu_savevm_state_complete_precopy() into two helper functions
>   snapshot: Save VM's device state into snapshot file
>   migration/postcopy-ram: fix some helper functions to support userfaultfd write-protect
>   snapshot: Enable the write-protect notification capability for VM's RAM
>   snapshot/migration: Save VM's RAM into snapshot file
>   migration/ram: Fix some helper functions' parameter to use PageSearchStatus
>   snapshot: Remove page's write-protect and copy the content during setup stage
>
>  include/migration/migration.h     |  41 +--
>  include/migration/postcopy-ram.h  |   9 +-
>  include/migration/qemu-file.h     |   3 +-
>  include/qemu/typedefs.h           |   1 +
>  include/sysemu/sysemu.h           |   3 +
>  linux-headers/linux/userfaultfd.h |  21 +++-
>  migration/fd.c                    |  51 -
>  migration/migration.c             | 101 -
>  migration/postcopy-ram.c          | 229 --
>  migration/qemu-file-buf.c         |  61 ++
>  migration/ram.c                   | 104 -
>  migration/savevm.c                |  90 ---
>  trace-events                      |   1 +
>  13 files changed, 587 insertions(+), 128 deletions(-)
>
> --
> 1.8.3.1

Hi Hailiang,

Can I get the status of this patch series? I cannot find a v2. About the TCG limitation, is KVM support on a TODO list or is there a strong technical barrier?

Thanks,
Baptiste