Re: [PULL 00/18] migration queue

2022-04-21 Thread Richard Henderson

On 4/21/22 11:40, Dr. David Alan Gilbert (git) wrote:

> From: "Dr. David Alan Gilbert" 
> 
> The following changes since commit 28298069afff3eb696e4995e63b2579b27adf378:
> 
>   Merge tag 'misc-pull-request' of gitlab.com:marcandre.lureau/qemu into 
> staging (2022-04-21 09:27:54 -0700)
> 
> are available in the Git repository at:
> 
>   https://gitlab.com/dagrh/qemu.git tags/pull-migration-20220421a
> 
> for you to fetch changes up to 552de79bfdd5e9e53847eb3c6d6e4cd898a4370e:
> 
>   migration: Read state once (2022-04-21 19:36:46 +0100)
> 
> 
> V2: Migration pull 2022-04-21
> 
>   Dan: Test fixes and improvements (TLS mostly)
>   Peter: Postcopy improvements
>   Me: Race fix for info migrate, and compilation fix
> 
> V2:
>   Fixed checkpatch nit of unneeded NULL check
> 
> Signed-off-by: Dr. David Alan Gilbert 
> 

Applied, thanks.  Please update https://wiki.qemu.org/ChangeLog/7.1 as 
appropriate.


r~

> Daniel P. Berrangé (9):
>   tests: improve error message when saving TLS PSK file fails
>   tests: support QTEST_TRACE env variable
>   tests: print newline after QMP response in qtest logs
>   migration: fix use of TLS PSK credentials with a UNIX socket
>   tests: switch MigrateStart struct to be stack allocated
>   tests: merge code for UNIX and TCP migration pre-copy tests
>   tests: introduce ability to provide hooks for migration precopy test
>   tests: switch migration FD passing test to use common precopy helper
>   tests: expand the migration precopy helper to support failures
> 
> Dr. David Alan Gilbert (2):
>   migration: Fix operator type
>   migration: Read state once
> 
> Peter Xu (7):
>   migration: Postpone releasing MigrationState.hostname
>   migration: Drop multifd tls_hostname cache
>   migration: Add pss.postcopy_requested status
>   migration: Move migrate_allow_multifd and helpers into migration.c
>   migration: Export ram_load_postcopy()
>   migration: Move channel setup out of postcopy_try_recover()
>   migration: Allow migrate-recover to run multiple times
> 
>  migration/channel.c |   1 -
>  migration/migration.c   |  66 ---
>  migration/migration.h   |   4 +-
>  migration/multifd.c |  29 +--
>  migration/multifd.h |   4 -
>  migration/ram.c |  10 +-
>  migration/ram.h |   1 +
>  migration/savevm.c  |   3 -
>  migration/tls.c |   4 -
>  tests/qtest/libqtest.c  |  13 +-
>  tests/qtest/migration-test.c| 368 
>  tests/unit/crypto-tls-psk-helpers.c |   2 +-
>  12 files changed, 267 insertions(+), 238 deletions(-)







Re: [PULL 00/18] migration queue

2022-04-21 Thread Dr. David Alan Gilbert
* Dr. David Alan Gilbert (git) (dgilb...@redhat.com) wrote:
> From: "Dr. David Alan Gilbert" 
> 
> The following changes since commit 401d46789410e88e9e90d76a11f46e8e9f358d55:
> 
>   Merge tag 'pull-target-arm-20220421' of 
> https://git.linaro.org/people/pmaydell/qemu-arm into staging (2022-04-21 
> 08:04:43 -0700)
> 
> are available in the Git repository at:
> 
>   https://gitlab.com/dagrh/qemu.git tags/pull-migration-20220421b
> 
> for you to fetch changes up to 25e7d2fd25d133a9f714443974b51e50416546a5:
> 
>   migration: Read state once (2022-04-21 17:33:50 +0100)

Oops, this has a checkpatch nit; just reposted a fixed version.

Dave

> 
> Migration pull 2022-04-21
> 
>   Dan: Test fixes and improvements (TLS mostly)
>   Peter: Postcopy improvements
>   Me: Race fix for info migrate, and compilation fix
> 
> Signed-off-by: Dr. David Alan Gilbert 
> 
> 
> Daniel P. Berrangé (9):
>   tests: improve error message when saving TLS PSK file fails
>   tests: support QTEST_TRACE env variable
>   tests: print newline after QMP response in qtest logs
>   migration: fix use of TLS PSK credentials with a UNIX socket
>   tests: switch MigrateStart struct to be stack allocated
>   tests: merge code for UNIX and TCP migration pre-copy tests
>   tests: introduce ability to provide hooks for migration precopy test
>   tests: switch migration FD passing test to use common precopy helper
>   tests: expand the migration precopy helper to support failures
> 
> Dr. David Alan Gilbert (2):
>   migration: Fix operator type
>   migration: Read state once
> 
> Peter Xu (7):
>   migration: Postpone releasing MigrationState.hostname
>   migration: Drop multifd tls_hostname cache
>   migration: Add pss.postcopy_requested status
>   migration: Move migrate_allow_multifd and helpers into migration.c
>   migration: Export ram_load_postcopy()
>   migration: Move channel setup out of postcopy_try_recover()
>   migration: Allow migrate-recover to run multiple times
> 
>  migration/channel.c |   1 -
>  migration/migration.c   |  68 ---
>  migration/migration.h   |   4 +-
>  migration/multifd.c |  29 +--
>  migration/multifd.h |   4 -
>  migration/ram.c |  10 +-
>  migration/ram.h |   1 +
>  migration/savevm.c  |   3 -
>  migration/tls.c |   4 -
>  tests/qtest/libqtest.c  |  13 +-
>  tests/qtest/migration-test.c| 368 
> 
>  tests/unit/crypto-tls-psk-helpers.c |   2 +-
>  12 files changed, 269 insertions(+), 238 deletions(-)
> 
> 
-- 
Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK




[PULL 00/18] migration queue

2022-04-21 Thread Dr. David Alan Gilbert (git)
From: "Dr. David Alan Gilbert" 

The following changes since commit 28298069afff3eb696e4995e63b2579b27adf378:

  Merge tag 'misc-pull-request' of gitlab.com:marcandre.lureau/qemu into 
staging (2022-04-21 09:27:54 -0700)

are available in the Git repository at:

  https://gitlab.com/dagrh/qemu.git tags/pull-migration-20220421a

for you to fetch changes up to 552de79bfdd5e9e53847eb3c6d6e4cd898a4370e:

  migration: Read state once (2022-04-21 19:36:46 +0100)


V2: Migration pull 2022-04-21

  Dan: Test fixes and improvements (TLS mostly)
  Peter: Postcopy improvements
  Me: Race fix for info migrate, and compilation fix

V2:
  Fixed checkpatch nit of unneeded NULL check

Signed-off-by: Dr. David Alan Gilbert 


Daniel P. Berrangé (9):
  tests: improve error message when saving TLS PSK file fails
  tests: support QTEST_TRACE env variable
  tests: print newline after QMP response in qtest logs
  migration: fix use of TLS PSK credentials with a UNIX socket
  tests: switch MigrateStart struct to be stack allocated
  tests: merge code for UNIX and TCP migration pre-copy tests
  tests: introduce ability to provide hooks for migration precopy test
  tests: switch migration FD passing test to use common precopy helper
  tests: expand the migration precopy helper to support failures

Dr. David Alan Gilbert (2):
  migration: Fix operator type
  migration: Read state once

Peter Xu (7):
  migration: Postpone releasing MigrationState.hostname
  migration: Drop multifd tls_hostname cache
  migration: Add pss.postcopy_requested status
  migration: Move migrate_allow_multifd and helpers into migration.c
  migration: Export ram_load_postcopy()
  migration: Move channel setup out of postcopy_try_recover()
  migration: Allow migrate-recover to run multiple times

 migration/channel.c |   1 -
 migration/migration.c   |  66 ---
 migration/migration.h   |   4 +-
 migration/multifd.c |  29 +--
 migration/multifd.h |   4 -
 migration/ram.c |  10 +-
 migration/ram.h |   1 +
 migration/savevm.c  |   3 -
 migration/tls.c |   4 -
 tests/qtest/libqtest.c  |  13 +-
 tests/qtest/migration-test.c| 368 
 tests/unit/crypto-tls-psk-helpers.c |   2 +-
 12 files changed, 267 insertions(+), 238 deletions(-)




[PULL 00/18] migration queue

2022-04-21 Thread Dr. David Alan Gilbert (git)
From: "Dr. David Alan Gilbert" 

The following changes since commit 401d46789410e88e9e90d76a11f46e8e9f358d55:

  Merge tag 'pull-target-arm-20220421' of 
https://git.linaro.org/people/pmaydell/qemu-arm into staging (2022-04-21 
08:04:43 -0700)

are available in the Git repository at:

  https://gitlab.com/dagrh/qemu.git tags/pull-migration-20220421b

for you to fetch changes up to 25e7d2fd25d133a9f714443974b51e50416546a5:

  migration: Read state once (2022-04-21 17:33:50 +0100)


Migration pull 2022-04-21

  Dan: Test fixes and improvements (TLS mostly)
  Peter: Postcopy improvements
  Me: Race fix for info migrate, and compilation fix

Signed-off-by: Dr. David Alan Gilbert 


Daniel P. Berrangé (9):
  tests: improve error message when saving TLS PSK file fails
  tests: support QTEST_TRACE env variable
  tests: print newline after QMP response in qtest logs
  migration: fix use of TLS PSK credentials with a UNIX socket
  tests: switch MigrateStart struct to be stack allocated
  tests: merge code for UNIX and TCP migration pre-copy tests
  tests: introduce ability to provide hooks for migration precopy test
  tests: switch migration FD passing test to use common precopy helper
  tests: expand the migration precopy helper to support failures

Dr. David Alan Gilbert (2):
  migration: Fix operator type
  migration: Read state once

Peter Xu (7):
  migration: Postpone releasing MigrationState.hostname
  migration: Drop multifd tls_hostname cache
  migration: Add pss.postcopy_requested status
  migration: Move migrate_allow_multifd and helpers into migration.c
  migration: Export ram_load_postcopy()
  migration: Move channel setup out of postcopy_try_recover()
  migration: Allow migrate-recover to run multiple times

 migration/channel.c |   1 -
 migration/migration.c   |  68 ---
 migration/migration.h   |   4 +-
 migration/multifd.c |  29 +--
 migration/multifd.h |   4 -
 migration/ram.c |  10 +-
 migration/ram.h |   1 +
 migration/savevm.c  |   3 -
 migration/tls.c |   4 -
 tests/qtest/libqtest.c  |  13 +-
 tests/qtest/migration-test.c| 368 
 tests/unit/crypto-tls-psk-helpers.c |   2 +-
 12 files changed, 269 insertions(+), 238 deletions(-)




Re: multifd/tcp/zlib intermittent abort (was: Re: [PULL 00/18] migration queue)

2022-03-15 Thread Peter Maydell
On Tue, 15 Mar 2022 at 16:15, Dr. David Alan Gilbert
 wrote:
> The initial description of compressBound being wrong doesn't
> feel like it would cause that; it claims it would trigger an error
> (I'm not sure how good we are at spotting that!); but then later
> in the description it says:
>
> 'Mistakes in dfltcc_free_window OF and especially DEFLATE_BOUND_COMPLEN,
>   (incl. the bit definitions), may cause various and unforseen defects'
>
> Certainly looks like a 'various and unforseen defect'.

Mmm. I couldn't get the testcase in that bug to fail on the machine
I see the migration-test fails on, so it presumably is a different bug
(or just faintly possibly a QEMU bug that's only tickled by the
specifics of the accelerated zlib behaviour).

-- PMM



Re: multifd/tcp/zlib intermittent abort (was: Re: [PULL 00/18] migration queue)

2022-03-15 Thread Dr. David Alan Gilbert
* Peter Maydell (peter.mayd...@linaro.org) wrote:
> On Tue, 15 Mar 2022 at 14:39, Peter Maydell  wrote:
> >
> > On Mon, 14 Mar 2022 at 19:44, Peter Maydell  
> > wrote:
> > > On Mon, 14 Mar 2022 at 18:58, Peter Maydell  
> > > wrote:
> > > > I just hit the abort case, narrowing it down to the
> > > > /i386/migration/multifd/tcp/zlib case, which can hit this without
> > > > any other tests being run:
> > >
> > > > This test seems to fail fairly frequently. I'll try a bisect...
> > >
> > > On this s390 machine, this test has been intermittent since
> > > it was first added in commit 7ec2c2b3c1 ("multifd: Add zlib compression
> > > multifd support") in 2019.
> >
> > I have tried (on current master) runs of various of the other
> > migration tests, and:
> >  * /i386/migration/multifd/tcp/zstd completed 1170 iterations without
> >failing
> >  * /i386/migration/precopy/tcp completed 4669 iterations without
> >failing
> >  * /i386/migration/multifd/tcp/zlib fails usually within the first
> >10 iterations (the most I ever saw it manage was 32)
> >
> > So whatever this is, it seems like it might be specific to the
> > zlib code somehow ?
> 
> Maybe we're running into this bug
> https://bugs.launchpad.net/ubuntu/+source/zlib/+bug/1961427
> ("zlib: compressBound() returns an incorrect result on z15") ?

The initial description of compressBound being wrong doesn't
feel like it would cause that; it claims it would trigger an error
(I'm not sure how good we are at spotting that!); but then later
in the description it says:

'Mistakes in dfltcc_free_window OF and especially DEFLATE_BOUND_COMPLEN,
  (incl. the bit definitions), may cause various and unforseen defects'

Certainly looks like a 'various and unforseen defect'.

Dave

> That bug report claims it doesn't affect focal, though, which
> is what we're running on this box (specifically, the zlib1g
> package is version 1:1.2.11.dfsg-2ubuntu1.2).
> 
> A run with DFLTCC=0 has made it past 60 iterations so far, which
> suggests that that does serve as a workaround for the bug.
> 
> thanks
> -- PMM
> 
-- 
Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK




Re: multifd/tcp/zlib intermittent abort (was: Re: [PULL 00/18] migration queue)

2022-03-15 Thread Peter Maydell
On Tue, 15 Mar 2022 at 15:41, Daniel P. Berrangé  wrote:
> So if this is a host OS package bug we punt to OS vendor to fix,
> and just apply workaround in our CI ?  eg
>
> $ git diff
> diff --git a/.travis.yml b/.travis.yml
> index c3c8048842..6da4c9f640 100644
> --- a/.travis.yml
> +++ b/.travis.yml
> @@ -218,6 +218,7 @@ jobs:
>  - TEST_CMD="make check check-tcg V=1"
>  - CONFIG="--disable-containers --target-list=${MAIN_SOFTMMU_TARGETS},s390x-linux-user"
>  - UNRELIABLE=true
> +- DFLTCC=0
>script:
>  - BUILD_RC=0 && make -j${JOBS} || BUILD_RC=$?
>  - |
>

Yes, that seems like the best approach. We also need to
adjust the gitlab CI config for the s390-host jobs. (In that
case we control the system being used so if there's a fixed
zlib we could install it, but for the travis stuff we'll probably
need the workaround for some time.)

-- PMM



Re: multifd/tcp/zlib intermittent abort (was: Re: [PULL 00/18] migration queue)

2022-03-15 Thread Daniel P . Berrangé
On Tue, Mar 15, 2022 at 03:30:27PM +, Peter Maydell wrote:
> On Tue, 15 Mar 2022 at 15:03, Peter Maydell  wrote:
> > Maybe we're running into this bug
> > https://bugs.launchpad.net/ubuntu/+source/zlib/+bug/1961427
> > ("zlib: compressBound() returns an incorrect result on z15") ?
> 
> Full repro info, since it's a bit hidden in this long thread:
> 
> Build an i386 guest QEMU; I used this configure command:
> 
> '../../configure' '--target-list=i386-softmmu' '--enable-debug'
> '--with-pkgversion=pm215' '--disable-docs'
> 
> Then run the multifd/tcp/zlib test in a tight loop:
> 
> X=1; while QTEST_QEMU_BINARY=./build/i386/i386-softmmu/qemu-system-i386
> ./build/i386/tests/qtest/migration-test  -tap -k -p
> /i386/migration/multifd/tcp/zlib ; do echo $X; X=$((X+1)); done
> 
> Without DFLTCC=0 it fails typically within 5 or so iterations;
> the longest I've ever seen it go is about 32.

So if this is a host OS package bug we punt to OS vendor to fix,
and just apply workaround in our CI ?  eg

$ git diff
diff --git a/.travis.yml b/.travis.yml
index c3c8048842..6da4c9f640 100644
--- a/.travis.yml
+++ b/.travis.yml
@@ -218,6 +218,7 @@ jobs:
 - TEST_CMD="make check check-tcg V=1"
 - CONFIG="--disable-containers --target-list=${MAIN_SOFTMMU_TARGETS},s390x-linux-user"
 - UNRELIABLE=true
+- DFLTCC=0
   script:
 - BUILD_RC=0 && make -j${JOBS} || BUILD_RC=$?
 - |



Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|




Re: multifd/tcp/zlib intermittent abort (was: Re: [PULL 00/18] migration queue)

2022-03-15 Thread Peter Maydell
On Tue, 15 Mar 2022 at 15:03, Peter Maydell  wrote:
> Maybe we're running into this bug
> https://bugs.launchpad.net/ubuntu/+source/zlib/+bug/1961427
> ("zlib: compressBound() returns an incorrect result on z15") ?

Full repro info, since it's a bit hidden in this long thread:

Build an i386 guest QEMU; I used this configure command:

'../../configure' '--target-list=i386-softmmu' '--enable-debug'
'--with-pkgversion=pm215' '--disable-docs'

Then run the multifd/tcp/zlib test in a tight loop:

X=1; while QTEST_QEMU_BINARY=./build/i386/i386-softmmu/qemu-system-i386
./build/i386/tests/qtest/migration-test  -tap -k -p
/i386/migration/multifd/tcp/zlib ; do echo $X; X=$((X+1)); done

Without DFLTCC=0 it fails typically within 5 or so iterations;
the longest I've ever seen it go is about 32.
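
The same loop can be sketched in Python, which makes it easy to compare a run with and without DFLTCC=0. This is only an illustrative sketch: `run_until_failure`, `max_iters` and `extra_env` are names invented here, not part of QEMU or its test suite.

```python
import os
import subprocess
import sys


def run_until_failure(cmd, max_iters, extra_env=None):
    """Run cmd repeatedly; return how many runs passed before the first failure.

    Stops after max_iters successful runs. extra_env is merged into the
    inherited environment, e.g. {"DFLTCC": "0"} for the workaround run.
    """
    env = dict(os.environ)
    env.update(extra_env or {})
    passed = 0
    while passed < max_iters:
        result = subprocess.run(cmd, env=env)
        if result.returncode != 0:
            break
        passed += 1
        # Same role as 'echo $X' in the shell loop: show progress.
        print(passed, file=sys.stderr)
    return passed
```

For the failing case, `cmd` would be the migration-test invocation quoted above and `extra_env` would carry `QTEST_QEMU_BINARY`; adding `"DFLTCC": "0"` to `extra_env` gives the workaround run.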

-- PMM



Re: multifd/tcp/zlib intermittent abort (was: Re: [PULL 00/18] migration queue)

2022-03-15 Thread Peter Maydell
On Tue, 15 Mar 2022 at 14:39, Peter Maydell  wrote:
>
> On Mon, 14 Mar 2022 at 19:44, Peter Maydell  wrote:
> > On Mon, 14 Mar 2022 at 18:58, Peter Maydell  
> > wrote:
> > > I just hit the abort case, narrowing it down to the
> > > /i386/migration/multifd/tcp/zlib case, which can hit this without
> > > any other tests being run:
> >
> > > This test seems to fail fairly frequently. I'll try a bisect...
> >
> > On this s390 machine, this test has been intermittent since
> > it was first added in commit 7ec2c2b3c1 ("multifd: Add zlib compression
> > multifd support") in 2019.
>
> I have tried (on current master) runs of various of the other
> migration tests, and:
>  * /i386/migration/multifd/tcp/zstd completed 1170 iterations without
>failing
>  * /i386/migration/precopy/tcp completed 4669 iterations without
>failing
>  * /i386/migration/multifd/tcp/zlib fails usually within the first
>10 iterations (the most I ever saw it manage was 32)
>
> So whatever this is, it seems like it might be specific to the
> zlib code somehow ?

Maybe we're running into this bug
https://bugs.launchpad.net/ubuntu/+source/zlib/+bug/1961427
("zlib: compressBound() returns an incorrect result on z15") ?

That bug report claims it doesn't affect focal, though, which
is what we're running on this box (specifically, the zlib1g
package is version 1:1.2.11.dfsg-2ubuntu1.2).

A run with DFLTCC=0 has made it past 60 iterations so far, which
suggests that that does serve as a workaround for the bug.

thanks
-- PMM



Re: [PULL 00/18] migration queue

2022-03-15 Thread Christian Borntraeger

On 08.03.22 at 19:47, Dr. David Alan Gilbert wrote:

* Philippe Mathieu-Daudé (philippe.mathieu.da...@gmail.com) wrote:

On 3/3/22 15:46, Peter Maydell wrote:

On Wed, 2 Mar 2022 at 18:32, Dr. David Alan Gilbert (git)
 wrote:


From: "Dr. David Alan Gilbert" 

The following changes since commit 64ada298b98a51eb2512607f6e6180cb330c47b1:

Merge remote-tracking branch 'remotes/legoater/tags/pull-ppc-20220302' into 
staging (2022-03-02 12:38:46 +)

are available in the Git repository at:

https://gitlab.com/dagrh/qemu.git tags/pull-migration-20220302b

for you to fetch changes up to 18621987027b1800f315fb9e29967e7b5398ef6f:

migration: Remove load_state_old and minimum_version_id_old (2022-03-02 
18:20:45 +)


Migration/HMP/Virtio pull 2022-03-02

A bit of a mix this time:
* Minor fixes from myself, Hanna, and Jack
* VNC password rework by Stefan and Fabian
* Postcopy changes from Peter X that are
  the start of a larger series to come
* Removing the prehistoric load_state_old
  code from Peter M


I'm seeing an error on the s390x runner:

▶  26/547 ERROR:../tests/qtest/migration-test.c:276:check_guests_ram: assertion failed: (bad == 0) ERROR

  26/547 qemu:qtest+qtest-i386 / qtest-i386/migration-test   ERROR   78.87s   killed by signal 6 SIGABRT

https://app.travis-ci.com/gitlab/qemu-project/qemu/jobs/562515884#L7848


Yeh, thuth mentioned that, it seems to only be s390 which is odd.
I'm not seeing anything obviously architecture dependent in that set, or
for that matter that plays with the ram migration stream much.
Is this reliable enough that someone with a tame s390 could bisect?


I just asked Peter to try with DFLTCC=0 to disable the hardware acceleration.
Maybe the zlib library still has a bug? (We are not aware of any problem right now.)
In case DFLTCC makes a difference, this would be something for Ilya to look at.




multifd/tcp/zlib intermittent abort (was: Re: [PULL 00/18] migration queue)

2022-03-15 Thread Peter Maydell
On Mon, 14 Mar 2022 at 19:44, Peter Maydell  wrote:
> On Mon, 14 Mar 2022 at 18:58, Peter Maydell  wrote:
> > I just hit the abort case, narrowing it down to the
> > /i386/migration/multifd/tcp/zlib case, which can hit this without
> > any other tests being run:
>
> > This test seems to fail fairly frequently. I'll try a bisect...
>
> On this s390 machine, this test has been intermittent since
> it was first added in commit 7ec2c2b3c1 ("multifd: Add zlib compression
> multifd support") in 2019.

I have tried (on current master) runs of various of the other
migration tests, and:
 * /i386/migration/multifd/tcp/zstd completed 1170 iterations without
   failing
 * /i386/migration/precopy/tcp completed 4669 iterations without
   failing
 * /i386/migration/multifd/tcp/zlib fails usually within the first
   10 iterations (the most I ever saw it manage was 32)

So whatever this is, it seems like it might be specific to the
zlib code somehow ?

thanks
-- PMM



Re: [PULL 00/18] migration queue

2022-03-14 Thread Peter Xu
On Mon, Mar 14, 2022 at 06:53:29PM +, Daniel P. Berrangé wrote:
> On Mon, Mar 14, 2022 at 06:20:54PM +, Dr. David Alan Gilbert wrote:
> > * Peter Maydell (peter.mayd...@linaro.org) wrote:
> > > On Mon, 14 Mar 2022 at 17:55, Dr. David Alan Gilbert
> > >  wrote:
> > > >
> > > > Peter Maydell (peter.mayd...@linaro.org) wrote:
> > > > > One thing that makes this bug investigation trickier, incidentally,
> > > > > is that the migration-test code seems to depend on userfaultfd.
> > > > > That means you can't run it under 'rr'.
> > > >
> > > > That should only be the postcopy tests; the others shouldn't use that.
> > > 
> > > tests/qtest/migration-test.c:main() exits immediately without adding
> > > any of the test cases if ufd_version_check() fails, so no userfaultfd
> > > means no tests run at all, currently.
> > 
> > Ouch! I could swear we had a fix for that.

https://lore.kernel.org/qemu-devel/20210615175523.439830-2-pet...@redhat.com/

I remembered for some reason that pull (containing this patch) got issues
on applying, and that patch got forgotten.

> > 
> > Anyway, it would be really good to see what query-migrate was returning;
> > if it's stuck in running or cancelling then it's a problem with multifd
> > that needs to learn to let go if someone is trying to cancel.
> > If it's failed or similar then the test needs fixing to not lock up.
> 
> This patch of mine may well be helpful:
> 
>   https://lists.gnu.org/archive/html/qemu-devel/2022-03/msg03192.html
> 
> When debugging my TLS tests, various mistakes meant I ended up with
> a failed session, but the test was spinning forever on 'query-migrate'.
> It was waiting for it to finish one iteration, and never bothering to
> validate that the reported status == active.
> 
> If that patch was merged, it might well cause the test to abort in an
> assertion rather than spinning forever, if status == failed.
> 
> Of course someone would still need to find out why it failed, but
> nonetheless, I think assert is nicer than spin forever.

Agreed.

-- 
Peter Xu




Re: [PULL 00/18] migration queue

2022-03-14 Thread Peter Maydell
On Mon, 14 Mar 2022 at 18:58, Peter Maydell  wrote:
>
> On Mon, 14 Mar 2022 at 17:15, Peter Maydell  wrote:
> >
> > On Mon, 14 Mar 2022 at 17:07, Daniel P. Berrangé  
> > wrote:
> > > So the test harness is waiting for a reply to 'query-migrate'.
> > >
> > > This should be fast unless QEMU has hung in the main event
> > > loop servicing monitor commands, or stopped.
> >
> > I was kind of loose with the terminology -- I don't remember whether
> > it was actually hung in the sense of stopped entirely, or just
> > "sat in a loop waiting for a migration state that never arrives".
> > I'll try to look more closely if I can catch it in the act again.
>
> I just hit the abort case, narrowing it down to the
> /i386/migration/multifd/tcp/zlib case, which can hit this without
> any other tests being run:

> This test seems to fail fairly frequently. I'll try a bisect...

On this s390 machine, this test has been intermittent since
it was first added in commit 7ec2c2b3c1 ("multifd: Add zlib compression
multifd support") in 2019. On that commit (after 31 successful
runs):

# random seed: R02S17937f515046216afcc72143266b3e1f
# Start of i386 tests
# Start of migration tests
# Start of multifd tests
# Start of tcp tests
# starting QEMU: exec ./build/i386/i386-softmmu/qemu-system-i386
-qtest unix:/tmp/qtest-861747.sock -qtest-log /dev/null -chardev
socket,path=/tmp/qtest-861747.qmp,id=char0 -mon
chardev=char0,mode=control -display none -accel kvm -accel tcg -name
source,debug-threads=on -m 150M -serial
file:/tmp/migration-test-7qODSs/src_serial -drive
file=/tmp/migration-test-7qODSs/bootsect,format=raw -accel qtest
qemu-system-i386: -accel kvm: invalid accelerator kvm
qemu-system-i386: falling back to tcg
# starting QEMU: exec ./build/i386/i386-softmmu/qemu-system-i386
-qtest unix:/tmp/qtest-861747.sock -qtest-log /dev/null -chardev
socket,path=/tmp/qtest-861747.qmp,id=char0 -mon
chardev=char0,mode=control -display none -accel kvm -accel tcg -name
target,debug-threads=on -m 150M -serial
file:/tmp/migration-test-7qODSs/dest_serial -incoming defer -drive
file=/tmp/migration-test-7qODSs/bootsect,format=raw -accel qtest
qemu-system-i386: -accel kvm: invalid accelerator kvm
qemu-system-i386: falling back to tcg
Memory content inconsistency at 5cff000 first_byte = 2 last_byte = 1
current = 0 hit_edge = 1
Memory content inconsistency at 5d00000 first_byte = 2 last_byte = 1
current = 0 hit_edge = 1
Memory content inconsistency at 5d01000 first_byte = 2 last_byte = 1
current = 0 hit_edge = 1
Memory content inconsistency at 5d02000 first_byte = 2 last_byte = 1
current = 0 hit_edge = 1
Memory content inconsistency at 5d03000 first_byte = 2 last_byte = 1
current = 0 hit_edge = 1
Memory content inconsistency at 5d04000 first_byte = 2 last_byte = 1
current = 0 hit_edge = 1
Memory content inconsistency at 5d05000 first_byte = 2 last_byte = 1
current = 0 hit_edge = 1
Memory content inconsistency at 5d06000 first_byte = 2 last_byte = 1
current = 0 hit_edge = 1
Memory content inconsistency at 5d07000 first_byte = 2 last_byte = 1
current = 0 hit_edge = 1
Memory content inconsistency at 5d08000 first_byte = 2 last_byte = 1
current = 0 hit_edge = 1
and in another 118 pages**
ERROR:/home/linux1/qemu/tests/qtest/migration-test.c:268:check_guests_ram:
assertion failed: (bad == 0)
Bail out! 
ERROR:/home/linux1/qemu/tests/qtest/migration-test.c:268:check_guests_ram:
assertion failed: (bad == 0)
Aborted (core dumped)

-- PMM



Re: [PULL 00/18] migration queue

2022-03-14 Thread Peter Maydell
On Mon, 14 Mar 2022 at 17:15, Peter Maydell  wrote:
>
> On Mon, 14 Mar 2022 at 17:07, Daniel P. Berrangé  wrote:
> > So the test harness is waiting for a reply to 'query-migrate'.
> >
> > This should be fast unless QEMU has hung in the main event
> > loop servicing monitor commands, or stopped.
>
> I was kind of loose with the terminology -- I don't remember whether
> it was actually hung in the sense of stopped entirely, or just
> "sat in a loop waiting for a migration state that never arrives".
> I'll try to look more closely if I can catch it in the act again.

I just hit the abort case, narrowing it down to the
/i386/migration/multifd/tcp/zlib case, which can hit this without
any other tests being run:

$ QTEST_QEMU_BINARY=./qemu-system-i386 ./tests/qtest/migration-test
-tap -k -p /i386/migration/multifd/tcp/zlib
# random seed: R02S37eab07b59417f6cd7e26d94df0d3908
# Start of i386 tests
# Start of migration tests
# Start of multifd tests
# Start of tcp tests
# starting QEMU: exec ./qemu-system-i386 -qtest
unix:/tmp/qtest-782502.sock -qtest-log /dev/null -chardev
socket,path=/tmp/qtest-782502.qmp,id=char0 -mon
chardev=char0,mode=control -display none -accel kvm -accel tcg -name
source,debug-threads=on -m 150M -serial
file:/tmp/migration-test-H8Ggsm/src_serial -drive
file=/tmp/migration-test-H8Ggsm/bootsect,format=raw -accel qtest
# starting QEMU: exec ./qemu-system-i386 -qtest
unix:/tmp/qtest-782502.sock -qtest-log /dev/null -chardev
socket,path=/tmp/qtest-782502.qmp,id=char0 -mon
chardev=char0,mode=control -display none -accel kvm -accel tcg -name
target,debug-threads=on -m 150M -serial
file:/tmp/migration-test-H8Ggsm/dest_serial -incoming defer -drive
file=/tmp/migration-test-H8Ggsm/bootsect,format=raw -accel qtest
Memory content inconsistency at 5f76000 first_byte = 2 last_byte = 1
current = 2 hit_edge = 1
Memory content inconsistency at 5f77000 first_byte = 2 last_byte = 1
current = 2 hit_edge = 1
Memory content inconsistency at 5f78000 first_byte = 2 last_byte = 1
current = 2 hit_edge = 1
Memory content inconsistency at 5f79000 first_byte = 2 last_byte = 1
current = 2 hit_edge = 1
Memory content inconsistency at 5f7a000 first_byte = 2 last_byte = 1
current = 2 hit_edge = 1
Memory content inconsistency at 5f7b000 first_byte = 2 last_byte = 1
current = 2 hit_edge = 1
Memory content inconsistency at 5f7c000 first_byte = 2 last_byte = 1
current = 2 hit_edge = 1
Memory content inconsistency at 5f7d000 first_byte = 2 last_byte = 1
current = 2 hit_edge = 1
Memory content inconsistency at 5f7e000 first_byte = 2 last_byte = 1
current = 2 hit_edge = 1
Memory content inconsistency at 5f7f000 first_byte = 2 last_byte = 1
current = 2 hit_edge = 1
and in another 17 pages**
ERROR:../../tests/qtest/migration-test.c:276:check_guests_ram:
assertion failed: (bad == 0)
Bail out! ERROR:../../tests/qtest/migration-test.c:276:check_guests_ram:
assertion failed: (bad == 0)
Aborted (core dumped)

This test seems to fail fairly frequently. I'll try a bisect...
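
As an aid to reading the "Memory content inconsistency" lines, here is a rough Python sketch of the kind of check that would print those fields. It is a reading of the log output, not a copy of check_guests_ram(): it assumes the guest keeps incrementing one byte per page in address order, so after migration every page should hold the same value except for a single step down by one at the point the guest had reached. The function name is invented for illustration.

```python
def count_inconsistent_pages(page_bytes):
    """Count pages that break the expected pattern.

    page_bytes holds the sampled byte of each test page, in address order.
    """
    first_byte = page_bytes[0]
    last_byte = first_byte
    hit_edge = False
    bad = 0
    for current in page_bytes[1:]:
        if current == last_byte:
            continue
        if (current + 1) % 256 == last_byte and not hit_edge:
            # The guest was stopped part-way through a pass; pages from here
            # on have not been incremented yet. Allowed exactly once.
            hit_edge = True
            last_byte = current
        else:
            # Corresponds to one "Memory content inconsistency" line.
            bad += 1
    return bad
```

In the log above, pages reading 2 again after the edge down to 1 (current = 2, last_byte = 1, hit_edge = 1) are exactly the case that falls into the final branch.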

thanks
-- PMM



Re: [PULL 00/18] migration queue

2022-03-14 Thread Daniel P . Berrangé
On Mon, Mar 14, 2022 at 06:20:54PM +, Dr. David Alan Gilbert wrote:
> * Peter Maydell (peter.mayd...@linaro.org) wrote:
> > On Mon, 14 Mar 2022 at 17:55, Dr. David Alan Gilbert
> >  wrote:
> > >
> > > Peter Maydell (peter.mayd...@linaro.org) wrote:
> > > > One thing that makes this bug investigation trickier, incidentally,
> > > > is that the migration-test code seems to depend on userfaultfd.
> > > > That means you can't run it under 'rr'.
> > >
> > > That should only be the postcopy tests; the others shouldn't use that.
> > 
> > tests/qtest/migration-test.c:main() exits immediately without adding
> > any of the test cases if ufd_version_check() fails, so no userfaultfd
> > means no tests run at all, currently.
> 
> Ouch! I could swear we had a fix for that.
> 
> Anyway, it would be really good to see what query-migrate was returning;
> if it's stuck in running or cancelling then it's a problem with multifd
> that needs to learn to let go if someone is trying to cancel.
> If it's failed or similar then the test needs fixing to not lock up.

This patch of mine may well be helpful:

  https://lists.gnu.org/archive/html/qemu-devel/2022-03/msg03192.html

When debugging my TLS tests, various mistakes meant I ended up with
a failed session, but the test was spinning forever on 'query-migrate'.
It was waiting for it to finish one iteration, and never bothering to
validate that the reported status == active.

If that patch was merged, it might well cause the test to abort in an
assertion rather than spinning forever, if status == failed.

Of course someone would still need to find out why it failed, but
nonetheless, I think assert is nicer than spin forever.
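
The difference between spinning and asserting can be sketched in Python. This is not the patch linked above, only an illustration under stated assumptions: `query_migrate` stands in for whatever sends 'query-migrate' and returns the reply as a dict, the use of `ram` / `dirty-sync-count` as the "one pass finished" signal is an assumption, and `max_polls` is an invented safety cap. A real harness may also need to tolerate transitional states before the migration becomes active.

```python
import time


def wait_for_migration_pass(query_migrate, poll_interval=0.01, max_polls=1000):
    """Poll until at least one RAM pass has completed.

    Raises AssertionError as soon as the reported status is not 'active',
    instead of polling forever on a migration that has already failed.
    """
    for _ in range(max_polls):
        info = query_migrate()
        status = info.get("status")
        # The key point: validate the status on every poll.
        if status != "active":
            raise AssertionError(f"migration status is {status!r}, expected 'active'")
        # Assumption: a sync count of 2 or more means one pass has finished.
        if info.get("ram", {}).get("dirty-sync-count", 0) >= 2:
            return info
        time.sleep(poll_interval)
    raise TimeoutError(f"no completed pass after {max_polls} polls")
```

With a failed session the caller gets an immediate assertion naming the status, which still leaves the "why did it fail" question open but avoids the hang.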

Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|




Re: [PULL 00/18] migration queue

2022-03-14 Thread Dr. David Alan Gilbert
* Peter Maydell (peter.mayd...@linaro.org) wrote:
> On Mon, 14 Mar 2022 at 17:55, Dr. David Alan Gilbert
>  wrote:
> >
> > Peter Maydell (peter.mayd...@linaro.org) wrote:
> > > One thing that makes this bug investigation trickier, incidentally,
> > > is that the migration-test code seems to depend on userfaultfd.
> > > That means you can't run it under 'rr'.
> >
> > That should only be the postcopy tests; the others shouldn't use that.
> 
> tests/qtest/migration-test.c:main() exits immediately without adding
> any of the test cases if ufd_version_check() fails, so no userfaultfd
> means no tests run at all, currently.

Ouch! I could swear we had a fix for that.

Anyway, it would be really good to see what migrate-query was returning;
if it's stuck in running or cancelling then it's a problem with multifd
that needs to learn to let go if someone is trying to cancel.
If it's failed or similar then the test needs fixing to not lockup.

Dave

> -- PMM
> 
-- 
Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK




Re: [PULL 00/18] migration queue

2022-03-14 Thread Peter Maydell
On Mon, 14 Mar 2022 at 17:55, Dr. David Alan Gilbert
 wrote:
>
> Peter Maydell (peter.mayd...@linaro.org) wrote:
> > One thing that makes this bug investigation trickier, incidentally,
> > is that the migration-test code seems to depend on userfaultfd.
> > That means you can't run it under 'rr'.
>
> That should only be the postcopy tests; the others shouldn't use that.

tests/qtest/migration-test.c:main() exits immediately without adding
any of the test cases if ufd_version_check() fails, so no userfaultfd
means no tests run at all, currently.
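
A rough sketch of the alternative, where a missing userfaultfd only drops
the tests that need it. Everything here (struct test_case, select_tests(),
the idea of a per-test flag) is made up for illustration and is not how
migration-test.c is actually structured.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical description of one test case. */
struct test_case {
    const char *path;       /* illustrative path, not a real test name */
    bool needs_userfaultfd; /* true only for postcopy-style tests */
};

/*
 * Copy into 'out' every test that can run on this host and return how
 * many were selected. Only tests that need userfaultfd are dropped
 * when it is missing; everything else is still registered.
 */
size_t select_tests(const struct test_case *all, size_t n, bool have_ufd,
                    const struct test_case **out)
{
    size_t selected = 0;

    assert(out != NULL);
    for (size_t i = 0; i < n; i++) {
        if (all[i].needs_userfaultfd && !have_ufd) {
            continue; /* skip just this one */
        }
        out[selected++] = &all[i];
    }
    return selected;
}
```

With a check like this made per test, a host without userfaultfd (or a run
under 'rr') would still get the precopy and cancel tests.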

-- PMM



Re: [PULL 00/18] migration queue

2022-03-14 Thread Dr. David Alan Gilbert
* Peter Maydell (peter.mayd...@linaro.org) wrote:
> On Mon, 14 Mar 2022 at 17:07, Daniel P. Berrangé  wrote:
> > So the test harness is waiting for a reply to 'query-migrate'.
> >
> > This should be fast unless QEMU has hung in the main event
> > loop servicing monitor commands, or stopped.
> 
> I was kind of loose with the terminology -- I don't remember whether
> it was actually hung in the sense of stopped entirely, or just
> "sat in a loop waiting for a migration state that never arrives".
> I'll try to look more closely if I can catch it in the act again.

Yeh, there's a big difference; still, if it's always in this test at
that point, then I think it's one for Juan; it looks like multifd cancel
path.

> One thing that makes this bug investigation trickier, incidentally,
> is that the migration-test code seems to depend on userfaultfd.
> That means you can't run it under 'rr'.

That should only be the postcopy tests; the others shouldn't use that.

Dave

> 
> -- PMM
> 
-- 
Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK




Re: [PULL 00/18] migration queue

2022-03-14 Thread Daniel P. Berrangé
On Mon, Mar 14, 2022 at 05:15:57PM +, Peter Maydell wrote:
> On Mon, 14 Mar 2022 at 17:07, Daniel P. Berrangé  wrote:
> > So the test harness is waiting for a reply to 'query-migrate'.
> >
> > This should be fast unless QEMU has hung in the main event
> > loop servicing monitor commands, or stopped.
> 
> I was kind of loose with the terminology -- I don't remember whether
> it was actually hung in the sense of stopped entirely, or just
> "sat in a loop waiting for a migration state that never arrives".
> I'll try to look more closely if I can catch it in the act again.

Ah yes, if it is just forever migrating, that would match the
stack traces shown from the QEMUs. 

> One thing that makes this bug investigation trickier, incidentally,
> is that the migration-test code seems to depend on userfaultfd.
> That means you can't run it under 'rr'.

Yeah, we also can't turn on the tracing for a live QEMU since the
monitor connection is already in use. Kinda need to have a second
monitor instance present, that we can connect to for debugging
the migration state.

Regards,
Daniel




Re: [PULL 00/18] migration queue

2022-03-14 Thread Peter Maydell
On Mon, 14 Mar 2022 at 17:07, Daniel P. Berrangé  wrote:
> So the test harness is waiting for a reply to 'query-migrate'.
>
> This should be fast unless QEMU has hung in the main event
> loop servicing monitor commands, or stopped.

I was kind of loose with the terminology -- I don't remember whether
it was actually hung in the sense of stopped entirely, or just
"sat in a loop waiting for a migration state that never arrives".
I'll try to look more closely if I can catch it in the act again.

One thing that makes this bug investigation trickier, incidentally,
is that the migration-test code seems to depend on userfaultfd.
That means you can't run it under 'rr'.

-- PMM



Re: [PULL 00/18] migration queue

2022-03-14 Thread Daniel P. Berrangé
On Mon, Mar 14, 2022 at 04:56:18PM +, Peter Maydell wrote:
> On Tue, 8 Mar 2022 at 18:47, Dr. David Alan Gilbert  
> wrote:
> >
> > * Philippe Mathieu-Daudé (philippe.mathieu.da...@gmail.com) wrote:
> > > I'm seeing an error on the s390x runner:
> > >
> > > ▶  26/547 ERROR:../tests/qtest/migration-test.c:276:check_guests_ram:
> > > assertion failed: (bad == 0) ERROR
> > >
> > >  26/547 qemu:qtest+qtest-i386 / qtest-i386/migration-test ERROR
> > > 78.87s   killed by signal 6 SIGABRT
> > >
> > > https://app.travis-ci.com/gitlab/qemu-project/qemu/jobs/562515884#L7848
> >
> > Yeh, thuth mentioned that, it seems to only be s390 which is odd.
> > I'm not seeing anything obviously architecture dependent in that set, or
> > for that matter that plays with the ram migration stream much.
> > Is this reliable enough that someone with a tame s390 could bisect?
> 
> Didn't see a SIGABRT, but here's a gdb backtrace of a hang
> in the migration test on s390 host. I have also observed the
> migration test hanging on macos host, so I don't think this is
> s390-specific.
> 
> Process tree:
> migration-test(455775)-+-qemu-system-i38(456194)
>                        |-qemu-system-i38(456200)
>                        `-qemu-system-i38(456266)
> ===
> PROCESS: 455775
> linux1  455775  312266  5 14:36 pts/0 00:07:19
> ./tests/qtest/migration-test -tap -k
> [New LWP 455776]
> [Thread debugging using libthread_db enabled]
> Using host libthread_db library "/lib/s390x-linux-gnu/libthread_db.so.1".
> __libc_read (nbytes=1, buf=0x3ffe69fd816, fd=4) at
> ../sysdeps/unix/sysv/linux/read.c:26
> 26  ../sysdeps/unix/sysv/linux/read.c: No such file or directory.


> 
> #5  0x02aa1e894ede in qtest_vqmp (s=0x2aa200f6a20,
> fmt=0x2aa1e8f140c "{ 'execute': 'query-migrate' }", ap=0x3ffe69fdb80)
> at ../../tests/qtest/libqtest.c:749

So the test harness is waiting for a reply to 'query-migrate'.

This should be fast unless QEMU has hung in the main event
loop servicing monitor commands, or stopped.


> ===
> PROCESS: 456194
> linux1  456194  455775 85 14:39 pts/0 01:54:06 ./qemu-system-i386
> -qtest unix:/tmp/qtest-455775.sock -qtest-log /dev/null -chardev
> socket,path=/tmp/qtest-455775.qmp,id=char0 -mon
> chardev=char0,mode=control -display none -accel kvm -accel tcg -name
> source,debug-threads=on -m 150M -serial
> file:/tmp/migration-test-dmqzpM/src_serial -drive
> file=/tmp/migration-test-dmqzpM/bootsect,format=raw -accel qtest
> [New LWP 456196]
> [New LWP 456197]
> [New LWP 456198]
> [New LWP 456229]
> [Thread debugging using libthread_db enabled]
> Using host libthread_db library "/lib/s390x-linux-gnu/libthread_db.so.1".
> 0x03ff9a071c9c in __ppoll (fds=0x2aa2c46c2f0, nfds=5,
> timeout=, sigmask=0x0) at
> ../sysdeps/unix/sysv/linux/ppoll.c:44
> 44  ../sysdeps/unix/sysv/linux/ppoll.c: No such file or directory.
> 
> Thread 5 (Thread 0x3fee0ff9900 (LWP 456229)):
> #0  futex_abstimed_wait_cancelable (private=0, abstime=0x0, clockid=0,
> expected=0, futex_word=0x2aa2c46e7e4) at
> ../sysdeps/nptl/futex-internal.h:320
> #1  do_futex_wait (sem=sem@entry=0x2aa2c46e7e0, abstime=0x0,
> clockid=0) at sem_waitcommon.c:112
> #2  0x03ff9a191870 in __new_sem_wait_slow (sem=0x2aa2c46e7e0,
> abstime=0x0, clockid=0) at sem_waitcommon.c:184
> #3  0x03ff9a19190e in __new_sem_wait (sem=) at 
> sem_wait.c:42
> #4  0x02aa2923da1e in qemu_sem_wait (sem=0x2aa2c46e7e0) at
> ../../util/qemu-thread-posix.c:358
> #5  0x02aa289483cc in multifd_send_sync_main (f=0x2aa2b5f92d0) at
> ../../migration/multifd.c:610
> #6  0x02aa28dfa30c in ram_save_iterate (f=0x2aa2b5f92d0,
> opaque=0x2aa29bf75d0 ) at ../../migration/ram.c:3049
> #7  0x02aa28958fee in qemu_savevm_state_iterate (f=0x2aa2b5f92d0,
> postcopy=false) at ../../migration/savevm.c:1296
> #8  0x02aa28942d40 in migration_iteration_run (s=0x2aa2b3f9800) at
> ../../migration/migration.c:3607
> #9  0x02aa289434da in migration_thread (opaque=0x2aa2b3f9800) at
> ../../migration/migration.c:3838
> #10 0x02aa2923e020 in qemu_thread_start (args=0x2aa2b8b29e0) at
> ../../util/qemu-thread-posix.c:556
> #11 0x03ff9a187e66 in start_thread (arg=0x3fee0ff9900) at
> pthread_create.c:477
> #12 0x03ff9a07cbf6 in thread_start () at
> ../sysdeps/unix/sysv/linux/s390/s390-64/clone.S:65
> 
> Thread 4 (Thread 0x3ff89f2f900 (LWP 456198)):
> #0  env_neg (env=0x2aa2b5f5030) at 
> /home/linux1/qemu/include/exec/cpu-all.h:478
> #1  0x02aa28f5376a in env_tlb (env=0x2aa2b5f5030) at
> /home/linux1/qemu/include/exec/cpu-all.h:502
> #2  0x02aa28f538a8 in tlb_index (env=0x2aa2b5f5030, mmu_idx=2,
> addr=73265152) at /home/linux1/qemu/include/exec/cpu_ldst.h:366
> #3  0x02aa28f574bc in tlb_set_page_with_attrs (cpu=0x2aa2b5ec750,
> vaddr=73265152, paddr=73265152, attrs=..., prot=7, mmu_idx=2,
> size=4096) at ../../accel/tcg/cputlb.c:1194
> #4  

Re: [PULL 00/18] migration queue

2022-03-14 Thread Peter Maydell
On Tue, 8 Mar 2022 at 18:47, Dr. David Alan Gilbert  wrote:
>
> * Philippe Mathieu-Daudé (philippe.mathieu.da...@gmail.com) wrote:
> > I'm seeing an error on the s390x runner:
> >
> > ▶  26/547 ERROR:../tests/qtest/migration-test.c:276:check_guests_ram:
> > assertion failed: (bad == 0) ERROR
> >
> >  26/547 qemu:qtest+qtest-i386 / qtest-i386/migration-test ERROR
> > 78.87s   killed by signal 6 SIGABRT
> >
> > https://app.travis-ci.com/gitlab/qemu-project/qemu/jobs/562515884#L7848
>
> Yeh, thuth mentioned that, it seems to only be s390 which is odd.
> I'm not seeing anything obviously architecture dependent in that set, or
> for that matter that plays with the ram migration stream much.
> Is this reliable enough that someone with a tame s390 could bisect?

Didn't see a SIGABRT, but here's a gdb backtrace of a hang
in the migration test on s390 host. I have also observed the
migration test hanging on macos host, so I don't think this is
s390-specific.

Process tree:
migration-test(455775)-+-qemu-system-i38(456194)
                       |-qemu-system-i38(456200)
                       `-qemu-system-i38(456266)
===
PROCESS: 455775
linux1  455775  312266  5 14:36 pts/0 00:07:19
./tests/qtest/migration-test -tap -k
[New LWP 455776]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/s390x-linux-gnu/libthread_db.so.1".
__libc_read (nbytes=1, buf=0x3ffe69fd816, fd=4) at
../sysdeps/unix/sysv/linux/read.c:26
26  ../sysdeps/unix/sysv/linux/read.c: No such file or directory.

Thread 2 (Thread 0x3ff9c7ff900 (LWP 455776)):
#0  syscall () at ../sysdeps/unix/sysv/linux/s390/s390-64/syscall.S:37
#1  0x02aa1e8d24ca in qemu_futex_wait (f=0x2aa1e920cbc
, val=4294967295) at
/home/linux1/qemu/include/qemu/futex.h:29
#2  0x02aa1e8d276e in qemu_event_wait (ev=0x2aa1e920cbc
) at ../../util/qemu-thread-posix.c:481
#3  0x02aa1e8e6ce2 in call_rcu_thread (opaque=0x0) at ../../util/rcu.c:261
#4  0x02aa1e8d2998 in qemu_thread_start (args=0x2aa200e51e0) at
../../util/qemu-thread-posix.c:556
#5  0x03ff9ca87e66 in start_thread (arg=0x3ff9c7ff900) at
pthread_create.c:477
#6  0x03ff9c97cbf6 in thread_start () at
../sysdeps/unix/sysv/linux/s390/s390-64/clone.S:65

Thread 1 (Thread 0x3ff9ce75430 (LWP 455775)):
#0  __libc_read (nbytes=1, buf=0x3ffe69fd816, fd=4) at
../sysdeps/unix/sysv/linux/read.c:26
#1  __libc_read (fd=, buf=0x3ffe69fd816, nbytes=1) at
../sysdeps/unix/sysv/linux/read.c:24
#2  0x02aa1e894652 in qmp_fd_receive (fd=4) at
../../tests/qtest/libqtest.c:613
#3  0x02aa1e894816 in qtest_qmp_receive_dict (s=0x2aa200f6a20) at
../../tests/qtest/libqtest.c:648
#4  0x02aa1e894782 in qtest_qmp_receive (s=0x2aa200f6a20) at
../../tests/qtest/libqtest.c:636
#5  0x02aa1e894ede in qtest_vqmp (s=0x2aa200f6a20,
fmt=0x2aa1e8f140c "{ 'execute': 'query-migrate' }", ap=0x3ffe69fdb80)
at ../../tests/qtest/libqtest.c:749
#6  0x02aa1e891ac0 in wait_command (who=0x2aa200f6a20,
command=0x2aa1e8f140c "{ 'execute': 'query-migrate' }") at
../../tests/qtest/migration-helpers.c:63
#7  0x02aa1e891de8 in migrate_query (who=0x2aa200f6a20) at
../../tests/qtest/migration-helpers.c:107
#8  0x02aa1e891e1a in migrate_query_status (who=0x2aa200f6a20) at
../../tests/qtest/migration-helpers.c:116
#9  0x02aa1e891ef6 in check_migration_status (who=0x2aa200f6a20,
goal=0x2aa1e8f0f0e "cancelled", ungoals=0x0) at
../../tests/qtest/migration-helpers.c:132
#10 0x02aa1e892150 in wait_for_migration_status
(who=0x2aa200f6a20, goal=0x2aa1e8f0f0e "cancelled", ungoals=0x0) at
../../tests/qtest/migration-helpers.c:156
#11 0x02aa1e8910fa in test_multifd_tcp_cancel () at
../../tests/qtest/migration-test.c:1379
#12 0x03ff9cc7e608 in ?? () from /lib/s390x-linux-gnu/libglib-2.0.so.0
#13 0x03ff9cc7e392 in ?? () from /lib/s390x-linux-gnu/libglib-2.0.so.0
#14 0x03ff9cc7e392 in ?? () from /lib/s390x-linux-gnu/libglib-2.0.so.0
#15 0x03ff9cc7e392 in ?? () from /lib/s390x-linux-gnu/libglib-2.0.so.0
#16 0x03ff9cc7e392 in ?? () from /lib/s390x-linux-gnu/libglib-2.0.so.0
#17 0x03ff9cc7eada in g_test_run_suite () from
/lib/s390x-linux-gnu/libglib-2.0.so.0
#18 0x03ff9cc7eb10 in g_test_run () from
/lib/s390x-linux-gnu/libglib-2.0.so.0
#19 0x02aa1e891578 in main (argc=2, argv=0x3ffe69fece8) at
../../tests/qtest/migration-test.c:1491
[Inferior 1 (process 455775) detached]

===
PROCESS: 456194
linux1  456194  455775 85 14:39 pts/0 01:54:06 ./qemu-system-i386
-qtest unix:/tmp/qtest-455775.sock -qtest-log /dev/null -chardev
socket,path=/tmp/qtest-455775.qmp,id=char0 -mon
chardev=char0,mode=control -display none -accel kvm -accel tcg -name
source,debug-threads=on -m 150M -serial
file:/tmp/migration-test-dmqzpM/src_serial -drive
file=/tmp/migration-test-dmqzpM/bootsect,format=raw -accel qtest
[New LWP 456196]
[New LWP 456197]
[New LWP 456198]
[New 

Re: [PULL 00/18] migration queue

2022-03-08 Thread Dr. David Alan Gilbert
* Philippe Mathieu-Daudé (philippe.mathieu.da...@gmail.com) wrote:
> On 3/3/22 15:46, Peter Maydell wrote:
> > On Wed, 2 Mar 2022 at 18:32, Dr. David Alan Gilbert (git)
> >  wrote:
> > > 
> > > From: "Dr. David Alan Gilbert" 
> > > 
> > > The following changes since commit 
> > > 64ada298b98a51eb2512607f6e6180cb330c47b1:
> > > 
> > >Merge remote-tracking branch 'remotes/legoater/tags/pull-ppc-20220302' 
> > > into staging (2022-03-02 12:38:46 +)
> > > 
> > > are available in the Git repository at:
> > > 
> > >https://gitlab.com/dagrh/qemu.git tags/pull-migration-20220302b
> > > 
> > > for you to fetch changes up to 18621987027b1800f315fb9e29967e7b5398ef6f:
> > > 
> > >migration: Remove load_state_old and minimum_version_id_old 
> > > (2022-03-02 18:20:45 +)
> > > 
> > > 
> > > Migration/HMP/Virtio pull 2022-03-02
> > > 
> > > A bit of a mix this time:
> > > >   * Minor fixes from myself, Hanna, and Jack
> > > >   * VNC password rework by Stefan and Fabian
> > > >   * Postcopy changes from Peter X that are
> > > >     the start of a larger series to come
> > > >   * Removing the prehistoric load_state_old
> > > >     code from Peter M
> 
> I'm seeing an error on the s390x runner:
> 
> ▶  26/547 ERROR:../tests/qtest/migration-test.c:276:check_guests_ram:
> assertion failed: (bad == 0) ERROR
> 
>  26/547 qemu:qtest+qtest-i386 / qtest-i386/migration-test ERROR
> 78.87s   killed by signal 6 SIGABRT
> 
> https://app.travis-ci.com/gitlab/qemu-project/qemu/jobs/562515884#L7848

Yeh, thuth mentioned that, it seems to only be s390 which is odd.
I'm not seeing anything obviously architecture dependent in that set, or
for that matter that plays with the ram migration stream much.
Is this reliable enough that someone with a tame s390 could bisect?

Dave

-- 
Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK




Re: [PULL 00/18] migration queue

2022-03-08 Thread Philippe Mathieu-Daudé

On 3/3/22 15:46, Peter Maydell wrote:

On Wed, 2 Mar 2022 at 18:32, Dr. David Alan Gilbert (git)
 wrote:


From: "Dr. David Alan Gilbert" 

The following changes since commit 64ada298b98a51eb2512607f6e6180cb330c47b1:

   Merge remote-tracking branch 'remotes/legoater/tags/pull-ppc-20220302' into 
staging (2022-03-02 12:38:46 +)

are available in the Git repository at:

   https://gitlab.com/dagrh/qemu.git tags/pull-migration-20220302b

for you to fetch changes up to 18621987027b1800f315fb9e29967e7b5398ef6f:

   migration: Remove load_state_old and minimum_version_id_old (2022-03-02 
18:20:45 +)


Migration/HMP/Virtio pull 2022-03-02

A bit of a mix this time:
   * Minor fixes from myself, Hanna, and Jack
   * VNC password rework by Stefan and Fabian
   * Postcopy changes from Peter X that are
     the start of a larger series to come
   * Removing the prehistoric load_state_old
     code from Peter M


I'm seeing an error on the s390x runner:

▶  26/547 ERROR:../tests/qtest/migration-test.c:276:check_guests_ram: 
assertion failed: (bad == 0) ERROR


 26/547 qemu:qtest+qtest-i386 / qtest-i386/migration-test 
   ERROR  78.87s   killed by signal 6 SIGABRT


https://app.travis-ci.com/gitlab/qemu-project/qemu/jobs/562515884#L7848



Re: [PULL 00/18] migration queue

2022-03-03 Thread Peter Maydell
On Wed, 2 Mar 2022 at 18:32, Dr. David Alan Gilbert (git)
 wrote:
>
> From: "Dr. David Alan Gilbert" 
>
> The following changes since commit 64ada298b98a51eb2512607f6e6180cb330c47b1:
>
>   Merge remote-tracking branch 'remotes/legoater/tags/pull-ppc-20220302' into 
> staging (2022-03-02 12:38:46 +)
>
> are available in the Git repository at:
>
>   https://gitlab.com/dagrh/qemu.git tags/pull-migration-20220302b
>
> for you to fetch changes up to 18621987027b1800f315fb9e29967e7b5398ef6f:
>
>   migration: Remove load_state_old and minimum_version_id_old (2022-03-02 
> 18:20:45 +)
>
> 
> Migration/HMP/Virtio pull 2022-03-02
>
> A bit of a mix this time:
>   * Minor fixes from myself, Hanna, and Jack
>   * VNC password rework by Stefan and Fabian
>   * Postcopy changes from Peter X that are
>     the start of a larger series to come
>   * Removing the prehistoric load_state_old
>     code from Peter M
>
> Signed-off-by: Dr. David Alan Gilbert 
>


Applied, thanks.

Please update the changelog at https://wiki.qemu.org/ChangeLog/7.0
for any user-visible changes.

-- PMM



[PULL 00/18] migration queue

2022-03-02 Thread Dr. David Alan Gilbert (git)
From: "Dr. David Alan Gilbert" 

The following changes since commit 64ada298b98a51eb2512607f6e6180cb330c47b1:

  Merge remote-tracking branch 'remotes/legoater/tags/pull-ppc-20220302' into 
staging (2022-03-02 12:38:46 +)

are available in the Git repository at:

  https://gitlab.com/dagrh/qemu.git tags/pull-migration-20220302b

for you to fetch changes up to 18621987027b1800f315fb9e29967e7b5398ef6f:

  migration: Remove load_state_old and minimum_version_id_old (2022-03-02 
18:20:45 +)


Migration/HMP/Virtio pull 2022-03-02

A bit of a mix this time:
  * Minor fixes from myself, Hanna, and Jack
  * VNC password rework by Stefan and Fabian
  * Postcopy changes from Peter X that are
    the start of a larger series to come
  * Removing the prehistoric load_state_old
    code from Peter M

Signed-off-by: Dr. David Alan Gilbert 


Dr. David Alan Gilbert (1):
  clock-vmstate: Add missing END_OF_LIST

Hanna Reitz (1):
  virtiofsd: Let meson check for statx.stx_mnt_id

Jack Wang (1):
  migration/rdma: set the REUSEADDR option for destination

Peter Maydell (1):
  migration: Remove load_state_old and minimum_version_id_old

Peter Xu (11):
  migration: Dump sub-cmd name in loadvm_process_command tp
  migration: Finer grained tracepoints for POSTCOPY_LISTEN
  migration: Tracepoint change in postcopy-run bottom half
  migration: Introduce postcopy channels on dest node
  migration: Dump ramblock and offset too when non-same-page detected
  migration: Add postcopy_thread_create()
  migration: Move static var in ram_block_from_stream() into global
  migration: Enlarge postcopy recovery to capture !-EIO too
  migration: postcopy_pause_fault_thread() never fails
  migration: Add migration_incoming_transport_cleanup()
  tests: Pass in MigrateStart** into test_migrate_start()

Stefan Reiter (3):
  monitor/hmp: add support for flag argument with value
  qapi/monitor: refactor set/expire_password with enums
  qapi/monitor: allow VNC display id in set/expire_password

 docs/devel/migration.rst         |  12 +---
 hmp-commands.hx                  |  24
 hw/core/clock-vmstate.c          |   1 +
 hw/ssi/xlnx-versal-ospi.c        |   1 -
 include/migration/vmstate.h      |   2 -
 meson.build                      |  13 +
 migration/migration.c            |  26 +
 migration/migration.h            |  48 ++--
 migration/postcopy-ram.c         | 108 ++-
 migration/postcopy-ram.h         |   4 ++
 migration/ram.c                  |  64 +++--
 migration/rdma.c                 |   7 +++
 migration/savevm.c               |  46 ++-
 migration/trace-events           |   7 +--
 migration/vmstate.c              |   6 --
 monitor/hmp-cmds.c               |  47 ++-
 monitor/hmp.c                    |  19 ++-
 monitor/monitor-internal.h       |   3 +-
 monitor/qmp-cmds.c               |  49 +---
 qapi/ui.json                     | 120 +--
 tests/qtest/migration-test.c     |  27 +
 tools/virtiofsd/passthrough_ll.c |   2 +-
 22 files changed, 435 insertions(+), 201 deletions(-)