Hello Juan, thanks for the feedback!

On Fri, Feb 18, 2022 at 1:57 PM Juan Quintela <quint...@redhat.com> wrote:
>
> Leonardo Bras <leob...@redhat.com> wrote:
> > Implement zero copy send on nocomp_send_write(), by making use of the
> > QIOChannel writev + flags & flush interface.
> >
> > Change multifd_send_sync_main() so flush_zero_copy() can be called
> > after each iteration, in order to make sure all dirty pages are sent
> > before a new iteration is started. It will also flush at the beginning
> > and at the end of migration.
> >
> > Also make it return -1 if flush_zero_copy() fails, in order to cancel
> > the migration process, and avoid resuming the guest in the target host
> > without receiving all current RAM.
> >
> > This will work fine on RAM migration because the RAM pages are not
> > usually freed, and there is no problem with changing the page contents
> > between writev_zero_copy() and the actual sending of the buffer,
> > because this change will dirty the page and cause it to be re-sent on
> > a next iteration anyway.
> >
> > A lot of locked memory may be needed in order to use multid migration
>                                                          ^^^^^^
> multifd.
>
> I can fix it on the commit.
No worries, fixed for v9.

> > @@ -1479,7 +1479,16 @@ static bool migrate_params_check(MigrationParameters *params, Error **errp)
> >          error_prepend(errp, "Invalid mapping given for block-bitmap-mapping: ");
> >          return false;
> >      }
> > -
> > +#ifdef CONFIG_LINUX
> > +    if (params->zero_copy_send &&
> > +        (!migrate_use_multifd() ||
> > +         params->multifd_compression != MULTIFD_COMPRESSION_NONE ||
> > +         (params->tls_creds && *params->tls_creds))) {
> > +        error_setg(errp,
> > +                   "Zero copy only available for non-compressed non-TLS multifd migration");
> > +        return false;
> > +    }
> > +#endif
> >      return true;
> >  }
>
> The test is long, but it is exactly what we need.  Good.

Thanks!

> > diff --git a/migration/multifd.c b/migration/multifd.c
> > index 43998ad117..2d68b9cf4f 100644
> > --- a/migration/multifd.c
> > +++ b/migration/multifd.c
> > @@ -568,19 +568,28 @@ void multifd_save_cleanup(void)
> >      multifd_send_state = NULL;
> >  }
> >
> > -void multifd_send_sync_main(QEMUFile *f)
> > +int multifd_send_sync_main(QEMUFile *f)
> >  {
> >      int i;
> > +    bool flush_zero_copy;
> >
> >      if (!migrate_use_multifd()) {
> > -        return;
> > +        return 0;
> >      }
> >      if (multifd_send_state->pages->num) {
> >          if (multifd_send_pages(f) < 0) {
> >              error_report("%s: multifd_send_pages fail", __func__);
> > -            return;
> > +            return 0;
> >          }
> >      }
> > +
> > +    /*
> > +     * When using zero-copy, it's necessary to flush after each iteration to
> > +     * make sure pages from earlier iterations don't end up replacing newer
> > +     * pages.
> > +     */
> > +    flush_zero_copy = migrate_use_zero_copy_send();
> > +
> >      for (i = 0; i < migrate_multifd_channels(); i++) {
> >          MultiFDSendParams *p = &multifd_send_state->params[i];
> >
> > @@ -591,7 +600,7 @@ void multifd_send_sync_main(QEMUFile *f)
> >          if (p->quit) {
> >              error_report("%s: channel %d has already quit", __func__, i);
> >              qemu_mutex_unlock(&p->mutex);
> > -            return;
> > +            return 0;
> >          }
> >
> >          p->packet_num = multifd_send_state->packet_num++;
> > @@ -602,6 +611,17 @@ void multifd_send_sync_main(QEMUFile *f)
> >          ram_counters.transferred += p->packet_len;
> >          qemu_mutex_unlock(&p->mutex);
> >          qemu_sem_post(&p->sem);
> > +
> > +        if (flush_zero_copy) {
> > +            int ret;
> > +            Error *err = NULL;
> > +
> > +            ret = qio_channel_flush(p->c, &err);
> > +            if (ret < 0) {
> > +                error_report_err(err);
> > +                return -1;
> > +            }
> > +        }
> >      }
> >      for (i = 0; i < migrate_multifd_channels(); i++) {
> >          MultiFDSendParams *p = &multifd_send_state->params[i];
> > @@ -610,6 +630,8 @@ void multifd_send_sync_main(QEMUFile *f)
> >          qemu_sem_wait(&p->sem_sync);
> >      }
> >      trace_multifd_send_sync_main(multifd_send_state->packet_num);
> > +
> > +    return 0;
> >  }
>
> We are leaving pages in flight for potentially a lot of time.  I *think*
> that we can sync more often than that.
>
> >  static void *multifd_send_thread(void *opaque)
> > @@ -668,8 +690,8 @@ static void *multifd_send_thread(void *opaque)
> >              p->iov[0].iov_len = p->packet_len;
> >              p->iov[0].iov_base = p->packet;
> >
> > -            ret = qio_channel_writev_all(p->c, p->iov, p->iovs_num,
> > -                                         &local_err);
> > +            ret = qio_channel_writev_full_all(p->c, p->iov, p->iovs_num, NULL,
> > +                                              0, p->write_flags, &local_err);
> >              if (ret != 0) {
> >                  break;
> >              }
>
> I still think that it would be better to have a sync here in each
> thread.  I don't know the exact size, but once every couple of megabytes
> of RAM or so.
This seems a good idea, since the first iteration may take a while, and
we could take a long time to fail if something goes wrong with zero-copy
at the start of iteration 1.

On the other hand, flushing takes some time, and doing it a lot may take
away some of the performance improvements. Maybe we could move the
flushing to a thread of its own, if it becomes a problem.

>
> I did a change on:
>
> commit d48c3a044537689866fe44e65d24c7d39a68868a
> Author: Juan Quintela <quint...@redhat.com>
> Date:   Fri Nov 19 15:35:58 2021 +0100
>
>     multifd: Use a single writev on the send side
>
>     Until now, we wrote the packet header with write(), and the rest of
>     the pages with writev().  Just increase the size of the iovec and do
>     a single writev().
>
>     Signed-off-by: Juan Quintela <quint...@redhat.com>
>     Reviewed-by: Dr. David Alan Gilbert <dgilb...@redhat.com>
>
> And now we need to "preserve" this header until we do the sync,
> otherwise we are overwriting it with other things.

Yeah, that is a problem I also faced on non-multifd migration, and one of
the reasons why I chose to implement this directly in multifd.

>
> What testing have you done after this commit?

Not much, but this will probably become an issue with bigger guests.

>
> Notice that it is not _complicated_ to fix.  I will try to come up with
> some idea on Monday, but basically it means having an array of buffers
> for each thread, and keeping them around until we call a sync().

That will probably work.  We can use MultiFDSendParams->packet as this
array, right?  We can start with a somewhat small buffer and grow it if
it ever gets full (where "full" means a sync + g_try_malloc0 + g_free).
By my calculations, each element should take about 1.8 kB on a 4 kB
PAGESIZE.  What do you think?  (A rough sketch of the idea is below, in
the PS.)

Best regards,
Leo
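PS: To make the buffer idea above a bit more concrete, here is a rough,
untested sketch.  MultiFDHeaderPool and header_pool_get() are made-up
names just for illustration; the real thing would hang off
MultiFDSendParams and use p->packet_len as the slot size:

#include <stddef.h>
#include <stdint.h>
#include "io/channel.h"     /* QIOChannel, qio_channel_flush() */
#include "qapi/error.h"     /* Error */

typedef struct {
    uint8_t *buf;       /* n_slots * slot_len bytes of header storage */
    uint32_t slot_len;  /* == p->packet_len */
    uint32_t n_slots;
    uint32_t next;      /* next free slot */
} MultiFDHeaderPool;

/*
 * Return a header buffer that stays valid until the next flush.
 * When the pool is full, flush the channel first, so the kernel is
 * done reading every header handed out so far, then recycle from
 * slot 0.  This also gives the "sync every couple of megabytes"
 * behavior: n_slots packets' worth of pages go out between flushes.
 */
static void *header_pool_get(MultiFDHeaderPool *pool, QIOChannel *ioc,
                             Error **errp)
{
    if (pool->next == pool->n_slots) {
        if (qio_channel_flush(ioc, errp) < 0) {
            return NULL;
        }
        /*
         * Every slot is free again.  This would also be the spot to
         * grow the pool (g_try_malloc0() a larger buffer, g_free()
         * the old one) so that we flush less often next time.
         */
        pool->next = 0;
    }
    return pool->buf + (size_t)pool->next++ * pool->slot_len;
}

multifd_send_thread() would then point p->iov[0].iov_base at the slot
returned here, instead of reusing p->packet for every packet.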