On Mon, Feb 21, 2022 at 4:41 PM Leonardo Bras Soares Passos <leob...@redhat.com> wrote:
>
> Hello Juan, thanks for the feedback!
>
> On Fri, Feb 18, 2022 at 1:57 PM Juan Quintela <quint...@redhat.com> wrote:
> >
> > Leonardo Bras <leob...@redhat.com> wrote:
> > > Implement zero copy send on nocomp_send_write(), by making use of QIOChannel
> > > writev + flags & flush interface.
> > >
> > > Change multifd_send_sync_main() so flush_zero_copy() can be called
> > > after each iteration in order to make sure all dirty pages are sent before
> > > a new iteration is started. It will also flush at the beginning and at the
> > > end of migration.
> > >
> > > Also make it return -1 if flush_zero_copy() fails, in order to cancel
> > > the migration process, and avoid resuming the guest in the target host
> > > without receiving all current RAM.
> > >
> > > This will work fine on RAM migration because the RAM pages are not usually
> > > freed, and there is no problem on changing the pages content between
> > > writev_zero_copy() and the actual sending of the buffer, because this change
> > > will dirty the page and cause it to be re-sent on a next iteration anyway.
> > >
> > > A lot of locked memory may be needed in order to use multid migration
> >                                                        ^^^^^^
> > multifd.
> >
> > I can fix it on the commit.
>
> No worries, fixed for v9.
>
> > > @@ -1479,7 +1479,16 @@ static bool migrate_params_check(MigrationParameters *params, Error **errp)
> > >          error_prepend(errp, "Invalid mapping given for block-bitmap-mapping: ");
> > >          return false;
> > >      }
> > > -
> > > +#ifdef CONFIG_LINUX
> > > +    if (params->zero_copy_send &&
> > > +        (!migrate_use_multifd() ||
> > > +         params->multifd_compression != MULTIFD_COMPRESSION_NONE ||
> > > +         (params->tls_creds && *params->tls_creds))) {
> > > +        error_setg(errp,
> > > +                   "Zero copy only available for non-compressed non-TLS multifd migration");
> > > +        return false;
> > > +    }
> > > +#endif
> > >      return true;
> > >  }
> >
> > Test is long, but it is exactly what we need. Good.
>
> Thanks!
>
> > > diff --git a/migration/multifd.c b/migration/multifd.c
> > > index 43998ad117..2d68b9cf4f 100644
> > > --- a/migration/multifd.c
> > > +++ b/migration/multifd.c
> > > @@ -568,19 +568,28 @@ void multifd_save_cleanup(void)
> > >      multifd_send_state = NULL;
> > >  }
> > >
> > > -void multifd_send_sync_main(QEMUFile *f)
> > > +int multifd_send_sync_main(QEMUFile *f)
> > >  {
> > >      int i;
> > > +    bool flush_zero_copy;
> > >
> > >      if (!migrate_use_multifd()) {
> > > -        return;
> > > +        return 0;
> > >      }
> > >      if (multifd_send_state->pages->num) {
> > >          if (multifd_send_pages(f) < 0) {
> > >              error_report("%s: multifd_send_pages fail", __func__);
> > > -            return;
> > > +            return 0;
> > >          }
> > >      }
> > > +
> > > +    /*
> > > +     * When using zero-copy, it's necessary to flush after each iteration to
> > > +     * make sure pages from earlier iterations don't end up replacing newer
> > > +     * pages.
> > > +     */
> > > +    flush_zero_copy = migrate_use_zero_copy_send();
> > > +
> > >      for (i = 0; i < migrate_multifd_channels(); i++) {
> > >          MultiFDSendParams *p = &multifd_send_state->params[i];
> > >
> > > @@ -591,7 +600,7 @@ void multifd_send_sync_main(QEMUFile *f)
> > >          if (p->quit) {
> > >              error_report("%s: channel %d has already quit", __func__, i);
> > >              qemu_mutex_unlock(&p->mutex);
> > > -            return;
> > > +            return 0;
> > >          }
> > >
> > >          p->packet_num = multifd_send_state->packet_num++;
> > > @@ -602,6 +611,17 @@ void multifd_send_sync_main(QEMUFile *f)
> > >          ram_counters.transferred += p->packet_len;
> > >          qemu_mutex_unlock(&p->mutex);
> > >          qemu_sem_post(&p->sem);
> > > +
> > > +        if (flush_zero_copy) {
> > > +            int ret;
> > > +            Error *err = NULL;
> > > +
> > > +            ret = qio_channel_flush(p->c, &err);
> > > +            if (ret < 0) {
> > > +                error_report_err(err);
> > > +                return -1;
> > > +            }
> > > +        }
> > >      }
> > >      for (i = 0; i < migrate_multifd_channels(); i++) {
> > >          MultiFDSendParams *p = &multifd_send_state->params[i];
> > > @@ -610,6 +630,8 @@ void multifd_send_sync_main(QEMUFile *f)
> > >          qemu_sem_wait(&p->sem_sync);
> > >      }
> > >      trace_multifd_send_sync_main(multifd_send_state->packet_num);
> > > +
> > > +    return 0;
> > >  }
> >
> > We are leaving pages in flight for potentially a lot of time. I *think*
> > that we can sync shorter than that.
> >
> > >  static void *multifd_send_thread(void *opaque)
> > > @@ -668,8 +690,8 @@ static void *multifd_send_thread(void *opaque)
> > >              p->iov[0].iov_len = p->packet_len;
> > >              p->iov[0].iov_base = p->packet;
> > >
> > > -            ret = qio_channel_writev_all(p->c, p->iov, p->iovs_num,
> > > -                                         &local_err);
> > > +            ret = qio_channel_writev_full_all(p->c, p->iov, p->iovs_num, NULL,
> > > +                                              0, p->write_flags, &local_err);
> > >              if (ret != 0) {
> > >                  break;
> > >              }
> >
> > I still think that it would be better to have a sync here in each
> > thread. I don't know the size, but once every couple megabytes of RAM
> > or so.
>
> This seems a good idea, since the first iteration may take a while,
> and we may take a lot of time to fail if something goes wrong with
> zerocopy at the start of iteration 1.
>
> On the other hand, flushing takes some time, and doing it a lot may
> take away some of the performance improvements.
>
> Maybe we could move the flushing to a thread of its own, if it
> becomes a problem.
>
> > I did a change on:
> >
> > commit d48c3a044537689866fe44e65d24c7d39a68868a
> > Author: Juan Quintela <quint...@redhat.com>
> > Date:   Fri Nov 19 15:35:58 2021 +0100
> >
> >     multifd: Use a single writev on the send side
> >
> >     Until now, we wrote the packet header with write(), and the rest of the
> >     pages with writev(). Just increase the size of the iovec and do a
> >     single writev().
> >
> >     Signed-off-by: Juan Quintela <quint...@redhat.com>
> >     Reviewed-by: Dr. David Alan Gilbert <dgilb...@redhat.com>
> >
> > And now we need to "preserve" this header until we do the sync,
> > otherwise we are overwriting it with other things.
>
> Yeah, that is a problem I faced on non-multifd migration, and a reason
> why I chose to implement directly in multifd.
>
> > What testing have you done after this commit?
>
> Not much, but this will probably become an issue with bigger guests.
>
> > Notice that it is not _complicated_ to fix it, I will try to come up with
> > some idea on Monday, but basically it is having an array of buffers for
> > each thread, and storing them until we call a sync().
>
> That will probably work, we can use MultiFDRecvParams->packet as this
> array, right?
> We can start with a somewhat small buffer size and grow if it ever gets full.
> (Full means a sync + g_try_malloc0 + g_free)
> Or maybe use the array size as a size limiter before flushing, so it
> will always flush after BUFFSIZE headers are sent.
>
> By my calculations, it should take 1.8kB per element on a 4k PAGESIZE.
>
> What do you think?
>
> Best regards,
> Leo
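
A rough, self-contained sketch of the buffered-header idea discussed above (one
array of packet-header copies per send thread, flushed once BUFFSIZE headers are
in flight) could look like the following. This is not QEMU code: HeaderRing,
header_ring_get() and do_flush() are invented names, HEADER_LEN just reuses the
~1.8kB estimate above, and a real implementation would flush through
qio_channel_flush() on the multifd channel.

#include <stdint.h>
#include <string.h>

#define BUFFSIZE    64      /* headers allowed in flight before a flush   */
#define HEADER_LEN  1824    /* ~1.8kB per packet header, per the estimate */

typedef struct {
    uint8_t bufs[BUFFSIZE][HEADER_LEN];
    unsigned next;          /* next free slot in the ring                 */
} HeaderRing;

/* Stand-in for the real flush, e.g. qio_channel_flush() on the channel. */
static int do_flush(void)
{
    return 0;
}

/*
 * Return a private copy of 'header' that can be handed to a zero-copy
 * writev.  Once all BUFFSIZE slots are in flight, flush first, so every
 * previously queued header may be reused safely.
 */
static void *header_ring_get(HeaderRing *r, const void *header, size_t len)
{
    if (len > HEADER_LEN) {
        return NULL;        /* header does not fit a slot */
    }
    if (r->next == BUFFSIZE) {
        if (do_flush() < 0) {
            return NULL;    /* caller aborts the migration */
        }
        r->next = 0;        /* all queued headers are now on the wire */
    }
    memcpy(r->bufs[r->next], header, len);
    return r->bufs[r->next++];
}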
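
On the cost of flushing: with MSG_ZEROCOPY the kernel reports completions
through the socket error queue, so a flush essentially means draining those
notifications, roughly as in the sketch below (following the pattern from the
kernel's MSG_ZEROCOPY documentation). flush_zerocopy() and its 'expected'
counter are illustrative only; in QEMU this sits behind qio_channel_flush().

#include <errno.h>
#include <stdint.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <linux/errqueue.h>

/*
 * Drain MSG_ZEROCOPY completion notifications from the socket error
 * queue until 'expected' zero-copy sends have been acknowledged.
 * Illustrative only; busy-waits instead of poll()ing for brevity.
 */
static int flush_zerocopy(int fd, uint32_t expected)
{
    uint32_t done = 0;

    while (done < expected) {
        char control[128];
        struct msghdr msg = {
            .msg_control = control,
            .msg_controllen = sizeof(control),
        };
        struct cmsghdr *cm;

        if (recvmsg(fd, &msg, MSG_ERRQUEUE) < 0) {
            if (errno == EAGAIN || errno == EINTR) {
                continue;   /* notification not queued yet */
            }
            return -1;
        }

        for (cm = CMSG_FIRSTHDR(&msg); cm; cm = CMSG_NXTHDR(&msg, cm)) {
            struct sock_extended_err *serr;

            if ((cm->cmsg_level != SOL_IP || cm->cmsg_type != IP_RECVERR) &&
                (cm->cmsg_level != SOL_IPV6 || cm->cmsg_type != IPV6_RECVERR)) {
                continue;
            }
            serr = (struct sock_extended_err *)CMSG_DATA(cm);
            if (serr->ee_errno != 0 ||
                serr->ee_origin != SO_EE_ORIGIN_ZEROCOPY) {
                return -1;
            }
            /*
             * [ee_info, ee_data] is the inclusive range of completed
             * zero-copy sends; ee_code == SO_EE_CODE_ZEROCOPY_COPIED
             * would mean the kernel fell back to copying.
             */
            done += serr->ee_data - serr->ee_info + 1;
        }
    }
    return 0;
}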