Hello Juan, thanks for the feedback!

On Fri, Feb 18, 2022 at 1:57 PM Juan Quintela <quint...@redhat.com> wrote:
>
> Leonardo Bras <leob...@redhat.com> wrote:
> > Implement zero copy send on nocomp_send_write(), by making use of the
> > QIOChannel writev + flags & flush interface.
> >
> > Change multifd_send_sync_main() so flush_zero_copy() can be called
> > after each iteration, in order to make sure all dirty pages are sent
> > before a new iteration is started. It will also flush at the beginning
> > and at the end of migration.
> >
> > Also make it return -1 if flush_zero_copy() fails, in order to cancel
> > the migration process, and avoid resuming the guest in the target host
> > without receiving all current RAM.
> >
> > This will work fine on RAM migration because the RAM pages are not
> > usually freed, and there is no problem with changing the page contents
> > between writev_zero_copy() and the actual sending of the buffer,
> > because this change will dirty the page and cause it to be re-sent on
> > a next iteration anyway.
> >
> > A lot of locked memory may be needed in order to use multid migration
>                                                          ^^^^^^
> multifd.
>
> I can fix it on the commit.
No worries, fixed for v9.

> > @@ -1479,7 +1479,16 @@ static bool migrate_params_check(MigrationParameters *params, Error **errp)
> >          error_prepend(errp, "Invalid mapping given for block-bitmap-mapping: ");
> >          return false;
> >      }
> > -
> > +#ifdef CONFIG_LINUX
> > +    if (params->zero_copy_send &&
> > +        (!migrate_use_multifd() ||
> > +         params->multifd_compression != MULTIFD_COMPRESSION_NONE ||
> > +         (params->tls_creds && *params->tls_creds))) {
> > +        error_setg(errp,
> > +                   "Zero copy only available for non-compressed non-TLS multifd migration");
> > +        return false;
> > +    }
> > +#endif
> >      return true;
> >  }
>
> The test is long, but it is exactly what we need.  Good.

Thanks!

> > diff --git a/migration/multifd.c b/migration/multifd.c
> > index 43998ad117..2d68b9cf4f 100644
> > --- a/migration/multifd.c
> > +++ b/migration/multifd.c
> > @@ -568,19 +568,28 @@ void multifd_save_cleanup(void)
> >      multifd_send_state = NULL;
> >  }
> >
> > -void multifd_send_sync_main(QEMUFile *f)
> > +int multifd_send_sync_main(QEMUFile *f)
> >  {
> >      int i;
> > +    bool flush_zero_copy;
> >
> >      if (!migrate_use_multifd()) {
> > -        return;
> > +        return 0;
> >      }
> >      if (multifd_send_state->pages->num) {
> >          if (multifd_send_pages(f) < 0) {
> >              error_report("%s: multifd_send_pages fail", __func__);
> > -            return;
> > +            return 0;
> >          }
> >      }
> > +
> > +    /*
> > +     * When using zero-copy, it's necessary to flush after each iteration to
> > +     * make sure pages from earlier iterations don't end up replacing newer
> > +     * pages.
> > +     */
> > +    flush_zero_copy = migrate_use_zero_copy_send();
> > +
> >      for (i = 0; i < migrate_multifd_channels(); i++) {
> >          MultiFDSendParams *p = &multifd_send_state->params[i];
> >
> > @@ -591,7 +600,7 @@ void multifd_send_sync_main(QEMUFile *f)
> >          if (p->quit) {
> >              error_report("%s: channel %d has already quit", __func__, i);
> >              qemu_mutex_unlock(&p->mutex);
> > -            return;
> > +            return 0;
> >          }
> >
> >          p->packet_num = multifd_send_state->packet_num++;
> > @@ -602,6 +611,17 @@ void multifd_send_sync_main(QEMUFile *f)
> >          ram_counters.transferred += p->packet_len;
> >          qemu_mutex_unlock(&p->mutex);
> >          qemu_sem_post(&p->sem);
> > +
> > +        if (flush_zero_copy) {
> > +            int ret;
> > +            Error *err = NULL;
> > +
> > +            ret = qio_channel_flush(p->c, &err);
> > +            if (ret < 0) {
> > +                error_report_err(err);
> > +                return -1;
> > +            }
> > +        }
> >      }
> >      for (i = 0; i < migrate_multifd_channels(); i++) {
> >          MultiFDSendParams *p = &multifd_send_state->params[i];
> > @@ -610,6 +630,8 @@ void multifd_send_sync_main(QEMUFile *f)
> >          qemu_sem_wait(&p->sem_sync);
> >      }
> >      trace_multifd_send_sync_main(multifd_send_state->packet_num);
> > +
> > +    return 0;
> >  }
>
> We are leaving pages in flight for potentially a lot of time.  I *think*
> that we can sync more often than that.
>
> >  static void *multifd_send_thread(void *opaque)
> > @@ -668,8 +690,8 @@ static void *multifd_send_thread(void *opaque)
> >              p->iov[0].iov_len = p->packet_len;
> >              p->iov[0].iov_base = p->packet;
> >
> > -            ret = qio_channel_writev_all(p->c, p->iov, p->iovs_num,
> > -                                         &local_err);
> > +            ret = qio_channel_writev_full_all(p->c, p->iov, p->iovs_num, NULL,
> > +                                              0, p->write_flags, &local_err);
> >              if (ret != 0) {
> >                  break;
> >              }
>
> I still think that it would be better to have a sync here in each
> thread.  I don't know the exact size, but once every couple of megabytes
> of RAM or so.
This seems a good idea, since the first iteration may take a while, and
we could take a long time to fail if something goes wrong with zero-copy
at the start of iteration 1.

On the other hand, flushing takes some time, and doing it a lot may take
away some of the performance improvements. Maybe we could move the
flushing to a thread of its own, if it becomes a problem.

>
> I did a change on:
>
> commit d48c3a044537689866fe44e65d24c7d39a68868a
> Author: Juan Quintela <quint...@redhat.com>
> Date:   Fri Nov 19 15:35:58 2021 +0100
>
>     multifd: Use a single writev on the send side
>
>     Until now, we wrote the packet header with write(), and the rest of
>     the pages with writev().  Just increase the size of the iovec and do
>     a single writev().
>
>     Signed-off-by: Juan Quintela <quint...@redhat.com>
>     Reviewed-by: Dr. David Alan Gilbert <dgilb...@redhat.com>
>
> And now we need to "preserve" this header until we do the sync,
> otherwise we are overwriting it with other things.

Yeah, that is a problem I also faced on non-multifd migration, and one of
the reasons why I chose to implement this directly in multifd.

>
> What testing have you done after this commit?

Not much, but this will probably become an issue with bigger guests.

>
> Notice that it is not _complicated_ to fix.  I will try to come up with
> some idea on Monday, but basically it means having an array of buffers
> for each thread, and keeping them around until we call a sync().

That will probably work.  We can use MultiFDSendParams->packet as this
array, right?  We can start with a somewhat small buffer and grow it if
it ever gets full (where "full" means a sync + g_try_malloc0 + g_free).
By my calculations, each element should take about 1.8 kB on a 4 kB
PAGESIZE.  What do you think?  (A rough sketch of the idea is below, in
the PS.)

Best regards,
Leo
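PS: To make the buffer idea above a bit more concrete, here is a rough,
untested sketch.  MultiFDHeaderPool and header_pool_get() are made-up
names just for illustration; the real thing would hang off
MultiFDSendParams and use p->packet_len as the slot size:

#include <stddef.h>
#include <stdint.h>
#include "io/channel.h"     /* QIOChannel, qio_channel_flush() */
#include "qapi/error.h"     /* Error */

typedef struct {
    uint8_t *buf;       /* n_slots * slot_len bytes of header storage */
    uint32_t slot_len;  /* == p->packet_len */
    uint32_t n_slots;
    uint32_t next;      /* next free slot */
} MultiFDHeaderPool;

/*
 * Return a header buffer that stays valid until the next flush.
 * When the pool is full, flush the channel first, so the kernel is
 * done reading every header handed out so far, then recycle from
 * slot 0.  This also gives the "sync every couple of megabytes"
 * behavior: n_slots packets' worth of pages go out between flushes.
 */
static void *header_pool_get(MultiFDHeaderPool *pool, QIOChannel *ioc,
                             Error **errp)
{
    if (pool->next == pool->n_slots) {
        if (qio_channel_flush(ioc, errp) < 0) {
            return NULL;
        }
        /*
         * Every slot is free again.  This would also be the spot to
         * grow the pool (g_try_malloc0() a larger buffer, g_free()
         * the old one) so that we flush less often next time.
         */
        pool->next = 0;
    }
    return pool->buf + (size_t)pool->next++ * pool->slot_len;
}

multifd_send_thread() would then point p->iov[0].iov_base at the slot
returned here, instead of reusing p->packet for every packet.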