RE: [PATCH 3/6] io/channel-rdma: support working in coroutine

2024-06-07 Thread Gonglei (Arei)
Hi Daniel,

> -Original Message-
> From: Daniel P. Berrangé [mailto:berra...@redhat.com]
> Sent: Friday, June 7, 2024 5:04 PM
> To: Gonglei (Arei) 
> Cc: qemu-devel@nongnu.org; pet...@redhat.com; yu.zh...@ionos.com;
> mgal...@akamai.com; elmar.ger...@ionos.com; zhengchuan
> ; arm...@redhat.com; lizhij...@fujitsu.com;
> pbonz...@redhat.com; m...@redhat.com; Xiexiangyou
> ; linux-r...@vger.kernel.org; lixiao (H)
> ; jinpu.w...@ionos.com; Wangjialin
> 
> Subject: Re: [PATCH 3/6] io/channel-rdma: support working in coroutine
> 
> On Tue, Jun 04, 2024 at 08:14:09PM +0800, Gonglei wrote:
> > From: Jialin Wang 
> >
> > It is not feasible to obtain RDMA completion queue notifications
> > through poll/ppoll on the rsocket fd. Therefore, we create a thread
> > named rpoller for each rsocket fd and two eventfds: pollin_eventfd and
> > pollout_eventfd.
> >
> > When io_create_watch or io_set_aio_fd_handler waits for POLLIN or
> > POLLOUT events, it actually poll/ppoll()s on the pollin_eventfd and
> > pollout_eventfd instead of the rsocket fd.
> >
> > The rpoller rpoll()s on the rsocket fd to receive POLLIN and POLLOUT
> > events.
> > When a POLLIN event occurs, the rpoller writes to the pollin_eventfd,
> > and poll/ppoll then returns the POLLIN event.
> > When a POLLOUT event occurs, the rpoller reads from the pollout_eventfd,
> > and poll/ppoll then returns the POLLOUT event.
> >
> > For a non-blocking rsocket fd, if rread/rwrite returns EAGAIN, the
> > channel reads/writes the pollin/pollout_eventfd, preventing poll/ppoll
> > from returning POLLIN/POLLOUT events.
> >
> > Known limitations:
> >
> >   For a blocking rsocket fd, if we use io_create_watch to wait for
> >   POLLIN or POLLOUT events, since the rsocket fd is blocking, we
> >   cannot determine when it is not ready to read/write as we can with
> >   non-blocking fds. Therefore, once an event occurs, it keeps
> >   occurring, potentially leaving QEMU hanging. So we need to be
> >   cautious to avoid hangs when using io_create_watch.
> >
> > Luckily, channel-rdma works well in coroutines :)
> >
> > Signed-off-by: Jialin Wang 
> > Signed-off-by: Gonglei 
> > ---
> >  include/io/channel-rdma.h |  15 +-
> >  io/channel-rdma.c | 363
> +-
> >  2 files changed, 376 insertions(+), 2 deletions(-)
> >
> > diff --git a/include/io/channel-rdma.h b/include/io/channel-rdma.h
> > index 8cab2459e5..cb56127d76 100644
> > --- a/include/io/channel-rdma.h
> > +++ b/include/io/channel-rdma.h
> > @@ -47,6 +47,18 @@ struct QIOChannelRDMA {
> >  socklen_t localAddrLen;
> >  struct sockaddr_storage remoteAddr;
> >  socklen_t remoteAddrLen;
> > +
> > +/* private */
> > +
> > +/* qemu g_poll/ppoll() POLLIN event on it */
> > +int pollin_eventfd;
> > +/* qemu g_poll/ppoll() POLLOUT event on it */
> > +int pollout_eventfd;
> > +
> > +/* the index in the rpoller's fds array */
> > +int index;
> > +/* rpoller will rpoll() rpoll_events on the rsocket fd */
> > +short int rpoll_events;
> >  };
> >
> >  /**
> > @@ -147,6 +159,7 @@ void
> qio_channel_rdma_listen_async(QIOChannelRDMA *ioc, InetSocketAddress
> *addr,
> >   *
> >   * Returns: the new client channel, or NULL on error
> >   */
> > -QIOChannelRDMA *qio_channel_rdma_accept(QIOChannelRDMA *ioc,
> Error
> > **errp);
> > +QIOChannelRDMA *coroutine_mixed_fn
> qio_channel_rdma_accept(QIOChannelRDMA *ioc,
> > +
> Error
> > +**errp);
> >
> >  #endif /* QIO_CHANNEL_RDMA_H */
> > diff --git a/io/channel-rdma.c b/io/channel-rdma.c index
> > 92c362df52..9792add5cf 100644
> > --- a/io/channel-rdma.c
> > +++ b/io/channel-rdma.c
> > @@ -23,10 +23,15 @@
> >
> >  #include "qemu/osdep.h"
> >  #include "io/channel-rdma.h"
> > +#include "io/channel-util.h"
> > +#include "io/channel-watch.h"
> >  #include "io/channel.h"
> >  #include "qapi/clone-visitor.h"
> >  #include "qapi/error.h"
> >  #include "qapi/qapi-visit-sockets.h"
> > +#include "qemu/atomic.h"
> > +#include "qemu/error-report.h"
> > +#include "qemu/thread.h"
> >  #include "trace.h"
> >  #include 
> >  #include 
> > @@ -39,11 +44,274 @@
> >  #include 
> >  #include 
> >
> > +typedef enum {
> > +CLEAR_POLLIN,
> > + 
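
The pollin/pollout eventfd mechanism described in the commit message above
can be sketched as follows. This is a minimal illustration of one reading of
the design, assuming standard Linux eventfd semantics; it is not the actual
patch code, and rpoller_handle() is a hypothetical helper name.

/* rpoller side: translate rpoll() results on the rsocket fd into
 * eventfd state that qemu's poll/ppoll can observe (sketch only) */
#include <poll.h>
#include <stdint.h>
#include <sys/eventfd.h>
#include <unistd.h>

static int pollin_eventfd;   /* readable => qemu's poll() reports POLLIN */
static int pollout_eventfd;  /* kept at max counter => no POLLOUT reported */

static void rpoller_handle(short revents) /* hypothetical helper */
{
    uint64_t one = 1, drain;

    if (revents & POLLIN) {
        /* raise the counter so pollin_eventfd becomes readable */
        write(pollin_eventfd, &one, sizeof(one));
    }
    if (revents & POLLOUT) {
        /* drop the counter below its maximum so pollout_eventfd
         * becomes writable and poll() reports POLLOUT */
        read(pollout_eventfd, &drain, sizeof(drain));
    }
}

On the channel side, an EAGAIN from rread() would drain pollin_eventfd and
an EAGAIN from rwrite() would fill pollout_eventfd back to its maximum,
suppressing the corresponding events again, as the commit message describes.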

RE: [PATCH-for-9.1 v2 2/3] migration: Remove RDMA protocol handling

2024-06-07 Thread Gonglei (Arei)
Hi,

> -Original Message-
> From: Peter Xu [mailto:pet...@redhat.com]
> Sent: Thursday, June 6, 2024 5:19 AM
> To: Dr. David Alan Gilbert 
> Cc: Michael Galaxy ; zhengchuan
> ; Gonglei (Arei) ;
> Daniel P. Berrangé ; Markus Armbruster
> ; Yu Zhang ; Zhijian Li (Fujitsu)
> ; Jinpu Wang ; Elmar Gerdes
> ; qemu-devel@nongnu.org; Yuval Shaia
> ; Kevin Wolf ; Prasanna
> Kumar Kalever ; Cornelia Huck
> ; Michael Roth ; Prasanna
> Kumar Kalever ; integrat...@gluster.org; Paolo
> Bonzini ; qemu-bl...@nongnu.org;
> de...@lists.libvirt.org; Hanna Reitz ; Michael S. Tsirkin
> ; Thomas Huth ; Eric Blake
> ; Song Gao ; Marc-André
> Lureau ; Alex Bennée
> ; Wainer dos Santos Moschetta
> ; Beraldo Leal ; Pannengyuan
> ; Xiexiangyou 
> Subject: Re: [PATCH-for-9.1 v2 2/3] migration: Remove RDMA protocol handling
> 
> On Wed, Jun 05, 2024 at 08:48:28PM +, Dr. David Alan Gilbert wrote:
> > > > I just noticed this thread; some random notes from a somewhat
> > > > fragmented memory of this:
> > > >
> > > >   a) Long long ago, I also tried rsocket;
> > > >
> https://lists.gnu.org/archive/html/qemu-devel/2015-01/msg02040.html
> > > >  as I remember the library was quite flaky at the time.
> > >
> > > Hmm interesting.  There also looks like a thread doing rpoll().
> >
> > Yeh, I can't actually remember much more about what I did back then!
> 
> Heh, that's understandable and fair. :)
> 
> > > I hope Lei and his team has tested >4G mem, otherwise definitely
> > > worth checking.  Lei also mentioned there're rsocket bugs they found
> > > in the cover letter, but not sure what's that about.
> >
> > It would probably be a good idea to keep track of what bugs are in
> > flight with it, and try it on a few RDMA cards to see what problems
> > get triggered.
> > I think I reported a few at the time, but I gave up after feeling it
> > was getting very hacky.
> 
> Agreed.  Maybe we can have a list of that in the cover letter or even QEMU's
> migration/rmda doc page.
> 
> Lei, if you think that makes sense please do so in your upcoming posts.
> There'll need to have a list of things you encountered in the kernel driver 
> and
> it'll be even better if there're further links to read on each problem.
> 
OK, no problem. There are two bugs:

Bug 1:

https://github.com/linux-rdma/rdma-core/commit/23985e25aebb559b761872313f8cab4e811c5a3d#diff-5ddbf83c6f021688166096ca96c9bba874dffc3cab88ded2e9d8b2176faa084cR3302-R3303

This commit introduces a bug that causes QEMU to hang: when the timeout
parameter of rpoll() is not -1 or 0, the program occasionally gets stuck.

Problem analysis:
On the first rpoll() call, rs_poll_enter() at line 3297 increments pollcnt
to 1. The timeout then expires and the function returns at line 3302; note
that rs_poll_exit() is not called to decrement pollcnt, so pollcnt stays at 1.
On the second rpoll() call, rs_poll_enter() at line 3297 increments pollcnt
to 2. If the timeout does not expire and poll() returns a value greater than
0, rs_poll_stop() is executed. Because --pollcnt leaves pollcnt at 1
(non-zero), the !--pollcnt test fails and suspendpoll = 1 is set.
Control goes back to the do/while loop inside rpoll(): rs_poll_enter() now
finds suspendpoll set, executes pthread_yield() and returns -EBUSY, so the
loop's if (rs_poll_enter()) test is true and, after the continue,
rs_poll_enter() runs again. Nothing ever clears suspendpoll, so the loop
spins forever and the program hangs.

Root cause: on the timeout-expiry exit path at line 3302, the function
returns without calling rs_poll_exit().
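
The control flow can be condensed into a small self-contained program; this
is an abbreviation of the rsocket.c logic as described above, not the
verbatim upstream code:

#include <stdio.h>

static int pollcnt;
static int suspendpoll;

static int rs_poll_enter(void)
{
    if (suspendpoll)
        return -1;          /* real code: pthread_yield(); return -EBUSY */
    pollcnt++;
    return 0;
}

static void rs_poll_stop(void)
{
    if (!--pollcnt)
        suspendpoll = 0;
    else
        suspendpoll = 1;    /* the leaked pollcnt lands us here */
}

int main(void)
{
    /* 1st rpoll(): timeout expires; the buggy early return skips
     * rs_poll_exit(), so pollcnt leaks from 0 to 1 */
    rs_poll_enter();

    /* 2nd rpoll(): poll() succeeds; rs_poll_stop() decrements pollcnt
     * from 2 to 1 (non-zero) and therefore sets suspendpoll = 1 */
    rs_poll_enter();
    rs_poll_stop();

    /* every later rs_poll_enter() is now busy and nothing clears
     * suspendpoll, so the do/while loop in rpoll() spins forever */
    printf("pollcnt=%d suspendpoll=%d enter=%d\n",
           pollcnt, suspendpoll, rs_poll_enter());
    return 0;
}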


Bug 2:

In rsocket.c there is a receive queue, int accept_queue[2], implemented with
a socketpair. The listen_svc thread in rsocket.c receives new connections and
writes them to accept_queue[1]; when raccept() is called, a connection is
read from accept_queue[0].
In the test case, qio_channel_wait(QIO_CHANNEL(lioc), G_IO_IN); waits for a
readable event (i.e. for a connection). rpoll() checks whether
accept_queue[0] has a readable event, but the poll set it arms does not
include accept_queue[0]. Only after the timeout expires does rpoll() pick up
the readable event on accept_queue[0] from rs_poll_arm() on the next
iteration.

Impact:
The accept operation therefore completes only after 5000 ms. Of course, we
can shorten this time by writing the interval in milliseconds to
/etc/rdma/rsocket/wake_up_interval.
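
For reference, the accept-queue mechanism looks roughly like this (an
abbreviation of rsocket.c with simplified types; not the verbatim code):

#include <sys/socket.h>
#include <unistd.h>

static int accept_queue[2];    /* created once with socketpair() */

static int accept_queue_init(void)
{
    return socketpair(AF_UNIX, SOCK_STREAM, 0, accept_queue);
}

/* listen_svc thread: hand a newly established connection (represented
 * here by a plain handle) to the application side */
static void enqueue_conn(int handle)
{
    write(accept_queue[1], &handle, sizeof(handle));
}

/* raccept() path: accept_queue[0] is readable iff a connection is
 * queued, which is exactly what rpoll() should be watching */
static int dequeue_conn(void)
{
    int handle;
    read(accept_queue[0], &handle, sizeof(handle));
    return handle;
}

The bug is that the fd set rpoll() arms misses accept_queue[0] on the first
pass, so the readable event is only noticed once rs_poll_arm() rebuilds the
set after the wake_up_interval timeout.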


Regards,
-Gonglei

> > > >
> > > >   e) Someone made a good suggestion (sorry can't remember who) -
> that the
> > > >  RDMA migration structure was the wrong way around - it should
> be the

RE: [PATCH 0/6] refactor RDMA live migration based on rsocket API

2024-06-07 Thread Gonglei (Arei)


> -Original Message-
> From: Peter Xu [mailto:pet...@redhat.com]
> Sent: Wednesday, June 5, 2024 10:19 PM
> To: Gonglei (Arei) 
> Cc: qemu-devel@nongnu.org; yu.zh...@ionos.com; mgal...@akamai.com;
> elmar.ger...@ionos.com; zhengchuan ;
> berra...@redhat.com; arm...@redhat.com; lizhij...@fujitsu.com;
> pbonz...@redhat.com; m...@redhat.com; Xiexiangyou
> ; linux-r...@vger.kernel.org; lixiao (H)
> ; jinpu.w...@ionos.com; Wangjialin
> ; Fabiano Rosas 
> Subject: Re: [PATCH 0/6] refactor RDMA live migration based on rsocket API
> 
> On Wed, Jun 05, 2024 at 10:09:43AM +, Gonglei (Arei) wrote:
> > Hi Peter,
> >
> > > -Original Message-
> > > From: Peter Xu [mailto:pet...@redhat.com]
> > > Sent: Wednesday, June 5, 2024 3:32 AM
> > > To: Gonglei (Arei) 
> > > Cc: qemu-devel@nongnu.org; yu.zh...@ionos.com;
> mgal...@akamai.com;
> > > elmar.ger...@ionos.com; zhengchuan ;
> > > berra...@redhat.com; arm...@redhat.com; lizhij...@fujitsu.com;
> > > pbonz...@redhat.com; m...@redhat.com; Xiexiangyou
> > > ; linux-r...@vger.kernel.org; lixiao (H)
> > > ; jinpu.w...@ionos.com; Wangjialin
> > > ; Fabiano Rosas 
> > > Subject: Re: [PATCH 0/6] refactor RDMA live migration based on
> > > rsocket API
> > >
> > > Hi, Lei, Jialin,
> > >
> > > Thanks a lot for working on this!
> > >
> > > I think we'll need to wait a bit on feedbacks from Jinpu and his
> > > team on RDMA side, also Daniel for iochannels.  Also, please
> > > remember to copy Fabiano Rosas in any relevant future posts.  We'd
> > > also like to know whether he has any comments too.  I have him copied in
> this reply.
> > >
> > > On Tue, Jun 04, 2024 at 08:14:06PM +0800, Gonglei wrote:
> > > > From: Jialin Wang 
> > > >
> > > > Hi,
> > > >
> > > > This patch series attempts to refactor RDMA live migration by
> > > > introducing a new QIOChannelRDMA class based on the rsocket API.
> > > >
> > > > The /usr/include/rdma/rsocket.h provides a higher level rsocket
> > > > API that is a 1-1 match of the normal kernel 'sockets' API, which
> > > > hides the detail of rdma protocol into rsocket and allows us to
> > > > add support for some modern features like multifd more easily.
> > > >
> > > > Here is the previous discussion on refactoring RDMA live migration
> > > > using the rsocket API:
> > > >
> > > > https://lore.kernel.org/qemu-devel/20240328130255.52257-1-philmd@l
> > > > inar
> > > > o.org/
> > > >
> > > > We have encountered some bugs when using rsocket and plan to
> > > > submit them to the rdma-core community.
> > > >
> > > > In addition, the use of rsocket makes our programming more
> > > > convenient, but it must be noted that this method introduces
> > > > multiple memory copies, which can be imagined that there will be a
> > > > certain performance degradation, hoping that friends with RDMA
> > > > network cards can help verify,
> > > thank you!
> > >
> > > It'll be good to elaborate if you tested it in-house. What people
> > > should expect on the numbers exactly?  Is that okay from Huawei's POV?
> > >
> > > Besides that, the code looks pretty good at a first glance to me.
> > > Before others chim in, here're some high level comments..
> > >
> > > Firstly, can we avoid using coroutine when listen()?  Might be
> > > relevant when I see that rdma_accept_incoming_migration() runs in a
> > > loop to do raccept(), but would that also hang the qemu main loop
> > > even with the coroutine, before all channels are ready?  I'm not a
> > > coroutine person, but I think the hope is that we can make dest QEMU
> > > run in a thread in the future just like the src QEMU, so the less 
> > > coroutine
> the better in this path.
> > >
> >
> > Because rsocket is set to non-blocking, raccept will return EAGAIN
> > when no connection is received, coroutine will yield, and will not hang the
> qemu main loop.
> 
> Ah that's ok.  And also I just noticed it may not be a big deal either as 
> long as
> we're before migration_incoming_process().
> 
> I'm wondering whether it can do it similarly like what we do with sockets in
> qio_net_listener_set_client_func_full().  After all, rsocket wants to mimic 
> the
> socket API.  It'll make sense if rsocket code tries to match with socket, 

RE: [PATCH 3/6] io/channel-rdma: support working in coroutine

2024-06-07 Thread Gonglei (Arei)


> -Original Message-
> From: Haris Iqbal [mailto:haris.iq...@ionos.com]
> Sent: Thursday, June 6, 2024 9:35 PM
> To: Gonglei (Arei) 
> Cc: qemu-devel@nongnu.org; pet...@redhat.com; yu.zh...@ionos.com;
> mgal...@akamai.com; elmar.ger...@ionos.com; zhengchuan
> ; berra...@redhat.com; arm...@redhat.com;
> lizhij...@fujitsu.com; pbonz...@redhat.com; m...@redhat.com; Xiexiangyou
> ; linux-r...@vger.kernel.org; lixiao (H)
> ; jinpu.w...@ionos.com; Wangjialin
> 
> Subject: Re: [PATCH 3/6] io/channel-rdma: support working in coroutine
> 
> On Tue, Jun 4, 2024 at 2:14 PM Gonglei  wrote:
> >
> > From: Jialin Wang 
> >
> > It is not feasible to obtain RDMA completion queue notifications
> > through poll/ppoll on the rsocket fd. Therefore, we create a thread
> > named rpoller for each rsocket fd and two eventfds: pollin_eventfd and
> > pollout_eventfd.
> >
> > When io_create_watch or io_set_aio_fd_handler waits for POLLIN or
> > POLLOUT events, it actually poll/ppoll()s on the pollin_eventfd and
> > pollout_eventfd instead of the rsocket fd.
> >
> > The rpoller rpoll()s on the rsocket fd to receive POLLIN and POLLOUT
> > events.
> > When a POLLIN event occurs, the rpoller writes to the pollin_eventfd,
> > and poll/ppoll then returns the POLLIN event.
> > When a POLLOUT event occurs, the rpoller reads from the pollout_eventfd,
> > and poll/ppoll then returns the POLLOUT event.
> >
> > For a non-blocking rsocket fd, if rread/rwrite returns EAGAIN, the
> > channel reads/writes the pollin/pollout_eventfd, preventing poll/ppoll
> > from returning POLLIN/POLLOUT events.
> >
> > Known limitations:
> >
> >   For a blocking rsocket fd, if we use io_create_watch to wait for
> >   POLLIN or POLLOUT events, since the rsocket fd is blocking, we
> >   cannot determine when it is not ready to read/write as we can with
> >   non-blocking fds. Therefore, once an event occurs, it keeps
> >   occurring, potentially leaving QEMU hanging. So we need to be
> >   cautious to avoid hangs when using io_create_watch.
> >
> > Luckily, channel-rdma works well in coroutines :)
> >
> > Signed-off-by: Jialin Wang 
> > Signed-off-by: Gonglei 
> > ---
> >  include/io/channel-rdma.h |  15 +-
> >  io/channel-rdma.c | 363
> +-
> >  2 files changed, 376 insertions(+), 2 deletions(-)
> >
> > diff --git a/include/io/channel-rdma.h b/include/io/channel-rdma.h
> > index 8cab2459e5..cb56127d76 100644
> > --- a/include/io/channel-rdma.h
> > +++ b/include/io/channel-rdma.h
> > @@ -47,6 +47,18 @@ struct QIOChannelRDMA {
> >  socklen_t localAddrLen;
> >  struct sockaddr_storage remoteAddr;
> >  socklen_t remoteAddrLen;
> > +
> > +/* private */
> > +
> > +/* qemu g_poll/ppoll() POLLIN event on it */
> > +int pollin_eventfd;
> > +/* qemu g_poll/ppoll() POLLOUT event on it */
> > +int pollout_eventfd;
> > +
> > +/* the index in the rpoller's fds array */
> > +int index;
> > +/* rpoller will rpoll() rpoll_events on the rsocket fd */
> > +short int rpoll_events;
> >  };
> >
> >  /**
> > @@ -147,6 +159,7 @@ void
> qio_channel_rdma_listen_async(QIOChannelRDMA *ioc, InetSocketAddress
> *addr,
> >   *
> >   * Returns: the new client channel, or NULL on error
> >   */
> > -QIOChannelRDMA *qio_channel_rdma_accept(QIOChannelRDMA *ioc,
> Error
> > **errp);
> > +QIOChannelRDMA *coroutine_mixed_fn
> qio_channel_rdma_accept(QIOChannelRDMA *ioc,
> > +
> Error
> > +**errp);
> >
> >  #endif /* QIO_CHANNEL_RDMA_H */
> > diff --git a/io/channel-rdma.c b/io/channel-rdma.c index
> > 92c362df52..9792add5cf 100644
> > --- a/io/channel-rdma.c
> > +++ b/io/channel-rdma.c
> > @@ -23,10 +23,15 @@
> >
> >  #include "qemu/osdep.h"
> >  #include "io/channel-rdma.h"
> > +#include "io/channel-util.h"
> > +#include "io/channel-watch.h"
> >  #include "io/channel.h"
> >  #include "qapi/clone-visitor.h"
> >  #include "qapi/error.h"
> >  #include "qapi/qapi-visit-sockets.h"
> > +#include "qemu/atomic.h"
> > +#include "qemu/error-report.h"
> > +#include "qemu/thread.h"
> >  #include "trace.h"
> >  #include 
> >  #include 
> > @@ -39,11 +44,274 @@
> >  #include 
> >  #include 
> >
> > +typedef enum {
> > +CLEAR_POLLIN,

RE: [PATCH 0/6] refactor RDMA live migration based on rsocket API

2024-06-07 Thread Gonglei (Arei)


> -Original Message-
> From: Jinpu Wang [mailto:jinpu.w...@ionos.com]
> Sent: Friday, June 7, 2024 1:54 PM
> To: Gonglei (Arei) 
> Cc: qemu-devel@nongnu.org; pet...@redhat.com; yu.zh...@ionos.com;
> mgal...@akamai.com; elmar.ger...@ionos.com; zhengchuan
> ; berra...@redhat.com; arm...@redhat.com;
> lizhij...@fujitsu.com; pbonz...@redhat.com; m...@redhat.com; Xiexiangyou
> ; linux-r...@vger.kernel.org; lixiao (H)
> ; Wangjialin 
> Subject: Re: [PATCH 0/6] refactor RDMA live migration based on rsocket API
> 
> Hi Gonglei, hi folks on the list,
> 
> On Tue, Jun 4, 2024 at 2:14 PM Gonglei  wrote:
> >
> > From: Jialin Wang 
> >
> > Hi,
> >
> > This patch series attempts to refactor RDMA live migration by
> > introducing a new QIOChannelRDMA class based on the rsocket API.
> >
> > The /usr/include/rdma/rsocket.h provides a higher level rsocket API
> > that is a 1-1 match of the normal kernel 'sockets' API, which hides
> > the detail of rdma protocol into rsocket and allows us to add support
> > for some modern features like multifd more easily.
> >
> > Here is the previous discussion on refactoring RDMA live migration
> > using the rsocket API:
> >
> > https://lore.kernel.org/qemu-devel/20240328130255.52257-1-philmd@linar
> > o.org/
> >
> > We have encountered some bugs when using rsocket and plan to submit
> > them to the rdma-core community.
> >
> > In addition, the use of rsocket makes our programming more convenient,
> > but it must be noted that this method introduces multiple memory
> > copies, which can be imagined that there will be a certain performance
> > degradation, hoping that friends with RDMA network cards can help verify,
> thank you!
> First thx for the effort, we are running migration tests on our IB fabric, 
> different
> generation of HCA from mellanox, the migration works ok, there are a few
> failures,  Yu will share the result later separately.
> 

Thank you so much. 

> The one blocker for the change is that the old implementation and the new
> rsocket implementation don't talk to each other, due to the different wire
> protocols used during connection establishment.
> E.g. the old RDMA migration uses special control messages during the
> migration flow while rsocket uses its own, so there is no way to migrate a
> VM over the RDMA transport from a QEMU version predating the rsocket
> patchset to a new version with the rsocket implementation.
> 
> Probably we should keep both implementations for a while, mark the old one
> as deprecated, promote the new one, and highlight in the docs that they
> are not compatible.
> 

IMO it makes sense. What's your opinion, @Peter?


Regards,
-Gonglei

> Regards!
> Jinpu
> 
> 
> 
> >
> > Jialin Wang (6):
> >   migration: remove RDMA live migration temporarily
> >   io: add QIOChannelRDMA class
> >   io/channel-rdma: support working in coroutine
> >   tests/unit: add test-io-channel-rdma.c
> >   migration: introduce new RDMA live migration
> >   migration/rdma: support multifd for RDMA migration
> >
> >  docs/rdma.txt |  420 ---
> >  include/io/channel-rdma.h |  165 ++
> >  io/channel-rdma.c |  798 ++
> >  io/meson.build|1 +
> >  io/trace-events   |   14 +
> >  meson.build   |6 -
> >  migration/meson.build |3 +-
> >  migration/migration-stats.c   |5 +-
> >  migration/migration-stats.h   |4 -
> >  migration/migration.c |   13 +-
> >  migration/migration.h |9 -
> >  migration/multifd.c   |   10 +
> >  migration/options.c   |   16 -
> >  migration/options.h   |2 -
> >  migration/qemu-file.c |1 -
> >  migration/ram.c   |   90 +-
> >  migration/rdma.c  | 4205 +
> >  migration/rdma.h  |   67 +-
> >  migration/savevm.c|2 +-
> >  migration/trace-events|   68 +-
> >  qapi/migration.json   |   13 +-
> >  scripts/analyze-migration.py  |3 -
> >  tests/unit/meson.build|1 +
> >  tests/unit/test-io-channel-rdma.c |  276 ++
> >  24 files changed, 1360 insertions(+), 4832 deletions(-)  delete mode
> > 100644 docs/rdma.txt  create mode 100644 include/io/channel-rdma.h
> > create mode 100644 io/channel-rdma.c  create mode 100644
> > tests/unit/test-io-channel-rdma.c
> >
> > --
> > 2.43.0
> >



RE: [PATCH 0/6] refactor RDMA live migration based on rsocket API

2024-06-05 Thread Gonglei (Arei)
Hi Peter,

> -Original Message-
> From: Peter Xu [mailto:pet...@redhat.com]
> Sent: Wednesday, June 5, 2024 3:32 AM
> To: Gonglei (Arei) 
> Cc: qemu-devel@nongnu.org; yu.zh...@ionos.com; mgal...@akamai.com;
> elmar.ger...@ionos.com; zhengchuan ;
> berra...@redhat.com; arm...@redhat.com; lizhij...@fujitsu.com;
> pbonz...@redhat.com; m...@redhat.com; Xiexiangyou
> ; linux-r...@vger.kernel.org; lixiao (H)
> ; jinpu.w...@ionos.com; Wangjialin
> ; Fabiano Rosas 
> Subject: Re: [PATCH 0/6] refactor RDMA live migration based on rsocket API
> 
> Hi, Lei, Jialin,
> 
> Thanks a lot for working on this!
> 
> I think we'll need to wait a bit on feedbacks from Jinpu and his team on RDMA
> side, also Daniel for iochannels.  Also, please remember to copy Fabiano
> Rosas in any relevant future posts.  We'd also like to know whether he has any
> comments too.  I have him copied in this reply.
> 
> On Tue, Jun 04, 2024 at 08:14:06PM +0800, Gonglei wrote:
> > From: Jialin Wang 
> >
> > Hi,
> >
> > This patch series attempts to refactor RDMA live migration by
> > introducing a new QIOChannelRDMA class based on the rsocket API.
> >
> > The /usr/include/rdma/rsocket.h provides a higher level rsocket API
> > that is a 1-1 match of the normal kernel 'sockets' API, which hides
> > the detail of rdma protocol into rsocket and allows us to add support
> > for some modern features like multifd more easily.
> >
> > Here is the previous discussion on refactoring RDMA live migration
> > using the rsocket API:
> >
> > https://lore.kernel.org/qemu-devel/20240328130255.52257-1-philmd@linar
> > o.org/
> >
> > We have encountered some bugs when using rsocket and plan to submit
> > them to the rdma-core community.
> >
> > In addition, the use of rsocket makes our programming more convenient,
> > but it must be noted that this method introduces multiple memory
> > copies, which can be imagined that there will be a certain performance
> > degradation, hoping that friends with RDMA network cards can help verify,
> thank you!
> 
> It'll be good to elaborate if you tested it in-house. What people should 
> expect
> on the numbers exactly?  Is that okay from Huawei's POV?
> 
> Besides that, the code looks pretty good at a first glance to me.  Before
> others chim in, here're some high level comments..
> 
> Firstly, can we avoid using coroutine when listen()?  Might be relevant when I
> see that rdma_accept_incoming_migration() runs in a loop to do raccept(), but
> would that also hang the qemu main loop even with the coroutine, before all
> channels are ready?  I'm not a coroutine person, but I think the hope is that
> we can make dest QEMU run in a thread in the future just like the src QEMU, so
> the less coroutine the better in this path.
> 

Because the rsocket fd is set to non-blocking, raccept() returns EAGAIN when
no connection has arrived; the coroutine then yields and does not hang the
QEMU main loop.
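
As a sketch of that path (assuming QEMU's qio_channel_yield(); the
rioc->socketfd field name is illustrative, and this is not the actual patch
code):

/* runs in a coroutine: retry raccept() until a connection arrives */
int fd;

while ((fd = raccept(rioc->socketfd, NULL, NULL)) < 0 &&
       errno == EAGAIN) {
    /* yield back to the main loop; the coroutine is re-entered when
     * the rpoller marks the listen fd readable via pollin_eventfd */
    qio_channel_yield(QIO_CHANNEL(rioc), G_IO_IN);
}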

> I think I also left a comment elsewhere on whether it would be possible to 
> allow
> iochannels implement their own poll() functions to avoid the per-channel poll
> thread that is proposed in this series.
> 
> https://lore.kernel.org/r/ZldY21xVExtiMddB@x1n
> 

We noticed that, but it's a big change, and I'm not sure it's a better way.

> Personally I think even with the thread proposal it's better than the old rdma
> code, but I just still want to double check with you guys.  E.g., maybe that 
> just
> won't work at all?  Again, that'll also be based on the fact that we move
> migration incoming into a thread first to keep the dest QEMU main loop intact,
> I think, but I hope we will reach that irrelevant of rdma, IOW it'll be nice 
> to
> happen even earlier if possible.
> 
Yep. This is a fairly big change; I wonder what other people's suggestions are?

> Another nitpick is that qio_channel_rdma_listen_async() doesn't look used and
> may prone to removal.
> 

Yes. When we wrote the test case we wanted to test
qio_channel_rdma_connect_async, so I also added
qio_channel_rdma_listen_async. It is not used in the RDMA live migration
code.

Regards,
-Gonglei



RE: [PATCH 1/6] migration: remove RDMA live migration temporarily

2024-06-05 Thread Gonglei (Arei)


> -Original Message-
> From: David Hildenbrand [mailto:da...@redhat.com]
> Sent: Tuesday, June 4, 2024 10:02 PM
> To: Gonglei (Arei) ; qemu-devel@nongnu.org
> Cc: pet...@redhat.com; yu.zh...@ionos.com; mgal...@akamai.com;
> elmar.ger...@ionos.com; zhengchuan ;
> berra...@redhat.com; arm...@redhat.com; lizhij...@fujitsu.com;
> pbonz...@redhat.com; m...@redhat.com; Xiexiangyou
> ; linux-r...@vger.kernel.org; lixiao (H)
> ; jinpu.w...@ionos.com; Wangjialin
> 
> Subject: Re: [PATCH 1/6] migration: remove RDMA live migration temporarily
> 
> On 04.06.24 14:14, Gonglei via wrote:
> > From: Jialin Wang 
> >
> > The new RDMA live migration will be introduced in the upcoming few
> > commits.
> >
> > Signed-off-by: Jialin Wang 
> > Signed-off-by: Gonglei 
> > ---
> 
> [...]
> 
> > -
> > -/* Avoid ram_block_discard_disable(), cannot change during migration.
> */
> > -if (ram_block_discard_is_required()) {
> > -error_setg(errp, "RDMA: cannot disable RAM discard");
> > -return;
> > -}
> 
> I'm particularly interested in the interaction with virtio-balloon/virtio-mem.
> 
> Do we still have to disable discarding of RAM, and where would you do that in
> the rewrite?
> 

Yes, we do. We didn't change the logic. Thanks for catching that.

Regards,
-Gonglei

> --
> Cheers,
> 
> David / dhildenb



RE: [PATCH 0/6] refactor RDMA live migration based on rsocket API

2024-06-05 Thread Gonglei (Arei)



> -Original Message-
> From: Michael S. Tsirkin [mailto:m...@redhat.com]
> Sent: Wednesday, June 5, 2024 3:57 PM
> To: Gonglei (Arei) 
> Cc: qemu-devel@nongnu.org; pet...@redhat.com; yu.zh...@ionos.com;
> mgal...@akamai.com; elmar.ger...@ionos.com; zhengchuan
> ; berra...@redhat.com; arm...@redhat.com;
> lizhij...@fujitsu.com; pbonz...@redhat.com; Xiexiangyou
> ; linux-r...@vger.kernel.org; lixiao (H)
> ; jinpu.w...@ionos.com; Wangjialin
> 
> Subject: Re: [PATCH 0/6] refactor RDMA live migration based on rsocket API
> 
> On Tue, Jun 04, 2024 at 08:14:06PM +0800, Gonglei wrote:
> > From: Jialin Wang 
> >
> > Hi,
> >
> > This patch series attempts to refactor RDMA live migration by
> > introducing a new QIOChannelRDMA class based on the rsocket API.
> >
> > The /usr/include/rdma/rsocket.h provides a higher level rsocket API
> > that is a 1-1 match of the normal kernel 'sockets' API, which hides
> > the detail of rdma protocol into rsocket and allows us to add support
> > for some modern features like multifd more easily.
> >
> > Here is the previous discussion on refactoring RDMA live migration
> > using the rsocket API:
> >
> > https://lore.kernel.org/qemu-devel/20240328130255.52257-1-philmd@linar
> > o.org/
> >
> > We have encountered some bugs when using rsocket and plan to submit
> > them to the rdma-core community.
> >
> > In addition, the use of rsocket makes our programming more convenient,
> > but it must be noted that this method introduces multiple memory
> > copies, which can be imagined that there will be a certain performance
> > degradation, hoping that friends with RDMA network cards can help verify,
> thank you!
> 
> So you didn't test it with an RDMA card?

Yep, we tested it with Soft-RoCE.

> You really should test with an RDMA card though, for correctness as much as
> performance.
> 
We will; we just don't have an RDMA card environment on hand at the moment.

Regards,
-Gonglei

> 
> > Jialin Wang (6):
> >   migration: remove RDMA live migration temporarily
> >   io: add QIOChannelRDMA class
> >   io/channel-rdma: support working in coroutine
> >   tests/unit: add test-io-channel-rdma.c
> >   migration: introduce new RDMA live migration
> >   migration/rdma: support multifd for RDMA migration
> >
> >  docs/rdma.txt |  420 ---
> >  include/io/channel-rdma.h |  165 ++
> >  io/channel-rdma.c |  798 ++
> >  io/meson.build|1 +
> >  io/trace-events   |   14 +
> >  meson.build   |6 -
> >  migration/meson.build |3 +-
> >  migration/migration-stats.c   |5 +-
> >  migration/migration-stats.h   |4 -
> >  migration/migration.c |   13 +-
> >  migration/migration.h |9 -
> >  migration/multifd.c   |   10 +
> >  migration/options.c   |   16 -
> >  migration/options.h   |2 -
> >  migration/qemu-file.c |1 -
> >  migration/ram.c   |   90 +-
> >  migration/rdma.c  | 4205 +
> >  migration/rdma.h  |   67 +-
> >  migration/savevm.c|2 +-
> >  migration/trace-events|   68 +-
> >  qapi/migration.json   |   13 +-
> >  scripts/analyze-migration.py  |3 -
> >  tests/unit/meson.build|1 +
> >  tests/unit/test-io-channel-rdma.c |  276 ++
> >  24 files changed, 1360 insertions(+), 4832 deletions(-)  delete mode
> > 100644 docs/rdma.txt  create mode 100644 include/io/channel-rdma.h
> > create mode 100644 io/channel-rdma.c  create mode 100644
> > tests/unit/test-io-channel-rdma.c
> >
> > --
> > 2.43.0




RE: [PATCH-for-9.1 v2 2/3] migration: Remove RDMA protocol handling

2024-05-29 Thread Gonglei (Arei)
Hi,

> -Original Message-
> > >
> https://lore.kernel.org/qemu-devel/CAMGffEn-DKpMZ4tA71MJYdyemg0Zda
> > > > > > 15
> > > > > > > > wvaqk81vxtkzx-l...@mail.gmail.com/
> > > > > > > >
> > > > > > > > Appreciate a lot for everyone helping on the testings.
> > > > > > > >
> > > > > > > > > InfiniBand controller: Mellanox Technologies MT27800
> > > > > > > > > Family [ConnectX-5]
> > > > > > > > >
> > > > > > > > > which doesn't meet our purpose. I can choose RDMA or TCP
> > > > > > > > > for VM migration. RDMA traffic is through InfiniBand and
> > > > > > > > > TCP through Ethernet on these two hosts. One is standby
> > > > > > > > > while the other
> > > is active.
> > > > > > > > >
> > > > > > > > > Now I'll try on a server with more recent Ethernet and
> > > > > > > > > InfiniBand network adapters. One of them has:
> > > > > > > > > BCM57414 NetXtreme-E 10Gb/25Gb RDMA Ethernet Controller
> > > > > > > > > (rev
> > > > > > > > > 01)
> > > > > > > > >
> > > > > > > > > The comparison between RDMA and TCP on the same NIC
> > > > > > > > > could make more
> > > > > > > > sense.
> > > > > > > >
> > > > > > > > It looks to me NICs are powerful now, but again as I
> > > > > > > > mentioned I don't think it's a reason we need to deprecate
> > > > > > > > rdma, especially if QEMU's rdma migration has the chance
> > > > > > > > to be refactored
> > > using rsocket.
> > > > > > > >
> > > > > > > > Is there anyone who started looking into that direction?
> > > > > > > > Would it make sense we start some PoC now?
> > > > > > > >
> > > > > > >
> > > > > > > My team has finished the PoC refactoring which works well.
> > > > > > >
> > > > > > > Progress:
> > > > > > > 1.  Implement io/channel-rdma.c, 2.  Add unit test
> > > > > > > tests/unit/test-io-channel-rdma.c and verifying it is
> > > > > > > successful, 3.  Remove the original code from migration/rdma.c, 4.
> > > > > > > Rewrite the rdma_start_outgoing_migration and
> > > > > > > rdma_start_incoming_migration logic, 5.  Remove all rdma_xxx
> > > > > > > functions from migration/ram.c. (to prevent RDMA live
> > > > > > > migration from polluting the
> > > > > > core logic of live migration), 6.  The soft-RoCE implemented
> > > > > > by software is used to test the RDMA live migration. It's 
> > > > > > successful.
> > > > > > >
> > > > > > > We will be submit the patchset later.
> > > > > >
> > > > > > That's great news, thank you!
> > > > > >
> > > > > > --
> > > > > > Peter Xu
> > > > >
> > > > > For rdma programming, the current mainstream implementation is
> > > > > to use
> > > rdma_cm to establish a connection, and then use verbs to transmit data.
> > > > >
> > > > > rdma_cm and ibverbs create two FDs respectively. The two FDs
> > > > > have different responsibilities. rdma_cm fd is used to notify
> > > > > connection establishment events, and verbs fd is used to notify
> > > > > new CQEs. When
> > > poll/epoll monitoring is directly performed on the rdma_cm fd, only
> > > a pollin event can be monitored, which means that an rdma_cm event
> > > occurs. When the verbs fd is directly polled/epolled, only the
> > > pollin event can be listened, which indicates that a new CQE is generated.
> > > > >
> > > > > Rsocket is a sub-module attached to the rdma_cm library and
> > > > > provides rdma calls that are completely similar to socket interfaces.
> > > > > However, this library returns only the rdma_cm fd for listening
> > > > > to link
> > > setup-related events and does not expose the verbs fd (readable and
> > > writable events for listening to data). Only the rpoll interface
> > > provided by the RSocket can be used to listen to related events.
> > > However, QEMU uses the ppoll interface to listen to the rdma_cm fd
> (gotten by raccept API).
> > > > > And cannot listen to the verbs fd event.
> I'm confused, the rs_poll_arm
> :https://github.com/linux-rdma/rdma-core/blob/master/librdmacm/rsocket.c#
> L3290
> For STREAM, rpoll setup fd for both cq fd and cm fd.
> 

Right. But the problem is that QEMU does not use rpoll() but glib's ppoll(). :(


Regards,
-Gonglei



RE: [PATCH-for-9.1 v2 2/3] migration: Remove RDMA protocol handling

2024-05-29 Thread Gonglei (Arei)


> -Original Message-
> From: Jinpu Wang [mailto:jinpu.w...@ionos.com]
> Sent: Wednesday, May 29, 2024 5:18 PM
> To: Gonglei (Arei) 
> Cc: Greg Sword ; Peter Xu ;
> Yu Zhang ; Michael Galaxy ;
> Elmar Gerdes ; zhengchuan
> ; Daniel P. Berrangé ;
> Markus Armbruster ; Zhijian Li (Fujitsu)
> ; qemu-devel@nongnu.org; Yuval Shaia
> ; Kevin Wolf ; Prasanna
> Kumar Kalever ; Cornelia Huck
> ; Michael Roth ; Prasanna
> Kumar Kalever ; Paolo Bonzini
> ; qemu-bl...@nongnu.org; de...@lists.libvirt.org;
> Hanna Reitz ; Michael S. Tsirkin ;
> Thomas Huth ; Eric Blake ; Song
> Gao ; Marc-André Lureau
> ; Alex Bennée ;
> Wainer dos Santos Moschetta ; Beraldo Leal
> ; Pannengyuan ;
> Xiexiangyou ; Fabiano Rosas ;
> RDMA mailing list ; she...@nvidia.com; Haris
> Iqbal 
> Subject: Re: [PATCH-for-9.1 v2 2/3] migration: Remove RDMA protocol handling
> 
> Hi Gonglei,
> 
> On Wed, May 29, 2024 at 10:31 AM Gonglei (Arei) 
> wrote:
> >
> >
> >
> > > -Original Message-
> > > From: Greg Sword [mailto:gregswo...@gmail.com]
> > > Sent: Wednesday, May 29, 2024 2:06 PM
> > > To: Jinpu Wang 
> > > Subject: Re: [PATCH-for-9.1 v2 2/3] migration: Remove RDMA protocol
> > > handling
> > >
> > > On Wed, May 29, 2024 at 12:33 PM Jinpu Wang 
> > > wrote:
> > > >
> > > > On Wed, May 29, 2024 at 4:43 AM Gonglei (Arei)
> > > > 
> > > wrote:
> > > > >
> > > > > Hi,
> > > > >
> > > > > > -Original Message-
> > > > > > From: Peter Xu [mailto:pet...@redhat.com]
> > > > > > Sent: Tuesday, May 28, 2024 11:55 PM
> > > > > > > > > Exactly, not so compelling, as I did it first only on
> > > > > > > > > servers widely used for production in our data center.
> > > > > > > > > The network adapters are
> > > > > > > > >
> > > > > > > > > Ethernet controller: Broadcom Inc. and subsidiaries
> > > > > > > > > NetXtreme
> > > > > > > > > BCM5720 2-port Gigabit Ethernet PCIe
> > > > > > > >
> > > > > > > > Hmm... I definitely thinks Jinpu's Mellanox ConnectX-6
> > > > > > > > looks more
> > > > > > reasonable.
> > > > > > > >
> > > > > > > >
> > > > > >
> > >
> https://lore.kernel.org/qemu-devel/CAMGffEn-DKpMZ4tA71MJYdyemg0Zda
> > > > > > 15
> > > > > > > > wvaqk81vxtkzx-l...@mail.gmail.com/
> > > > > > > >
> > > > > > > > Appreciate a lot for everyone helping on the testings.
> > > > > > > >
> > > > > > > > > InfiniBand controller: Mellanox Technologies MT27800
> > > > > > > > > Family [ConnectX-5]
> > > > > > > > >
> > > > > > > > > which doesn't meet our purpose. I can choose RDMA or TCP
> > > > > > > > > for VM migration. RDMA traffic is through InfiniBand and
> > > > > > > > > TCP through Ethernet on these two hosts. One is standby
> > > > > > > > > while the other
> > > is active.
> > > > > > > > >
> > > > > > > > > Now I'll try on a server with more recent Ethernet and
> > > > > > > > > InfiniBand network adapters. One of them has:
> > > > > > > > > BCM57414 NetXtreme-E 10Gb/25Gb RDMA Ethernet Controller
> > > > > > > > > (rev
> > > > > > > > > 01)
> > > > > > > > >
> > > > > > > > > The comparison between RDMA and TCP on the same NIC
> > > > > > > > > could make more
> > > > > > > > sense.
> > > > > > > >
> > > > > > > > It looks to me NICs are powerful now, but again as I
> > > > > > > > mentioned I don't think it's a reason we need to deprecate
> > > > > > > > rdma, especially if QEMU's rdma migration has the chance
> > > > > > > > to be refactored
> > > using rsocket.
> > > > > > > >
> > > > > > > > Is there anyone who started looking into that direction?
> > > > > > > > Would it make

RE: [PATCH-for-9.1 v2 2/3] migration: Remove RDMA protocol handling

2024-05-29 Thread Gonglei (Arei)


> -Original Message-
> From: Greg Sword [mailto:gregswo...@gmail.com]
> Sent: Wednesday, May 29, 2024 2:06 PM
> To: Jinpu Wang 
> Subject: Re: [PATCH-for-9.1 v2 2/3] migration: Remove RDMA protocol handling
> 
> On Wed, May 29, 2024 at 12:33 PM Jinpu Wang 
> wrote:
> >
> > On Wed, May 29, 2024 at 4:43 AM Gonglei (Arei) 
> wrote:
> > >
> > > Hi,
> > >
> > > > -Original Message-
> > > > From: Peter Xu [mailto:pet...@redhat.com]
> > > > Sent: Tuesday, May 28, 2024 11:55 PM
> > > > > > > Exactly, not so compelling, as I did it first only on
> > > > > > > servers widely used for production in our data center. The
> > > > > > > network adapters are
> > > > > > >
> > > > > > > Ethernet controller: Broadcom Inc. and subsidiaries
> > > > > > > NetXtreme
> > > > > > > BCM5720 2-port Gigabit Ethernet PCIe
> > > > > >
> > > > > > Hmm... I definitely thinks Jinpu's Mellanox ConnectX-6 looks
> > > > > > more
> > > > reasonable.
> > > > > >
> > > > > >
> > > >
> https://lore.kernel.org/qemu-devel/CAMGffEn-DKpMZ4tA71MJYdyemg0Zda
> > > > 15
> > > > > > wvaqk81vxtkzx-l...@mail.gmail.com/
> > > > > >
> > > > > > Appreciate a lot for everyone helping on the testings.
> > > > > >
> > > > > > > InfiniBand controller: Mellanox Technologies MT27800 Family
> > > > > > > [ConnectX-5]
> > > > > > >
> > > > > > > which doesn't meet our purpose. I can choose RDMA or TCP for
> > > > > > > VM migration. RDMA traffic is through InfiniBand and TCP
> > > > > > > through Ethernet on these two hosts. One is standby while the 
> > > > > > > other
> is active.
> > > > > > >
> > > > > > > Now I'll try on a server with more recent Ethernet and
> > > > > > > InfiniBand network adapters. One of them has:
> > > > > > > BCM57414 NetXtreme-E 10Gb/25Gb RDMA Ethernet Controller (rev
> > > > > > > 01)
> > > > > > >
> > > > > > > The comparison between RDMA and TCP on the same NIC could
> > > > > > > make more
> > > > > > sense.
> > > > > >
> > > > > > It looks to me NICs are powerful now, but again as I mentioned
> > > > > > I don't think it's a reason we need to deprecate rdma,
> > > > > > especially if QEMU's rdma migration has the chance to be refactored
> using rsocket.
> > > > > >
> > > > > > Is there anyone who started looking into that direction?
> > > > > > Would it make sense we start some PoC now?
> > > > > >
> > > > >
> > > > > My team has finished the PoC refactoring which works well.
> > > > >
> > > > > Progress:
> > > > > 1.  Implement io/channel-rdma.c, 2.  Add unit test
> > > > > tests/unit/test-io-channel-rdma.c and verifying it is
> > > > > successful, 3.  Remove the original code from migration/rdma.c, 4.
> > > > > Rewrite the rdma_start_outgoing_migration and
> > > > > rdma_start_incoming_migration logic, 5.  Remove all rdma_xxx
> > > > > functions from migration/ram.c. (to prevent RDMA live migration
> > > > > from polluting the
> > > > core logic of live migration), 6.  The soft-RoCE implemented by
> > > > software is used to test the RDMA live migration. It's successful.
> > > > >
> > > > > We will be submit the patchset later.
> > > >
> > > > That's great news, thank you!
> > > >
> > > > --
> > > > Peter Xu
> > >
> > > For rdma programming, the current mainstream implementation is to use
> rdma_cm to establish a connection, and then use verbs to transmit data.
> > >
> > > rdma_cm and ibverbs create two FDs respectively. The two FDs have
> > > different responsibilities. rdma_cm fd is used to notify connection
> > > establishment events, and verbs fd is used to notify new CQEs. When
> poll/epoll monitoring is directly performed on the rdma_cm fd, only a pollin
> event can be monitored, which means that an rdma_cm event occurs. When
> the verbs fd is directly polled/epolled, only the pollin event can be 
> listened,
> which indicates that a new CQE is generated.
> > >
> > > Rsocket is a sub-module attached to the rdma_cm library and provides
> > > rdma calls that are completely similar to socket interfaces.
> > > However, this library returns only the rdma_cm fd for listening to link
> setup-related events and does not expose the verbs fd (readable and writable
> events for listening to data). Only the rpoll interface provided by the 
> RSocket
> can be used to listen to related events. However, QEMU uses the ppoll
> interface to listen to the rdma_cm fd (gotten by raccept API).
> > > And cannot listen to the verbs fd event. Only some hacking methods can be
> used to address this problem.
> > >
> > > Do you guys have any ideas? Thanks.
> > +cc linux-rdma
> 
> Why include rdma community?
> 

Can rdma/rsocket provide an API to expose the verbs fd? 


Regards,
-Gonglei

> > +cc Sean
> >
> >
> >
> > >
> > >
> > > Regards,
> > > -Gonglei
> >


RE: [PATCH-for-9.1 v2 2/3] migration: Remove RDMA protocol handling

2024-05-28 Thread Gonglei (Arei)
Hi,

> -Original Message-
> From: Peter Xu [mailto:pet...@redhat.com]
> Sent: Tuesday, May 28, 2024 11:55 PM
> > > > Exactly, not so compelling, as I did it first only on servers
> > > > widely used for production in our data center. The network
> > > > adapters are
> > > >
> > > > Ethernet controller: Broadcom Inc. and subsidiaries NetXtreme
> > > > BCM5720 2-port Gigabit Ethernet PCIe
> > >
> > > Hmm... I definitely thinks Jinpu's Mellanox ConnectX-6 looks more
> reasonable.
> > >
> > >
> https://lore.kernel.org/qemu-devel/CAMGffEn-DKpMZ4tA71MJYdyemg0Zda15
> > > wvaqk81vxtkzx-l...@mail.gmail.com/
> > >
> > > Appreciate a lot for everyone helping on the testings.
> > >
> > > > InfiniBand controller: Mellanox Technologies MT27800 Family
> > > > [ConnectX-5]
> > > >
> > > > which doesn't meet our purpose. I can choose RDMA or TCP for VM
> > > > migration. RDMA traffic is through InfiniBand and TCP through
> > > > Ethernet on these two hosts. One is standby while the other is active.
> > > >
> > > > Now I'll try on a server with more recent Ethernet and InfiniBand
> > > > network adapters. One of them has:
> > > > BCM57414 NetXtreme-E 10Gb/25Gb RDMA Ethernet Controller (rev 01)
> > > >
> > > > The comparison between RDMA and TCP on the same NIC could make
> > > > more
> > > sense.
> > >
> > > It looks to me NICs are powerful now, but again as I mentioned I
> > > don't think it's a reason we need to deprecate rdma, especially if
> > > QEMU's rdma migration has the chance to be refactored using rsocket.
> > >
> > > Is there anyone who started looking into that direction?  Would it
> > > make sense we start some PoC now?
> > >
> >
> > My team has finished the PoC refactoring which works well.
> >
> > Progress:
> > 1.  Implement io/channel-rdma.c,
> > 2.  Add unit test tests/unit/test-io-channel-rdma.c and verifying it
> > is successful, 3.  Remove the original code from migration/rdma.c, 4.
> > Rewrite the rdma_start_outgoing_migration and
> > rdma_start_incoming_migration logic, 5.  Remove all rdma_xxx functions
> > from migration/ram.c. (to prevent RDMA live migration from polluting the
> core logic of live migration), 6.  The soft-RoCE implemented by software is
> used to test the RDMA live migration. It's successful.
> >
> > We will be submit the patchset later.
> 
> That's great news, thank you!
> 
> --
> Peter Xu

For RDMA programming, the current mainstream implementation is to use rdma_cm
to establish a connection, and then use verbs to transmit data.

rdma_cm and ibverbs create two fds respectively, and the two fds have
different responsibilities: the rdma_cm fd is used to notify
connection-establishment events, and the verbs fd is used to notify new CQEs.
When poll/epoll monitors the rdma_cm fd directly, only a POLLIN event can be
observed, which means an rdma_cm event has occurred. When the verbs fd is
polled/epolled directly, again only a POLLIN event can be observed, which
indicates that a new CQE has been generated.

Rsocket is a sub-module attached to the rdma_cm library and provides rdma
calls that closely mirror the socket interfaces. However, this library
returns only the rdma_cm fd, for listening to connection-setup events, and
does not expose the verbs fd (for readable/writable data events). Only the
rpoll() interface provided by rsocket can be used to listen for those events.
QEMU, however, uses the ppoll interface to listen on the rdma_cm fd (obtained
via the raccept API) and therefore cannot listen for verbs fd events. Only
some hacking methods can be used to address this problem.
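
For illustration, with the plain librdmacm/libibverbs APIs the two fds come
from the CM event channel and the completion channel; a minimal sketch
(error handling omitted):

#include <poll.h>
#include <rdma/rdma_cma.h>
#include <infiniband/verbs.h>

void watch_both(struct ibv_context *verbs_ctx)
{
    struct rdma_event_channel *cm_ch = rdma_create_event_channel();
    struct ibv_comp_channel *comp_ch = ibv_create_comp_channel(verbs_ctx);

    struct pollfd fds[2] = {
        { .fd = cm_ch->fd,   .events = POLLIN }, /* connection events */
        { .fd = comp_ch->fd, .events = POLLIN }, /* new CQE notifications */
    };

    /* both fds only ever report POLLIN; what POLLIN means differs */
    poll(fds, 2, -1);
}

With rsocket, neither channel is exposed to the caller, which is why only
rpoll() can wait on both.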

Do you guys have any ideas? Thanks.


Regards,
-Gonglei


RE: [PATCH-for-9.1 v2 2/3] migration: Remove RDMA protocol handling

2024-05-28 Thread Gonglei (Arei)
Hi Peter,

> -Original Message-
> From: Peter Xu [mailto:pet...@redhat.com]
> Sent: Wednesday, May 22, 2024 6:15 AM
> To: Yu Zhang 
> Cc: Michael Galaxy ; Jinpu Wang
> ; Elmar Gerdes ;
> zhengchuan ; Gonglei (Arei)
> ; Daniel P. Berrangé ;
> Markus Armbruster ; Zhijian Li (Fujitsu)
> ; qemu-devel@nongnu.org; Yuval Shaia
> ; Kevin Wolf ; Prasanna
> Kumar Kalever ; Cornelia Huck
> ; Michael Roth ; Prasanna
> Kumar Kalever ; Paolo Bonzini
> ; qemu-bl...@nongnu.org; de...@lists.libvirt.org;
> Hanna Reitz ; Michael S. Tsirkin ;
> Thomas Huth ; Eric Blake ; Song
> Gao ; Marc-André Lureau
> ; Alex Bennée ;
> Wainer dos Santos Moschetta ; Beraldo Leal
> ; Pannengyuan ;
> Xiexiangyou ; Fabiano Rosas 
> Subject: Re: [PATCH-for-9.1 v2 2/3] migration: Remove RDMA protocol handling
> 
> On Fri, May 17, 2024 at 03:01:59PM +0200, Yu Zhang wrote:
> > Hello Michael and Peter,
> 
> Hi,
> 
> >
> > Exactly, not so compelling, as I did it first only on servers widely
> > used for production in our data center. The network adapters are
> >
> > Ethernet controller: Broadcom Inc. and subsidiaries NetXtreme BCM5720
> > 2-port Gigabit Ethernet PCIe
> 
> Hmm... I definitely thinks Jinpu's Mellanox ConnectX-6 looks more reasonable.
> 
> https://lore.kernel.org/qemu-devel/CAMGffEn-DKpMZ4tA71MJYdyemg0Zda15
> wvaqk81vxtkzx-l...@mail.gmail.com/
> 
> Appreciate a lot for everyone helping on the testings.
> 
> > InfiniBand controller: Mellanox Technologies MT27800 Family
> > [ConnectX-5]
> >
> > which doesn't meet our purpose. I can choose RDMA or TCP for VM
> > migration. RDMA traffic is through InfiniBand and TCP through Ethernet
> > on these two hosts. One is standby while the other is active.
> >
> > Now I'll try on a server with more recent Ethernet and InfiniBand
> > network adapters. One of them has:
> > BCM57414 NetXtreme-E 10Gb/25Gb RDMA Ethernet Controller (rev 01)
> >
> > The comparison between RDMA and TCP on the same NIC could make more
> sense.
> 
> It looks to me NICs are powerful now, but again as I mentioned I don't think 
> it's
> a reason we need to deprecate rdma, especially if QEMU's rdma migration has
> the chance to be refactored using rsocket.
> 
> Is there anyone who started looking into that direction?  Would it make sense
> we start some PoC now?
> 

My team has finished the PoC refactoring which works well. 

Progress:
1.  Implement io/channel-rdma.c,
2.  Add unit test tests/unit/test-io-channel-rdma.c and verifying it is 
successful,
3.  Remove the original code from migration/rdma.c,
4.  Rewrite the rdma_start_outgoing_migration and rdma_start_incoming_migration 
logic,
5.  Remove all rdma_xxx functions from migration/ram.c. (to prevent RDMA live 
migration from polluting the core logic of live migration),
6.  The soft-RoCE implemented by software is used to test the RDMA live 
migration. It's successful.

We will be submit the patchset later.


Regards,
-Gonglei

> Thanks,
> 
> --
> Peter Xu



RE: [PATCH-for-9.1 v2 2/3] migration: Remove RDMA protocol handling

2024-05-06 Thread Gonglei (Arei)
Hello,

> -Original Message-
> From: Peter Xu [mailto:pet...@redhat.com]
> Sent: Monday, May 6, 2024 11:18 PM
> To: Gonglei (Arei) 
> Cc: Daniel P. Berrangé ; Markus Armbruster
> ; Michael Galaxy ; Yu Zhang
> ; Zhijian Li (Fujitsu) ; Jinpu Wang
> ; Elmar Gerdes ;
> qemu-devel@nongnu.org; Yuval Shaia ; Kevin Wolf
> ; Prasanna Kumar Kalever
> ; Cornelia Huck ;
> Michael Roth ; Prasanna Kumar Kalever
> ; integrat...@gluster.org; Paolo Bonzini
> ; qemu-bl...@nongnu.org; de...@lists.libvirt.org;
> Hanna Reitz ; Michael S. Tsirkin ;
> Thomas Huth ; Eric Blake ; Song
> Gao ; Marc-André Lureau
> ; Alex Bennée ;
> Wainer dos Santos Moschetta ; Beraldo Leal
> ; Pannengyuan ;
> Xiexiangyou 
> Subject: Re: [PATCH-for-9.1 v2 2/3] migration: Remove RDMA protocol handling
> 
> On Mon, May 06, 2024 at 02:06:28AM +, Gonglei (Arei) wrote:
> > Hi, Peter
> 
> Hey, Lei,
> 
> Happy to see you around again after years.
> 
Haha, me too.

> > RDMA features high bandwidth, low latency (in non-blocking lossless
> > network), and direct remote memory access by bypassing the CPU (As you
> > know, CPU resources are expensive for cloud vendors, which is one of
> > the reasons why we introduced offload cards.), which TCP does not have.
> 
> It's another cost to use offload cards, v.s. preparing more cpu resources?
> 
A converged software and hardware offload architecture is the way to go for
all cloud vendors (considering the overall benefits in performance, cost,
security, and speed of innovation); it's not just a matter of adding a DPU
card as a resource.

> > In some scenarios where fast live migration is needed (extremely short
> > interruption duration and migration duration) is very useful. To this
> > end, we have also developed RDMA support for multifd.
> 
> Will any of you upstream that work?  I'm curious how intrusive would it be
> when adding it to multifd, if it can keep only 5 exported functions like what
> rdma.h does right now it'll be pretty nice.  We also want to make sure it 
> works
> with arbitrary sized loads and buffers, e.g. vfio is considering to add IO 
> loads to
> multifd channels too.
> 

In fact, we sent the patchset to the community in 2021. Please see:
https://lore.kernel.org/all/20210203185906.GT2950@work-vm/T/


> One thing to note that the question here is not about a pure performance
> comparison between rdma and nics only.  It's about help us make a decision
> on whether to drop rdma, iow, even if rdma performs well, the community still
> has the right to drop it if nobody can actively work and maintain it.
> It's just that if nics can perform as good it's more a reason to drop, unless
> companies can help to provide good support and work together.
> 

We are happy to provide the necessary review and maintenance work for RDMA
if the community needs it.

CC'ing Chuan Zheng.


Regards,
-Gonglei



RE: [PATCH-for-9.1 v2 2/3] migration: Remove RDMA protocol handling

2024-05-06 Thread Gonglei (Arei)
Hi, Peter

RDMA features high bandwidth, low latency (in a non-blocking lossless
network), and direct remote memory access that bypasses the CPU (as you know,
CPU resources are expensive for cloud vendors, which is one of the reasons we
introduced offload cards), none of which TCP offers.

This is very useful in scenarios that need fast live migration (extremely
short interruption duration and total migration time). To this end, we have
also developed RDMA support for multifd.

Regards,
-Gonglei

> -Original Message-
> From: Peter Xu [mailto:pet...@redhat.com]
> Sent: Wednesday, May 1, 2024 11:31 PM
> To: Daniel P. Berrangé 
> Cc: Markus Armbruster ; Michael Galaxy
> ; Yu Zhang ; Zhijian Li (Fujitsu)
> ; Jinpu Wang ; Elmar Gerdes
> ; qemu-devel@nongnu.org; Yuval Shaia
> ; Kevin Wolf ; Prasanna
> Kumar Kalever ; Cornelia Huck
> ; Michael Roth ; Prasanna
> Kumar Kalever ; integrat...@gluster.org; Paolo
> Bonzini ; qemu-bl...@nongnu.org;
> de...@lists.libvirt.org; Hanna Reitz ; Michael S. Tsirkin
> ; Thomas Huth ; Eric Blake
> ; Song Gao ; Marc-André
> Lureau ; Alex Bennée
> ; Wainer dos Santos Moschetta
> ; Beraldo Leal ; Gonglei (Arei)
> ; Pannengyuan 
> Subject: Re: [PATCH-for-9.1 v2 2/3] migration: Remove RDMA protocol handling
> 
> On Tue, Apr 30, 2024 at 09:00:49AM +0100, Daniel P. Berrangé wrote:
> > On Tue, Apr 30, 2024 at 09:15:03AM +0200, Markus Armbruster wrote:
> > > Peter Xu  writes:
> > >
> > > > On Mon, Apr 29, 2024 at 08:08:10AM -0500, Michael Galaxy wrote:
> > > >> Hi All (and Peter),
> > > >
> > > > Hi, Michael,
> > > >
> > > >>
> > > >> My name is Michael Galaxy (formerly Hines). Yes, I changed my
> > > >> last name (highly irregular for a male) and yes, that's my real last 
> > > >> name:
> > > >> https://www.linkedin.com/in/mrgalaxy/)
> > > >>
> > > >> I'm the original author of the RDMA implementation. I've been
> > > >> discussing with Yu Zhang for a little bit about potentially
> > > >> handing over maintainership of the codebase to his team.
> > > >>
> > > >> I simply have zero access to RoCE or Infiniband hardware at all,
> > > >> unfortunately. so I've never been able to run tests or use what I
> > > >> wrote at work, and as all of you know, if you don't have a way to
> > > >> test something, then you can't maintain it.
> > > >>
> > > >> Yu Zhang put a (very kind) proposal forward to me to ask the
> > > >> community if they feel comfortable training his team to maintain
> > > >> the codebase (and run
> > > >> tests) while they learn about it.
> > > >
> > > > The "while learning" part is fine at least to me.  IMHO the
> > > > "ownership" to the code, or say, taking over the responsibility,
> > > > may or may not need 100% mastering the code base first.  There
> > > > should still be some fundamental confidence to work on the code
> > > > though as a starting point, then it's about serious use case to
> > > > back this up, and careful testings while getting more familiar with it.
> > >
> > > How much experience we expect of maintainers depends on the
> > > subsystem and other circumstances.  The hard requirement isn't
> > > experience, it's trust.  See the recent attack on xz.
> > >
> > > I do not mean to express any doubts whatsoever on Yu Zhang's integrity!
> > > I'm merely reminding y'all what's at stake.
> >
> > I think we shouldn't overly obsess[1] about 'xz', because the
> > overwhealmingly common scenario is that volunteer maintainers are
> > honest people. QEMU is in a massively better peer review situation.
> > With xz there was basically no oversight of the new maintainer. With
> > QEMU, we have oversight from 1000's of people on the list, a huge pool
> > of general maintainers, the specific migration maintainers, and the release
> manager merging code.
> >
> > With a lack of historical experience with QEMU maintainership, I'd
> > suggest that new RDMA volunteers would start by adding themselves to the
> "MAINTAINERS"
> > file with only the 'Reviewer' classification. The main migration
> > maintainers would still handle pull requests, but wait for a R-b from
> > one of the RDMA volunteers. After some period of time the RDMA folks
> > could graduate to full maintainer status if the migration maintainers needed
> to reduce their load.
> > I suspect

RE: [PATCH-for-8.2 v2] backends/cryptodev: Do not ignore throttle/backends Errors

2023-11-20 Thread Gonglei (Arei)


> -Original Message-
> From: Philippe Mathieu-Daudé [mailto:phi...@linaro.org]
> Sent: Monday, November 20, 2023 11:04 PM
> To: qemu-devel@nongnu.org
> Cc: Zhenwei Pi ; Gonglei (Arei)
> ; Markus Armbruster ;
> Daniel P . Berrangé ; Philippe Mathieu-Daudé
> ; qemu-sta...@nongnu.org
> Subject: [PATCH-for-8.2 v2] backends/cryptodev: Do not ignore
> throttle/backends Errors
> 
> Both cryptodev_backend_set_throttle() and CryptoDevBackendClass::init() can
> set their Error** argument. Do not ignore them, return early on failure. Use
> the ERRP_GUARD() macro as suggested in commit ae7c80a7bd
> ("error: New macro ERRP_GUARD()").
> 
> Cc: qemu-sta...@nongnu.org
> Fixes: e7a775fd9f ("cryptodev: Account statistics")
> Fixes: 2580b452ff ("cryptodev: support QoS")
> Signed-off-by: Philippe Mathieu-Daudé 
> ---
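
For readers unfamiliar with the idiom, here is a minimal sketch of the
ERRP_GUARD() pattern the patch applies (not the actual patch code):

#include "qemu/osdep.h"
#include "qapi/error.h"

/* ERRP_GUARD() makes *errp safe to dereference even when the caller
 * passed NULL or error_fatal */
static void set_limit(int limit, Error **errp)
{
    ERRP_GUARD();

    if (limit < 0) {
        error_setg(errp, "limit must be non-negative");
    }
    if (*errp) {        /* return early on failure instead of ignoring it */
        return;
    }
    /* ... continue initialization ... */
}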

Reviewed-by: Gonglei 


Regards,
-Gonglei



RE: [PATCH v2] virtio-crypto: fix NULL pointer dereference in virtio_crypto_free_request

2023-05-09 Thread Gonglei (Arei)



> -Original Message-
> From: Mauro Matteo Cascella [mailto:mcasc...@redhat.com]
> Sent: Tuesday, May 9, 2023 3:53 PM
> To: qemu-devel@nongnu.org
> Cc: m...@redhat.com; Gonglei (Arei) ;
> pizhen...@bytedance.com; ta...@zju.edu.cn; mcasc...@redhat.com
> Subject: [PATCH v2] virtio-crypto: fix NULL pointer dereference in
> virtio_crypto_free_request
> 
> Ensure op_info is not NULL in case of QCRYPTODEV_BACKEND_ALG_SYM
> algtype.
> 
> Fixes: 0e660a6f90a ("crypto: Introduce RSA algorithm")
> Signed-off-by: Mauro Matteo Cascella 
> Reported-by: Yiming Tao 
> ---
> v2:
> - updated 'Fixes:' tag
> 
>  hw/virtio/virtio-crypto.c | 20 +++++++++++---------
>  1 file changed, 11 insertions(+), 9 deletions(-)
> 

Reviewed-by: Gonglei 


Regards,
-Gonglei

> diff --git a/hw/virtio/virtio-crypto.c b/hw/virtio/virtio-crypto.c
> index 2fe804510f..c729a1f79e 100644
> --- a/hw/virtio/virtio-crypto.c
> +++ b/hw/virtio/virtio-crypto.c
> @@ -476,15 +476,17 @@ static void
> virtio_crypto_free_request(VirtIOCryptoReq *req)
>  size_t max_len;
>  CryptoDevBackendSymOpInfo *op_info =
> req->op_info.u.sym_op_info;
> 
> -max_len = op_info->iv_len +
> -  op_info->aad_len +
> -  op_info->src_len +
> -  op_info->dst_len +
> -  op_info->digest_result_len;
> -
> -/* Zeroize and free request data structure */
> -memset(op_info, 0, sizeof(*op_info) + max_len);
> -g_free(op_info);
> +if (op_info) {
> +max_len = op_info->iv_len +
> +  op_info->aad_len +
> +  op_info->src_len +
> +  op_info->dst_len +
> +  op_info->digest_result_len;
> +
> +/* Zeroize and free request data structure */
> +memset(op_info, 0, sizeof(*op_info) + max_len);
> +g_free(op_info);
> +}
>  } else if (req->flags == QCRYPTODEV_BACKEND_ALG_ASYM) {
>  CryptoDevBackendAsymOpInfo *op_info =
> req->op_info.u.asym_op_info;
>  if (op_info) {
> --
> 2.40.1




RE: [PATCH] virtio-crypto: fix NULL pointer dereference in virtio_crypto_free_request

2023-05-08 Thread Gonglei (Arei)



> -Original Message-
> From: Mauro Matteo Cascella [mailto:mcasc...@redhat.com]
> Sent: Monday, May 8, 2023 11:02 PM
> To: qemu-devel@nongnu.org
> Cc: m...@redhat.com; Gonglei (Arei) ;
> pizhen...@bytedance.com; ta...@zju.edu.cn; mcasc...@redhat.com
> Subject: [PATCH] virtio-crypto: fix NULL pointer dereference in
> virtio_crypto_free_request
> 
> Ensure op_info is not NULL in case of QCRYPTODEV_BACKEND_ALG_SYM
> algtype.
> 
> Fixes: 02ed3e7c ("virtio-crypto: zeroize the key material before free")

I have to say the Fixes tag is incorrect. The bug was introduced by commit 
0e660a6f90a, which
changed the semantic meaning of req->flags.

Regards,
-Gonglei




RE: RE: [PATCH v8 1/1] crypto: Introduce RSA algorithm

2022-05-31 Thread Gonglei (Arei)


> -Original Message-
> From: zhenwei pi [mailto:pizhen...@bytedance.com]
> Sent: Tuesday, May 31, 2022 9:48 AM
> To: Gonglei (Arei) 
> Cc: qemu-devel@nongnu.org; m...@redhat.com;
> virtualizat...@lists.linux-foundation.org; helei.si...@bytedance.com;
> berra...@redhat.com
> Subject: Re: RE: [PATCH v8 1/1] crypto: Introduce RSA algorithm
> 
> On 5/30/22 21:31, Gonglei (Arei) wrote:
> >
> >
> >> -Original Message-
> >> From: zhenwei pi [mailto:pizhen...@bytedance.com]
> >> Sent: Friday, May 27, 2022 4:48 PM
> >> To: m...@redhat.com; Gonglei (Arei) 
> >> Cc: qemu-devel@nongnu.org; virtualizat...@lists.linux-foundation.org;
> >> helei.si...@bytedance.com; berra...@redhat.com; zhenwei pi
> >> 
> >> Subject: [PATCH v8 1/1] crypto: Introduce RSA algorithm
> >>
> >>
> > Skip...
> >
> >> +static int64_t
> >> +virtio_crypto_create_asym_session(VirtIOCrypto *vcrypto,
> >> +   struct virtio_crypto_akcipher_create_session_req
> >> *sess_req,
> >> +   uint32_t queue_id, uint32_t opcode,
> >> +   struct iovec *iov, unsigned int out_num) {
> >> +VirtIODevice *vdev = VIRTIO_DEVICE(vcrypto);
> >> +CryptoDevBackendSessionInfo info = {0};
> >> +CryptoDevBackendAsymSessionInfo *asym_info;
> >> +int64_t session_id;
> >> +int queue_index;
> >> +uint32_t algo, keytype, keylen;
> >> +g_autofree uint8_t *key = NULL;
> >> +Error *local_err = NULL;
> >> +
> >> +algo = ldl_le_p(&sess_req->para.algo);
> >> +keytype = ldl_le_p(&sess_req->para.keytype);
> >> +keylen = ldl_le_p(&sess_req->para.keylen);
> >> +
> >> +if ((keytype != VIRTIO_CRYPTO_AKCIPHER_KEY_TYPE_PUBLIC)
> >> + && (keytype !=
> VIRTIO_CRYPTO_AKCIPHER_KEY_TYPE_PRIVATE)) {
> >> +error_report("unsupported asym keytype: %d", keytype);
> >> +return -VIRTIO_CRYPTO_NOTSUPP;
> >> +}
> >> +
> >> +if (keylen) {
> >> +key = g_malloc(keylen);
> >> +if (iov_to_buf(iov, out_num, 0, key, keylen) != keylen) {
> >> +virtio_error(vdev, "virtio-crypto asym key incorrect");
> >> +return -EFAULT;
> >
> > Memory leak.
> >
> >> +}
> >> +iov_discard_front(&iov, &out_num, keylen);
> >> +}
> >> +
> >> +info.op_code = opcode;
> >> +asym_info = &info.u.asym_sess_info;
> >> +asym_info->algo = algo;
> >> +asym_info->keytype = keytype;
> >> +asym_info->keylen = keylen;
> >> +asym_info->key = key;
> >> +switch (asym_info->algo) {
> >> +case VIRTIO_CRYPTO_AKCIPHER_RSA:
> >> +asym_info->u.rsa.padding_algo =
> >> +ldl_le_p(&sess_req->para.u.rsa.padding_algo);
> >> +asym_info->u.rsa.hash_algo =
> >> +ldl_le_p(&sess_req->para.u.rsa.hash_algo);
> >> +break;
> >> +
> >> +/* TODO DSA handling */
> >> +
> >> +default:
> >> +return -VIRTIO_CRYPTO_ERR;
> >> +}
> >> +
> >> +queue_index = virtio_crypto_vq2q(queue_id);
> >> +session_id =
> >> + cryptodev_backend_create_session(vcrypto->cryptodev, &info,
> >> + queue_index, &local_err);
> >> +if (session_id < 0) {
> >> +if (local_err) {
> >> +error_report_err(local_err);
> >> +}
> >> +return -VIRTIO_CRYPTO_ERR;
> >> +}
> >> +
> >> +return session_id;
> >
> > Where is the key freed on both normal and exceptional paths?
> >
> 
> Hi, Lei
> 
> The key is declared with g_autofree:
> g_autofree uint8_t *key = NULL;
> 
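
As an aside, a minimal sketch of the g_autofree semantics being relied on
here (GLib's cleanup attribute; not the actual patch code):

#include <glib.h>
#include <stdint.h>

static void demo(uint32_t keylen)
{
    g_autofree uint8_t *key = g_malloc(keylen);

    if (keylen > 512) {
        return;   /* key is g_free()d automatically on this early return */
    }
    /* ... use key ... */
}             /* ... and on the normal exit path as well */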

OK. For the patch:

Reviewed-by: Gonglei 


Regards,
-Gonglei




RE: [PATCH v8 1/1] crypto: Introduce RSA algorithm

2022-05-30 Thread Gonglei (Arei)



> -Original Message-
> From: zhenwei pi [mailto:pizhen...@bytedance.com]
> Sent: Friday, May 27, 2022 4:48 PM
> To: m...@redhat.com; Gonglei (Arei) 
> Cc: qemu-devel@nongnu.org; virtualizat...@lists.linux-foundation.org;
> helei.si...@bytedance.com; berra...@redhat.com; zhenwei pi
> 
> Subject: [PATCH v8 1/1] crypto: Introduce RSA algorithm
> 
> 
Skip...

> +static int64_t
> +virtio_crypto_create_asym_session(VirtIOCrypto *vcrypto,
> +   struct virtio_crypto_akcipher_create_session_req
> *sess_req,
> +   uint32_t queue_id, uint32_t opcode,
> +   struct iovec *iov, unsigned int out_num) {
> +VirtIODevice *vdev = VIRTIO_DEVICE(vcrypto);
> +CryptoDevBackendSessionInfo info = {0};
> +CryptoDevBackendAsymSessionInfo *asym_info;
> +int64_t session_id;
> +int queue_index;
> +uint32_t algo, keytype, keylen;
> +g_autofree uint8_t *key = NULL;
> +Error *local_err = NULL;
> +
> +algo = ldl_le_p(&sess_req->para.algo);
> +keytype = ldl_le_p(&sess_req->para.keytype);
> +keylen = ldl_le_p(&sess_req->para.keylen);
> +
> +if ((keytype != VIRTIO_CRYPTO_AKCIPHER_KEY_TYPE_PUBLIC)
> + && (keytype != VIRTIO_CRYPTO_AKCIPHER_KEY_TYPE_PRIVATE)) {
> +error_report("unsupported asym keytype: %d", keytype);
> +return -VIRTIO_CRYPTO_NOTSUPP;
> +}
> +
> +if (keylen) {
> +key = g_malloc(keylen);
> +if (iov_to_buf(iov, out_num, 0, key, keylen) != keylen) {
> +virtio_error(vdev, "virtio-crypto asym key incorrect");
> +return -EFAULT;

Memory leak.

> +}
> +iov_discard_front(&iov, &out_num, keylen);
> +}
> +
> +info.op_code = opcode;
> +asym_info = &info.u.asym_sess_info;
> +asym_info->algo = algo;
> +asym_info->keytype = keytype;
> +asym_info->keylen = keylen;
> +asym_info->key = key;
> +switch (asym_info->algo) {
> +case VIRTIO_CRYPTO_AKCIPHER_RSA:
> +asym_info->u.rsa.padding_algo =
> +ldl_le_p(&sess_req->para.u.rsa.padding_algo);
> +asym_info->u.rsa.hash_algo =
> +ldl_le_p(&sess_req->para.u.rsa.hash_algo);
> +break;
> +
> +/* TODO DSA handling */
> +
> +default:
> +return -VIRTIO_CRYPTO_ERR;
> +}
> +
> +queue_index = virtio_crypto_vq2q(queue_id);
> +session_id = cryptodev_backend_create_session(vcrypto->cryptodev, &info,
> + queue_index, &local_err);
> +if (session_id < 0) {
> +if (local_err) {
> +error_report_err(local_err);
> +}
> +return -VIRTIO_CRYPTO_ERR;
> +}
> +
> +return session_id;

Where is the key freed on both normal and exceptional paths?


Regards,
-Gonglei





RE: [PATCH 9/9] crypto: Introduce RSA algorithm

2022-05-26 Thread Gonglei (Arei)



> -Original Message-
> From: Lei He [mailto:helei.si...@bytedance.com]
> Sent: Wednesday, May 25, 2022 5:01 PM
> To: m...@redhat.com; Gonglei (Arei) ;
> berra...@redhat.com
> Cc: qemu-devel@nongnu.org; virtualizat...@lists.linux-foundation.org;
> linux-cry...@vger.kernel.org; jasow...@redhat.com; coh...@redhat.com;
> pizhen...@bytedance.com; helei.si...@bytedance.com
> Subject: [PATCH 9/9] crypto: Introduce RSA algorithm
> 
> From: zhenwei pi 
> 
> There are two parts in this patch:
> 1, support akcipher service by cryptodev-builtin driver
> 2, virtio-crypto driver supports akcipher service
> 
> In principle we should separate this into two patches; to avoid compile
> errors, we merge them into one.
> 
> Then virtio-crypto gets request from guest side, and forwards the request to
> builtin driver to handle it.
> 
> Test with a guest Linux:
> 1, The self-test framework of the crypto layer works fine in the guest kernel
> 2, Test with a Linux guest (with asym support) using the following script
> (note that pkey_XXX is supported only in a newer version of keyutils):
>   - both public key & private key
>   - create/close session
>   - encrypt/decrypt/sign/verify basic driver operation
>   - also test with kernel crypto layer(pkey add/query)
> 
> All the cases work fine.
> 
> Run script in guest:
> rm -rf *.der *.pem *.pfx
> modprobe pkcs8_key_parser # if CONFIG_PKCS8_PRIVATE_KEY_PARSER=m
> rm -rf /tmp/data
> dd if=/dev/random of=/tmp/data count=1 bs=20
> 
> openssl req -nodes -x509 -newkey rsa:2048 -keyout key.pem -out cert.pem -subj
> "/C=CN/ST=BJ/L=HD/O=qemu/OU=dev/CN=qemu/emailAddress=qemu@qemu.org"
> openssl pkcs8 -in key.pem -topk8 -nocrypt -outform DER -out key.der
> openssl x509 -in cert.pem -inform PEM -outform DER -out cert.der
> 
> PRIV_KEY_ID=`cat key.der | keyctl padd asymmetric test_priv_key @s`
> echo "priv key id = "$PRIV_KEY_ID
> PUB_KEY_ID=`cat cert.der | keyctl padd asymmetric test_pub_key @s`
> echo "pub key id = "$PUB_KEY_ID
> 
> keyctl pkey_query $PRIV_KEY_ID 0
> keyctl pkey_query $PUB_KEY_ID 0
> 
> echo "Enc with priv key..."
> keyctl pkey_encrypt $PRIV_KEY_ID 0 /tmp/data enc=pkcs1 >/tmp/enc.priv
> echo "Dec with pub key..."
> keyctl pkey_decrypt $PRIV_KEY_ID 0 /tmp/enc.priv enc=pkcs1 >/tmp/dec
> cmp /tmp/data /tmp/dec
> 
> echo "Sign with priv key..."
> keyctl pkey_sign $PRIV_KEY_ID 0 /tmp/data enc=pkcs1 hash=sha1 > /tmp/sig
> echo "Verify with pub key..."
> keyctl pkey_verify $PRIV_KEY_ID 0 /tmp/data /tmp/sig enc=pkcs1 hash=sha1
> 
> echo "Enc with pub key..."
> keyctl pkey_encrypt $PUB_KEY_ID 0 /tmp/data enc=pkcs1 >/tmp/enc.pub
> echo "Dec with priv key..."
> keyctl pkey_decrypt $PRIV_KEY_ID 0 /tmp/enc.pub enc=pkcs1 >/tmp/dec
> cmp /tmp/data /tmp/dec
> 
> echo "Verify with pub key..."
> keyctl pkey_verify $PUB_KEY_ID 0 /tmp/data /tmp/sig enc=pkcs1 hash=sha1
> 
> Signed-off-by: zhenwei pi 
> Signed-off-by: lei he 
> ---
>  backends/cryptodev-builtin.c  | 272 +++-
>  backends/cryptodev-vhost-user.c   |  34 +++-
>  backends/cryptodev.c  |  32 ++--
>  hw/virtio/virtio-crypto.c | 323 ++
>  include/hw/virtio/virtio-crypto.h |   5 +-
>  include/sysemu/cryptodev.h|  83 --
>  6 files changed, 604 insertions(+), 145 deletions(-)
> 
> diff --git a/backends/cryptodev-builtin.c b/backends/cryptodev-builtin.c
> index 0671bf9f3e..388aedd8df 100644
> --- a/backends/cryptodev-builtin.c
> +++ b/backends/cryptodev-builtin.c
> @@ -26,6 +26,7 @@
>  #include "qapi/error.h"
>  #include "standard-headers/linux/virtio_crypto.h"
>  #include "crypto/cipher.h"
> +#include "crypto/akcipher.h"
>  #include "qom/object.h"
> 
> 
> @@ -41,11 +42,12 @@
> OBJECT_DECLARE_SIMPLE_TYPE(CryptoDevBackendBuiltin, CRYPTODEV_BACKEND_BUILTIN)
>  typedef struct CryptoDevBackendBuiltinSession {
>  QCryptoCipher *cipher;
>  uint8_t direction; /* encryption or decryption */
> -uint8_t type; /* cipher? hash? aead? */
> +uint8_t type; /* cipher? hash? aead? akcipher? */

Do you actually use the type for akcipher?

> +QCryptoAkCipher *akcipher;
> QTAILQ_ENTRY(CryptoDevBackendBuiltinSession) next;
> } CryptoDevBackendBuiltinSession;
> 
> -/* Max number of symmetric sessions */
> +/* Max number of symmetric/asymmetric sessions */
>  #define MAX_NUM_SESSIONS 256
> 
>  #define CRYPTODEV_BUITLIN_MAX_AUTH_KEY_LEN 512
> @@ -80,15 +82,17 @@ static void cryptodev_builtin_init(
>  backend-

RE: [PATCH v7 0/9] Introduce akcipher service for virtio-crypto

2022-05-26 Thread Gonglei (Arei)


> -Original Message-
> From: Daniel P. Berrangé [mailto:berra...@redhat.com]
> Sent: Thursday, May 26, 2022 6:48 PM
> To: Lei He 
> Cc: m...@redhat.com; Gonglei (Arei) ;
> qemu-devel@nongnu.org; virtualizat...@lists.linux-foundation.org;
> linux-cry...@vger.kernel.org; jasow...@redhat.com; coh...@redhat.com;
> pizhen...@bytedance.com
> Subject: Re: [PATCH v7 0/9] Introduce akcipher service for virtio-crypto
> 
> I've sent a pull request containing all the crypto/ changes, as that covers 
> stuff I
> maintain. ie patches 2-8
> 
> Patches 1 and 9, I'll leave for MST to review & queue since the virtual 
> hardware
> is not my area of knowledge.
> 

Thanks for your work, Daniel.

Regards,
-Gonglei

> On Wed, May 25, 2022 at 05:01:09PM +0800, Lei He wrote:
> > v6 -> v7:
> > - Fix several build errors for some specific platforms/configurations.
> > - Use '%zu' instead of '%lu' for size_t parameters.
> > - AkCipher-gcrypt: avoid setting wrong error messages when parsing RSA
> >   keys.
> > - AkCipher-benchmark: process constant amount of sign/verify instead
> > of running sign/verify for a constant duration.
> >
> > v5 -> v6:
> > - Fix build errors and code style issues.
> > - Add parameter 'Error **errp' for qcrypto_akcipher_rsakey_parse.
> > - Report more detailed errors.
> > - Fix buffer length check and return values of akcipher-nettle, allows
> > caller to  pass a buffer with larger size than actual needed.
> >
> > A million thanks to Daniel!
> >
> > v4 -> v5:
> > - Move QCryptoAkCipher into akcipherpriv.h, and modify the related
> comments.
> > - Rename asn1_decoder.c to der.c.
> > - Code style fix: use 'cleanup' & 'error' lables.
> > - Allow autoptr type to auto-free.
> > - Add test cases for rsakey to handle DER error.
> > - Other minor fixes.
> >
> > v3 -> v4:
> > - Coding style fix: Akcipher -> AkCipher, struct XXX -> XXX, Rsa ->
> > RSA, XXX-alg -> XXX-algo.
> > - Change version info in qapi/crypto.json, from 7.0 -> 7.1.
> > - Remove ecdsa from qapi/crypto.json, it would be introduced with the
> implementation later.
> > - Use QCryptoHashAlgothrim instead of QCryptoRSAHashAlgorithm(removed)
> in qapi/crypto.json.
> > - Rename arguments of qcrypto_akcipher_XXX to keep aligned with
> qcrypto_cipher_XXX(dec/enc/sign/verify -> in/out/in2), and add
> qcrypto_akcipher_max_XXX APIs.
> > - Add new API: qcrypto_akcipher_supports.
> > - Change the return value of qcrypto_akcipher_enc/dec/sign, these functions
> return the actual length of result.
> > - Separate ASN.1 source code and test case clean.
> > - Disable RSA raw encoding for akcipher-nettle.
> > - Separate RSA key parser into rsakey.{hc}, and implement it with the
> builtin ASN.1 decoder and nettle respectively.
> > - Implement RSA(pkcs1 and raw encoding) algorithm by gcrypt. This has
> higher priority than nettle.
> > - For some akcipher operations(eg, decryption of pkcs1pad(rsa)), the
> > length of returned result maybe less than the dst buffer size, return
> > the actual length of result instead of the buffer length to the guest
> > side. (in function virtio_crypto_akcipher_input_data_helper)
> > - Other minor changes.
> >
> > Thanks to Daniel!
> >
> > Eric pointed out this missing part of use case, send it here again.
> >
> > In our plan, the feature is designed for the HTTPS offloading case and other
> applications which use kernel RSA/ecdsa via the keyctl syscall. The full picture
> is shown below:
> >
> >
> >              Nginx/openssl[1] ... Apps
> > Guest   --------------------------------
> >              virtio-crypto driver[2]
> >         --------------------------------
> >              virtio-crypto backend[3]
> > Host    --------------------------------
> >               /         |        \
> >         builtin[4]    vhost    keyctl[5] ...
> >
> >
> > [1] User applications can offload RSA calculation to kernel by keyctl 
> > syscall.
> There is no keyctl engine in openssl currently; we developed an engine and tried
> to contribute it to openssl upstream, but openssl 1.x does not accept new
> features. Link:
> feature. Link:
> >https://github.com/openssl/openssl/pull/16689
> >
> > This branch is available and maintained by Lei 
> >
> > https://github.com/TousakaRin/openssl/tree/OpenSSL_1_1_1-kctl_engine
> >
> > We tested nginx(change config file only) with openssl keyctl engine, it 
> > works
> fine.
> >
> > [2] virtio-crypto driver is used to

RE: [PATCH v2 1/3] virtio-crypto: header update

2022-02-17 Thread Gonglei (Arei)



> -Original Message-
> From: zhenwei pi [mailto:pizhen...@bytedance.com]
> Sent: Friday, February 11, 2022 4:44 PM
> To: Gonglei (Arei) ; m...@redhat.com
> Cc: jasow...@redhat.com; virtualizat...@lists.linux-foundation.org;
> linux-cry...@vger.kernel.org; qemu-devel@nongnu.org;
> helei.si...@bytedance.com; herb...@gondor.apana.org.au; zhenwei pi
> 
> Subject: [PATCH v2 1/3] virtio-crypto: header update
> 
> Update header from linux, support akcipher service.
> 
> Signed-off-by: lei he 
> Signed-off-by: zhenwei pi 
> ---
>  .../standard-headers/linux/virtio_crypto.h| 82 ++-
>  1 file changed, 81 insertions(+), 1 deletion(-)
> 

Reviewed-by: Gonglei 


> diff --git a/include/standard-headers/linux/virtio_crypto.h
> b/include/standard-headers/linux/virtio_crypto.h
> index 5ff0b4ee59..68066dafb6 100644
> --- a/include/standard-headers/linux/virtio_crypto.h
> +++ b/include/standard-headers/linux/virtio_crypto.h
> @@ -37,6 +37,7 @@
>  #define VIRTIO_CRYPTO_SERVICE_HASH   1
>  #define VIRTIO_CRYPTO_SERVICE_MAC2
>  #define VIRTIO_CRYPTO_SERVICE_AEAD   3
> +#define VIRTIO_CRYPTO_SERVICE_AKCIPHER 4
> 
>  #define VIRTIO_CRYPTO_OPCODE(service, op)   (((service) << 8) | (op))
> 
> @@ -57,6 +58,10 @@ struct virtio_crypto_ctrl_header {
>  VIRTIO_CRYPTO_OPCODE(VIRTIO_CRYPTO_SERVICE_AEAD, 0x02)
> #define VIRTIO_CRYPTO_AEAD_DESTROY_SESSION \
>  VIRTIO_CRYPTO_OPCODE(VIRTIO_CRYPTO_SERVICE_AEAD, 0x03)
> +#define VIRTIO_CRYPTO_AKCIPHER_CREATE_SESSION \
> +VIRTIO_CRYPTO_OPCODE(VIRTIO_CRYPTO_SERVICE_AKCIPHER, 0x04)
> +#define VIRTIO_CRYPTO_AKCIPHER_DESTROY_SESSION \
> +VIRTIO_CRYPTO_OPCODE(VIRTIO_CRYPTO_SERVICE_AKCIPHER, 0x05)
>   uint32_t opcode;
>   uint32_t algo;
>   uint32_t flag;
> @@ -180,6 +185,58 @@ struct virtio_crypto_aead_create_session_req {
>   uint8_t padding[32];
>  };
> 
> +struct virtio_crypto_rsa_session_para {
> +#define VIRTIO_CRYPTO_RSA_RAW_PADDING   0
> +#define VIRTIO_CRYPTO_RSA_PKCS1_PADDING 1
> + uint32_t padding_algo;
> +
> +#define VIRTIO_CRYPTO_RSA_NO_HASH   0
> +#define VIRTIO_CRYPTO_RSA_MD2   1
> +#define VIRTIO_CRYPTO_RSA_MD3   2
> +#define VIRTIO_CRYPTO_RSA_MD4   3
> +#define VIRTIO_CRYPTO_RSA_MD5   4
> +#define VIRTIO_CRYPTO_RSA_SHA1  5
> +#define VIRTIO_CRYPTO_RSA_SHA256  6
> +#define VIRTIO_CRYPTO_RSA_SHA384  7
> +#define VIRTIO_CRYPTO_RSA_SHA512  8
> +#define VIRTIO_CRYPTO_RSA_SHA224  9
> + uint32_t hash_algo;
> +};
> +
> +struct virtio_crypto_ecdsa_session_para {
> +#define VIRTIO_CRYPTO_CURVE_UNKNOWN   0
> +#define VIRTIO_CRYPTO_CURVE_NIST_P192 1
> +#define VIRTIO_CRYPTO_CURVE_NIST_P224 2
> +#define VIRTIO_CRYPTO_CURVE_NIST_P256 3
> +#define VIRTIO_CRYPTO_CURVE_NIST_P384 4
> +#define VIRTIO_CRYPTO_CURVE_NIST_P521 5
> + uint32_t curve_id;
> + uint32_t padding;
> +};
> +
> +struct virtio_crypto_akcipher_session_para {
> +#define VIRTIO_CRYPTO_NO_AKCIPHER    0
> +#define VIRTIO_CRYPTO_AKCIPHER_RSA   1
> +#define VIRTIO_CRYPTO_AKCIPHER_DSA   2
> +#define VIRTIO_CRYPTO_AKCIPHER_ECDSA 3
> + uint32_t algo;
> +
> +#define VIRTIO_CRYPTO_AKCIPHER_KEY_TYPE_PUBLIC  1
> +#define VIRTIO_CRYPTO_AKCIPHER_KEY_TYPE_PRIVATE 2
> + uint32_t keytype;
> + uint32_t keylen;
> +
> + union {
> + struct virtio_crypto_rsa_session_para rsa;
> + struct virtio_crypto_ecdsa_session_para ecdsa;
> + } u;
> +};
> +
> +struct virtio_crypto_akcipher_create_session_req {
> + struct virtio_crypto_akcipher_session_para para;
> + uint8_t padding[36];
> +};
> +
>  struct virtio_crypto_alg_chain_session_para {  #define
> VIRTIO_CRYPTO_SYM_ALG_CHAIN_ORDER_HASH_THEN_CIPHER  1
> #define VIRTIO_CRYPTO_SYM_ALG_CHAIN_ORDER_CIPHER_THEN_HASH  2
> @@ -247,6 +304,8 @@ struct virtio_crypto_op_ctrl_req {
>   mac_create_session;
>   struct virtio_crypto_aead_create_session_req
>   aead_create_session;
> + struct virtio_crypto_akcipher_create_session_req
> + akcipher_create_session;
>   struct virtio_crypto_destroy_session_req
>   destroy_session;
>   uint8_t padding[56];
> @@ -266,6 +325,14 @@ struct virtio_crypto_op_header {
>   VIRTIO_CRYPTO_OPCODE(VIRTIO_CRYPTO_SERVICE_AEAD, 0x00)
> #define VIRTIO_CRYPTO_AEAD_DECRYPT \
>   VIRTIO_CRYPTO_OPCODE(VIRTIO_CRYPTO_SERVICE_AEAD, 0x01)
> +#define VIRTIO_CRYPTO_AKCIPHER_ENCRYPT \
> + VIRTIO_CRYPTO_OPCODE(VIRTIO_CRYPTO_SERVICE_AKCIPHER, 0x00)
> +#define VIRTIO_CRYPTO_AKCIPHER_DECRYPT \
> + VIRTIO_CRYPTO_OPCODE(VIRTIO_

RE: [PATCH v2] MAINTAINERS: Change my email address

2021-12-14 Thread Gonglei (Arei)


> -Original Message-
> From: Daniel P. Berrangé [mailto:berra...@redhat.com]
> Sent: Tuesday, December 14, 2021 5:22 PM
> To: Philippe Mathieu-Daudé 
> Cc: Hailiang Zhang ;
> qemu-devel@nongnu.org; Gonglei (Arei) ;
> Wencongyang (HongMeng) ;
> dgilb...@redhat.com; quint...@redhat.com
> Subject: Re: [PATCH v2] MAINTAINERS: Change my email address
> 
> On Tue, Dec 14, 2021 at 10:04:03AM +0100, Philippe Mathieu-Daudé
> wrote:
> > On 12/14/21 08:54, Hailiang Zhang wrote:
> > > The zhang.zhanghaili...@huawei.com email address has been
> stopped.
> > > Change it to my new email address.
> > >
> > > Signed-off-by: Hailiang Zhang 
> > > ---
> > > hi Juan & Dave,
> > >
> > > Firstly, thank you for your working on maintaining the COLO
> framework.
> > > I didn't have much time on it in the past days.
> > >
> > > I may have some time in the next days since my job has changed.
> > >
> > > Because my old email address has been deactivated, I cannot use it to send
> > > this patch.
> > > Please help me to merge this patch.
> >
> > Can we have an Ack-by from someone working at Huawei?
> 
> Why do we need that ? Subsystems are not owned by companies.
> 
> If someone moves company and wants to carry on in their existing role as
> maintainer that is fine and doesn't need approval from their old company
> IMHO.
> 

Agreed. I'm just confirming HaiLiang's identity. 

Acked-by: Gonglei 

Good luck, bro. @Hailiang

Thanks,
-Gonglei

> Regards,
> Daniel
> --
> |: https://berrange.com  -o-
> https://www.flickr.com/photos/dberrange :|
> |: https://libvirt.org -o-
> https://fstop138.berrange.com :|
> |: https://entangle-photo.org-o-
> https://www.instagram.com/dberrange :|



RE: [PATCH 01/24] cryptodev-vhost-user: Register "chardev" as class property

2020-09-21 Thread Gonglei (Arei)



> -Original Message-
> From: Eduardo Habkost [mailto:ehabk...@redhat.com]
> Sent: Tuesday, September 22, 2020 6:10 AM
> To: qemu-devel@nongnu.org
> Cc: Paolo Bonzini ; Daniel P. Berrange
> ; John Snow ; Gonglei (Arei)
> 
> Subject: [PATCH 01/24] cryptodev-vhost-user: Register "chardev" as class
> property
> 
> Class properties make QOM introspection simpler and easier, as they don't
> require an object to be instantiated.
> 
> Signed-off-by: Eduardo Habkost 
> ---
> Cc: "Gonglei (Arei)" 
> Cc: qemu-devel@nongnu.org
> ---

Reviewed-by: Gonglei 

Regards,
-Gonglei

>  backends/cryptodev-vhost-user.c | 13 +
>  1 file changed, 5 insertions(+), 8 deletions(-)
> 
> diff --git a/backends/cryptodev-vhost-user.c b/backends/cryptodev-vhost-user.c
> index 41089dede15..690738c6c95 100644
> --- a/backends/cryptodev-vhost-user.c
> +++ b/backends/cryptodev-vhost-user.c
> @@ -336,13 +336,6 @@ cryptodev_vhost_user_get_chardev(Object *obj,
> Error **errp)
>  return NULL;
>  }
> 
> -static void cryptodev_vhost_user_instance_int(Object *obj) -{
> -object_property_add_str(obj, "chardev",
> -cryptodev_vhost_user_get_chardev,
> -cryptodev_vhost_user_set_chardev);
> -}
> -
>  static void cryptodev_vhost_user_finalize(Object *obj)  {
>  CryptoDevBackendVhostUser *s =
> @@ -363,13 +356,17 @@ cryptodev_vhost_user_class_init(ObjectClass *oc,
> void *data)
>  bc->create_session = cryptodev_vhost_user_sym_create_session;
>  bc->close_session = cryptodev_vhost_user_sym_close_session;
>  bc->do_sym_op = NULL;
> +
> +object_class_property_add_str(oc, "chardev",
> +  cryptodev_vhost_user_get_chardev,
> +  cryptodev_vhost_user_set_chardev);
> +
>  }
> 
>  static const TypeInfo cryptodev_vhost_user_info = {
>  .name = TYPE_CRYPTODEV_BACKEND_VHOST_USER,
>  .parent = TYPE_CRYPTODEV_BACKEND,
>  .class_init = cryptodev_vhost_user_class_init,
> -.instance_init = cryptodev_vhost_user_instance_int,
>  .instance_finalize = cryptodev_vhost_user_finalize,
>  .instance_size = sizeof(CryptoDevBackendVhostUser),
>  };
> --
> 2.26.2
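
As an aside, the introspection benefit is visible over QMP: class properties
can be listed without instantiating an object. A sketch, assuming the QOM
type name "cryptodev-vhost-user" from the patch:

{ "execute": "qom-list-properties",
  "arguments": { "typename": "cryptodev-vhost-user" } }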




RE: [PATCH 02/24] cryptodev-backend: Register "chardev" as class property

2020-09-21 Thread Gonglei (Arei)



> -Original Message-
> From: Eduardo Habkost [mailto:ehabk...@redhat.com]
> Sent: Tuesday, September 22, 2020 6:10 AM
> To: qemu-devel@nongnu.org
> Cc: Paolo Bonzini ; Daniel P. Berrange
> ; John Snow ; Gonglei (Arei)
> 
> Subject: [PATCH 02/24] cryptodev-backend: Register "chardev" as class
> property
> 
> Class properties make QOM introspection simpler and easier, as they don't
> require an object to be instantiated.
> 
> Signed-off-by: Eduardo Habkost 
> ---
> Cc: "Gonglei (Arei)" 
> Cc: qemu-devel@nongnu.org
> ---
>  backends/cryptodev.c | 8 
>  1 file changed, 4 insertions(+), 4 deletions(-)
> 

Reviewed-by: Gonglei 

Regards,
-Gonglei


> diff --git a/backends/cryptodev.c b/backends/cryptodev.c
> index ada4ebe78b1..3f141f61ed6 100644
> --- a/backends/cryptodev.c
> +++ b/backends/cryptodev.c
> @@ -206,10 +206,6 @@ cryptodev_backend_can_be_deleted(UserCreatable
> *uc)
> 
>  static void cryptodev_backend_instance_init(Object *obj)  {
> -object_property_add(obj, "queues", "uint32",
> -  cryptodev_backend_get_queues,
> -  cryptodev_backend_set_queues,
> -  NULL, NULL);
>  /* Initialize devices' queues property to 1 */
>  object_property_set_int(obj, "queues", 1, NULL);
>  }
> @@ -230,6 +226,10 @@ cryptodev_backend_class_init(ObjectClass *oc, void *data)
>  ucc->can_be_deleted = cryptodev_backend_can_be_deleted;
> 
>  QTAILQ_INIT(&crypto_clients);
> +object_class_property_add(oc, "queues", "uint32",
> +  cryptodev_backend_get_queues,
> +  cryptodev_backend_set_queues,
> +  NULL, NULL);
>  }
> 
>  static const TypeInfo cryptodev_backend_info = {
> --
> 2.26.2




RE: [PATCH 05/46] virtio-crypto-pci: Tidy up virtio_crypto_pci_realize()

2020-06-27 Thread Gonglei (Arei)


> -Original Message-
> From: Markus Armbruster [mailto:arm...@redhat.com]
> Sent: Thursday, June 25, 2020 12:43 AM
> To: qemu-devel@nongnu.org
> Cc: pbonz...@redhat.com; berra...@redhat.com; ehabk...@redhat.com;
> qemu-bl...@nongnu.org; peter.mayd...@linaro.org;
> vsement...@virtuozzo.com; Gonglei (Arei) ;
> Michael S . Tsirkin 
> Subject: [PATCH 05/46] virtio-crypto-pci: Tidy up virtio_crypto_pci_realize()
> 
> virtio_crypto_pci_realize() continues after realization of its 
> "virtio-crypto-device"
> fails.  Only an object_property_set_link() follows; looks harmless to me.  
> Tidy
> up anyway: return after failure, just like virtio_rng_pci_realize() does.
> 
> Cc: "Gonglei (Arei)" 
> Cc: Michael S. Tsirkin 
> Signed-off-by: Markus Armbruster 
> ---
>  hw/virtio/virtio-crypto-pci.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
> 

Reviewed-by: Gonglei < arei.gong...@huawei.com>

> diff --git a/hw/virtio/virtio-crypto-pci.c b/hw/virtio/virtio-crypto-pci.c
> index 72be531c95..0755722288 100644
> --- a/hw/virtio/virtio-crypto-pci.c
> +++ b/hw/virtio/virtio-crypto-pci.c
> @@ -54,7 +54,9 @@ static void virtio_crypto_pci_realize(VirtIOPCIProxy
> *vpci_dev, Error **errp)
>  }
> 
>  virtio_pci_force_virtio_1(vpci_dev);
> -qdev_realize(vdev, BUS(&vpci_dev->bus), errp);
> +if (!qdev_realize(vdev, BUS(&vpci_dev->bus), errp)) {
> +return;
> +}
>  object_property_set_link(OBJECT(vcrypto),
>   OBJECT(vcrypto->vdev.conf.cryptodev), "cryptodev",
>   NULL);
> --
> 2.26.2




RE: [PATCH v1 29/59] cryptodev-vhost.c: remove unneeded 'err' label in cryptodev_vhost_start

2020-01-07 Thread Gonglei (Arei)


> -Original Message-
> From: Daniel Henrique Barboza [mailto:danielhb...@gmail.com]
> Sent: Tuesday, January 7, 2020 2:24 AM
> To: qemu-devel@nongnu.org
> Cc: qemu-triv...@nongnu.org; Daniel Henrique Barboza
> ; Gonglei (Arei) 
> Subject: [PATCH v1 29/59] cryptodev-vhost.c: remove unneeded 'err' label in
> cryptodev_vhost_start
> 
> 'err' can be replaced by 'return r'.
> 
> CC: Gonglei 
> Signed-off-by: Daniel Henrique Barboza 
> ---
>  backends/cryptodev-vhost.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 

Reviewed-by: Gonglei 


> diff --git a/backends/cryptodev-vhost.c b/backends/cryptodev-vhost.c
> index 8337c9a495..907ca21fa7 100644
> --- a/backends/cryptodev-vhost.c
> +++ b/backends/cryptodev-vhost.c
> @@ -201,7 +201,7 @@ int cryptodev_vhost_start(VirtIODevice *dev, int
> total_queues)
>  r = k->set_guest_notifiers(qbus->parent, total_queues, true);
>  if (r < 0) {
>  error_report("error binding guest notifier: %d", -r);
> -goto err;
> +return r;
>  }
> 
>  for (i = 0; i < total_queues; i++) {
> @@ -236,7 +236,7 @@ err_start:
>  if (e < 0) {
>  error_report("vhost guest notifier cleanup failed: %d", e);
>  }
> -err:
> +
>  return r;
>  }
> 
> --
> 2.24.1




RE: [PATCH v6] backends/cryptodev: drop local_err from cryptodev_backend_complete()

2019-11-27 Thread Gonglei (Arei)
CCing qemu-triv...@nongnu.org

Reviewed-by: Gonglei 


Regards,
-Gonglei

> -Original Message-
> From: Vladimir Sementsov-Ogievskiy [mailto:vsement...@virtuozzo.com]
> Sent: Thursday, November 28, 2019 3:46 AM
> To: qemu-devel@nongnu.org
> Cc: Gonglei (Arei) ; marcandre.lur...@gmail.com;
> phi...@redhat.com; vsement...@virtuozzo.com
> Subject: [PATCH v6] backends/cryptodev: drop local_err from
> cryptodev_backend_complete()
> 
> No reason for local_err here, use errp directly instead.
> 
> Signed-off-by: Vladimir Sementsov-Ogievskiy 
> Reviewed-by: Philippe Mathieu-Daudé 
> Reviewed-by: Marc-André Lureau 
> ---
> 
> v6: add r-b by Philippe and Marc-André
> 
>  backends/cryptodev.c | 11 +--
>  1 file changed, 1 insertion(+), 10 deletions(-)
> 
> diff --git a/backends/cryptodev.c b/backends/cryptodev.c index
> 3c071eab95..5a9735684e 100644
> --- a/backends/cryptodev.c
> +++ b/backends/cryptodev.c
> @@ -176,19 +176,10 @@ cryptodev_backend_complete(UserCreatable *uc,
> Error **errp)
> {
>  CryptoDevBackend *backend = CRYPTODEV_BACKEND(uc);
>  CryptoDevBackendClass *bc = CRYPTODEV_BACKEND_GET_CLASS(uc);
> -Error *local_err = NULL;
> 
>  if (bc->init) {
> -bc->init(backend, &local_err);
> -if (local_err) {
> -goto out;
> -}
> +bc->init(backend, errp);
>  }
> -
> -return;
> -
> -out:
> -error_propagate(errp, local_err);
>  }
> 
>  void cryptodev_backend_set_used(CryptoDevBackend *backend, bool used)
> --
> 2.21.0



Re: [Qemu-devel] [PATCH] backends: cryptodev: fix oob access issue

2019-03-17 Thread Gonglei (Arei)
Hi Michael,

Could you pls apply this patch in your tree?

Thanks,
-Gonglei


> -Original Message-
> From: Li Qiang [mailto:liq...@163.com]
> Sent: Monday, March 18, 2019 9:12 AM
> To: Gonglei (Arei) 
> Cc: qemu-devel@nongnu.org; Li Qiang 
> Subject: [PATCH] backends: cryptodev: fix oob access issue
> 
> The 'queue_index' of the create/close_session functions
> comes from the guest and can exceed 'MAX_CRYPTO_QUEUE_NUM'.
> This leads to an OOB access. This patch avoids this.
> 
> Signed-off-by: Li Qiang 
> ---
>  backends/cryptodev-builtin.c| 4 
>  backends/cryptodev-vhost-user.c | 4 
>  2 files changed, 8 insertions(+)
> 

Reviewed-by: Gonglei 


> diff --git a/backends/cryptodev-builtin.c b/backends/cryptodev-builtin.c
> index 9fb0bd57a6..c3a65b2f5f 100644
> --- a/backends/cryptodev-builtin.c
> +++ b/backends/cryptodev-builtin.c
> @@ -249,6 +249,8 @@ static int64_t cryptodev_builtin_sym_create_session(
> CryptoDevBackendSymSessionInfo *sess_info,
> uint32_t queue_index, Error **errp)
>  {
> +assert(queue_index < MAX_CRYPTO_QUEUE_NUM);
> +
>  CryptoDevBackendBuiltin *builtin =
>CRYPTODEV_BACKEND_BUILTIN(backend);
>  int64_t session_id = -1;
> @@ -280,6 +282,8 @@ static int cryptodev_builtin_sym_close_session(
> uint64_t session_id,
> uint32_t queue_index, Error **errp)
>  {
> +assert(queue_index < MAX_CRYPTO_QUEUE_NUM);
> +
>  CryptoDevBackendBuiltin *builtin =
>CRYPTODEV_BACKEND_BUILTIN(backend);
> 
> diff --git a/backends/cryptodev-vhost-user.c b/backends/cryptodev-vhost-user.c
> index 1052a5d0e9..36a40eeb4d 100644
> --- a/backends/cryptodev-vhost-user.c
> +++ b/backends/cryptodev-vhost-user.c
> @@ -236,6 +236,8 @@ static int64_t
> cryptodev_vhost_user_sym_create_session(
> CryptoDevBackendSymSessionInfo *sess_info,
> uint32_t queue_index, Error **errp)
>  {
> +assert(queue_index < MAX_CRYPTO_QUEUE_NUM);
> +
>  CryptoDevBackendClient *cc =
> backend->conf.peers.ccs[queue_index];
>  CryptoDevBackendVhost *vhost_crypto;
> @@ -262,6 +264,8 @@ static int cryptodev_vhost_user_sym_close_session(
> uint64_t session_id,
> uint32_t queue_index, Error **errp)
>  {
> +assert(queue_index < MAX_CRYPTO_QUEUE_NUM);
> +
>  CryptoDevBackendClient *cc =
>backend->conf.peers.ccs[queue_index];
>  CryptoDevBackendVhost *vhost_crypto;
> --
> 2.17.1
> 




Re: [Qemu-devel] [PATCH] cryptodev-vhost-user: fix a oob access

2019-03-17 Thread Gonglei (Arei)
Hi,

> -Original Message-
> From: Li Qiang [mailto:liq...@163.com]
> Sent: Sunday, March 17, 2019 5:10 PM
> To: Gonglei (Arei) 
> Cc: qemu-devel@nongnu.org; Li Qiang 
> Subject: [PATCH] cryptodev-vhost-user: fix a oob access
> 
> The 'queue_index' of the create/close_session functions
> comes from the guest and can exceed 'MAX_CRYPTO_QUEUE_NUM'.
> This leads to an OOB access. This patch avoids this.
> 
> Signed-off-by: Li Qiang 
> ---
>  backends/cryptodev-vhost-user.c | 4 
>  1 file changed, 4 insertions(+)
> 
> diff --git a/backends/cryptodev-vhost-user.c b/backends/cryptodev-vhost-user.c
> index 1052a5d0e9..36a40eeb4d 100644
> --- a/backends/cryptodev-vhost-user.c
> +++ b/backends/cryptodev-vhost-user.c
> @@ -236,6 +236,8 @@ static int64_t
> cryptodev_vhost_user_sym_create_session(
> CryptoDevBackendSymSessionInfo *sess_info,
> uint32_t queue_index, Error **errp)
>  {
> +assert(queue_index < MAX_CRYPTO_QUEUE_NUM);
> +
>  CryptoDevBackendClient *cc =
> backend->conf.peers.ccs[queue_index];
>  CryptoDevBackendVhost *vhost_crypto;
> @@ -262,6 +264,8 @@ static int cryptodev_vhost_user_sym_close_session(
> uint64_t session_id,
> uint32_t queue_index, Error **errp)
>  {
> +assert(queue_index < MAX_CRYPTO_QUEUE_NUM);
> +
>  CryptoDevBackendClient *cc =
>backend->conf.peers.ccs[queue_index];
>  CryptoDevBackendVhost *vhost_crypto;
> --
> 2.17.1
> 

Pls add an assertion for the cryptodev-builtin backend too, though the
queue_index isn't used there currently.

Thanks,
-Gonglei




Re: [Qemu-devel] [PATCH 0/5] QEMU VFIO live migration

2019-02-20 Thread Gonglei (Arei)
> >
> > > -Original Message-
> > > From: Zhao Yan [mailto:yan.y.z...@intel.com]
> > > Sent: Thursday, February 21, 2019 10:05 AM
> > > To: Gonglei (Arei) 
> > > Cc: alex.william...@redhat.com; qemu-devel@nongnu.org;
> > > intel-gvt-...@lists.freedesktop.org; zhengxiao...@alibaba-inc.com;
> > > yi.l@intel.com; eskul...@redhat.com; ziye.y...@intel.com;
> > > coh...@redhat.com; shuangtai@alibaba-inc.com;
> dgilb...@redhat.com;
> > > zhi.a.w...@intel.com; mlevi...@redhat.com; pa...@linux.ibm.com;
> > > a...@ozlabs.ru; eau...@redhat.com; fel...@nutanix.com;
> > > jonathan.dav...@nutanix.com; changpeng@intel.com;
> ken@amd.com;
> > > kwankh...@nvidia.com; kevin.t...@intel.com; c...@nvidia.com;
> > > k...@vger.kernel.org
> > > Subject: Re: [PATCH 0/5] QEMU VFIO live migration
> > >
> > > > >
> > > > > > 5) About log sync, why not register log_global_start/stop in
> > > > > vfio_memory_listener?
> > > > > >
> > > > > >
> > > > > seems log_global_start/stop cannot be iteratively called in pre-copy
> > > > > phase?
> > > > > for dirty pages in system memory, it's better to transfer dirty data
> > > > > iteratively to reduce downtime, right?
> > > > >
> > > >
> > > > We just need to invoke them once to start and stop logging. Why would
> > > > we need to call them iteratively? See the memory_listener of vhost.
> > > >
> > > the dirty pages produced in system memory by the device are incremental.
> > > if they can be fetched iteratively, the dirty pages in the stop-and-copy
> > > phase can be minimal.
> > > :)
> > >
> > I mean starting or stopping the capability of logging, not log sync.
> >
> > We register the below callbacks:
> >
> > .log_sync = vfio_log_sync,
> > .log_global_start = vfio_log_global_start,
> > .log_global_stop = vfio_log_global_stop,
> >
> .log_global_start is also a good point to notify logging state.
> But if notifying in .save_setup handler, we can do fine-grained
> control of when to notify of logging starting together with get_buffer
> operation.
> Is there any special benefit by registering to .log_global_start/stop?
> 

Performance benefit when one VM has multiple identical vfio devices.


Regards,
-Gonglei



Re: [Qemu-devel] [PATCH 0/5] QEMU VFIO live migration

2019-02-20 Thread Gonglei (Arei)







> -Original Message-
> From: Zhao Yan [mailto:yan.y.z...@intel.com]
> Sent: Thursday, February 21, 2019 12:08 PM
> To: Gonglei (Arei) 
> Cc: c...@nvidia.com; k...@vger.kernel.org; a...@ozlabs.ru;
> zhengxiao...@alibaba-inc.com; shuangtai@alibaba-inc.com;
> qemu-devel@nongnu.org; kwankh...@nvidia.com; eau...@redhat.com;
> yi.l@intel.com; eskul...@redhat.com; ziye.y...@intel.com;
> mlevi...@redhat.com; pa...@linux.ibm.com; fel...@nutanix.com;
> ken@amd.com; kevin.t...@intel.com; dgilb...@redhat.com;
> alex.william...@redhat.com; intel-gvt-...@lists.freedesktop.org;
> changpeng@intel.com; coh...@redhat.com; zhi.a.w...@intel.com;
> jonathan.dav...@nutanix.com
> Subject: Re: [PATCH 0/5] QEMU VFIO live migration
> 
> On Thu, Feb 21, 2019 at 03:33:24AM +, Gonglei (Arei) wrote:
> >
> > > -Original Message-
> > > From: Zhao Yan [mailto:yan.y.z...@intel.com]
> > > Sent: Thursday, February 21, 2019 9:59 AM
> > > To: Gonglei (Arei) 
> > > Cc: alex.william...@redhat.com; qemu-devel@nongnu.org;
> > > intel-gvt-...@lists.freedesktop.org; zhengxiao...@alibaba-inc.com;
> > > yi.l@intel.com; eskul...@redhat.com; ziye.y...@intel.com;
> > > coh...@redhat.com; shuangtai@alibaba-inc.com;
> dgilb...@redhat.com;
> > > zhi.a.w...@intel.com; mlevi...@redhat.com; pa...@linux.ibm.com;
> > > a...@ozlabs.ru; eau...@redhat.com; fel...@nutanix.com;
> > > jonathan.dav...@nutanix.com; changpeng@intel.com;
> ken@amd.com;
> > > kwankh...@nvidia.com; kevin.t...@intel.com; c...@nvidia.com;
> > > k...@vger.kernel.org
> > > Subject: Re: [PATCH 0/5] QEMU VFIO live migration
> > >
> > > On Thu, Feb 21, 2019 at 01:35:43AM +, Gonglei (Arei) wrote:
> > > >
> > > >
> > > > > -Original Message-
> > > > > From: Zhao Yan [mailto:yan.y.z...@intel.com]
> > > > > Sent: Thursday, February 21, 2019 8:25 AM
> > > > > To: Gonglei (Arei) 
> > > > > Cc: alex.william...@redhat.com; qemu-devel@nongnu.org;
> > > > > intel-gvt-...@lists.freedesktop.org; zhengxiao...@alibaba-inc.com;
> > > > > yi.l@intel.com; eskul...@redhat.com; ziye.y...@intel.com;
> > > > > coh...@redhat.com; shuangtai@alibaba-inc.com;
> > > dgilb...@redhat.com;
> > > > > zhi.a.w...@intel.com; mlevi...@redhat.com; pa...@linux.ibm.com;
> > > > > a...@ozlabs.ru; eau...@redhat.com; fel...@nutanix.com;
> > > > > jonathan.dav...@nutanix.com; changpeng@intel.com;
> > > ken@amd.com;
> > > > > kwankh...@nvidia.com; kevin.t...@intel.com; c...@nvidia.com;
> > > > > k...@vger.kernel.org
> > > > > Subject: Re: [PATCH 0/5] QEMU VFIO live migration
> > > > >
> > > > > On Wed, Feb 20, 2019 at 11:56:01AM +, Gonglei (Arei) wrote:
> > > > > > Hi yan,
> > > > > >
> > > > > > Thanks for your work.
> > > > > >
> > > > > > I have some suggestions or questions:
> > > > > >
> > > > > > 1) Would you add MSI-X mode support? if not, pls add a check in
> > > > > vfio_pci_save_config(), like Nvidia's solution.
> > > > > ok.
> > > > >
> > > > > > 2) We should start vfio devices before vcpu resumes, so we can't 
> > > > > > rely
> on
> > > vm
> > > > > start change handler completely.
> > > > > vfio devices is by default set to running state.
> > > > > In the target machine, its state transition flow is
> running->stop->running.
> > > >
> > > > That's confusing. We should start vfio devices after vfio_load_state,
> > > otherwise
> > > > how can you keep the devices' information the same between source
> side
> > > > and destination side?
> > > >
> > > so, you mean to set the device state to running in the first call to
> > > vfio_load_state?
> > >
> > No, we should start devices after vfio_load_state and before the vcpu resumes.
> >
> 
> What about setting the device state to running in the load_cleanup handler?
> 

The timing is fine, but you should also think about whether the device state
should be set to running in failure branches when calling the load_cleanup handler.

Regards,
-Gonglei



Re: [Qemu-devel] [PATCH 0/5] QEMU VFIO live migration

2019-02-20 Thread Gonglei (Arei)


> -Original Message-
> From: Zhao Yan [mailto:yan.y.z...@intel.com]
> Sent: Thursday, February 21, 2019 9:59 AM
> To: Gonglei (Arei) 
> Cc: alex.william...@redhat.com; qemu-devel@nongnu.org;
> intel-gvt-...@lists.freedesktop.org; zhengxiao...@alibaba-inc.com;
> yi.l@intel.com; eskul...@redhat.com; ziye.y...@intel.com;
> coh...@redhat.com; shuangtai@alibaba-inc.com; dgilb...@redhat.com;
> zhi.a.w...@intel.com; mlevi...@redhat.com; pa...@linux.ibm.com;
> a...@ozlabs.ru; eau...@redhat.com; fel...@nutanix.com;
> jonathan.dav...@nutanix.com; changpeng@intel.com; ken@amd.com;
> kwankh...@nvidia.com; kevin.t...@intel.com; c...@nvidia.com;
> k...@vger.kernel.org
> Subject: Re: [PATCH 0/5] QEMU VFIO live migration
> 
> On Thu, Feb 21, 2019 at 01:35:43AM +, Gonglei (Arei) wrote:
> >
> >
> > > -Original Message-
> > > From: Zhao Yan [mailto:yan.y.z...@intel.com]
> > > Sent: Thursday, February 21, 2019 8:25 AM
> > > To: Gonglei (Arei) 
> > > Cc: alex.william...@redhat.com; qemu-devel@nongnu.org;
> > > intel-gvt-...@lists.freedesktop.org; zhengxiao...@alibaba-inc.com;
> > > yi.l@intel.com; eskul...@redhat.com; ziye.y...@intel.com;
> > > coh...@redhat.com; shuangtai@alibaba-inc.com;
> dgilb...@redhat.com;
> > > zhi.a.w...@intel.com; mlevi...@redhat.com; pa...@linux.ibm.com;
> > > a...@ozlabs.ru; eau...@redhat.com; fel...@nutanix.com;
> > > jonathan.dav...@nutanix.com; changpeng@intel.com;
> ken@amd.com;
> > > kwankh...@nvidia.com; kevin.t...@intel.com; c...@nvidia.com;
> > > k...@vger.kernel.org
> > > Subject: Re: [PATCH 0/5] QEMU VFIO live migration
> > >
> > > On Wed, Feb 20, 2019 at 11:56:01AM +, Gonglei (Arei) wrote:
> > > > Hi yan,
> > > >
> > > > Thanks for your work.
> > > >
> > > > I have some suggestions or questions:
> > > >
> > > > 1) Would you add MSI-X mode support? if not, pls add a check in
> > > vfio_pci_save_config(), like Nvidia's solution.
> > > ok.
> > >
> > > > 2) We should start vfio devices before vcpu resumes, so we can't rely on
> vm
> > > start change handler completely.
> > > vfio devices are by default set to the running state.
> > > In the target machine, its state transition flow is 
> > > running->stop->running.
> >
> > That's confusing. We should start vfio devices after vfio_load_state,
> otherwise
> > how can you keep the devices' information the same between source side
> > and destination side?
> >
> so, you mean to set the device state to running in the first call to
> vfio_load_state?
> 
No, we should start devices after vfio_load_state and before the vcpu resumes.

> > > so, maybe you can ignore the stop notification in kernel?
> > > > 3) We'd better support live migration rollback since there are many failure
> > > scenarios,
> > > >  registering a migration notifier is a good choice.
> > > I think this patchset can also handle the failure case well.
> > > if migration failure or cancelling happens,
> > > in the cleanup handler, the LOGGING state is cleared. device state (running
> > > or stopped) stays as it is.
> >
> > IIRC there are many failure paths that don't call the cleanup handler.
> >
> could you give an example?

Never mind, that's another bug I think. 

> > > then,
> > > if vm switches back to running, device state will be set to running;
> > > if vm stays in stopped state, device state is also stopped (it has no
> > > meaning to let it in running state).
> > > Do you think so ?
> > >
> > If the underlying state machine is complicated,
> > we should tell the canceling state to the vendor driver proactively.
> >
> That makes sense.
> 
> > > > 4) Four memory regions for live migration are too complicated IMHO.
> > > one big region requires the sub-regions to be well padded.
> > > like for the first control fields, they have to be padded to 4K.
> > > the same for other data fields.
> > > Otherwise, mmap simply fails, because the start-offset and size for mmap
> > > both need to be PAGE aligned.
> > >
> > But we don't need to use mmap for the control field and device state; they
> are basically small.
> > pread/pwrite performance is sufficient.
> >
> we don't mmap control fields. but if data fields go immediately after
> control fields (e.g. just 64 bytes), we can't mmap the data fields
> successfully because their start offset is 64. Therefore control fields have
> to be padded to 4k to let data fields start from 4k.
> That's the drawback of one big region holding both control and data fields.
> 
> > > Also, 4 regions are clearer in my view :)
> > >
> > > > 5) About log sync, why not register log_global_start/stop in
> > > vfio_memory_listener?
> > > >
> > > >
> > > seems log_global_start/stop cannot be iteratively called in pre-copy phase?
> > > for dirty pages in system memory, it's better to transfer dirty data
> > > iteratively to reduce downtime, right?
> > >
> >
> > We just need to invoke them once to start and stop logging. Why would we
> > need to call them iteratively? See the memory_listener of vhost.
> >
> 
> 
> 
> > Regards,
> > -Gonglei



Re: [Qemu-devel] [PATCH 0/5] QEMU VFIO live migration

2019-02-20 Thread Gonglei (Arei)




> -Original Message-
> From: Zhao Yan [mailto:yan.y.z...@intel.com]
> Sent: Thursday, February 21, 2019 10:05 AM
> To: Gonglei (Arei) 
> Cc: alex.william...@redhat.com; qemu-devel@nongnu.org;
> intel-gvt-...@lists.freedesktop.org; zhengxiao...@alibaba-inc.com;
> yi.l@intel.com; eskul...@redhat.com; ziye.y...@intel.com;
> coh...@redhat.com; shuangtai@alibaba-inc.com; dgilb...@redhat.com;
> zhi.a.w...@intel.com; mlevi...@redhat.com; pa...@linux.ibm.com;
> a...@ozlabs.ru; eau...@redhat.com; fel...@nutanix.com;
> jonathan.dav...@nutanix.com; changpeng@intel.com; ken@amd.com;
> kwankh...@nvidia.com; kevin.t...@intel.com; c...@nvidia.com;
> k...@vger.kernel.org
> Subject: Re: [PATCH 0/5] QEMU VFIO live migration
> 
> > >
> > > > 5) About log sync, why not register log_global_start/stop in
> > > vfio_memory_listener?
> > > >
> > > >
> > > seems log_global_start/stop cannot be iteratively called in pre-copy phase?
> > > for dirty pages in system memory, it's better to transfer dirty data
> > > iteratively to reduce downtime, right?
> > >
> >
> > We just need to invoke them once to start and stop logging. Why would we
> > need to call them iteratively? See the memory_listener of vhost.
> >
> the dirty pages produced in system memory by the device are incremental.
> if they can be fetched iteratively, the dirty pages in the stop-and-copy phase
> can be minimal.
> :)
> 
I mean starting or stopping the capability of logging, not log sync. 

We register the below callbacks:

.log_sync = vfio_log_sync,
.log_global_start = vfio_log_global_start,
.log_global_stop = vfio_log_global_stop,
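
For reference, a minimal sketch of wiring those hooks into the vfio
MemoryListener (the vfio_log_* callbacks are the ones named above; the
region_add/del handlers already exist in hw/vfio/common.c):

static MemoryListener vfio_memory_listener = {
    .region_add = vfio_listener_region_add,
    .region_del = vfio_listener_region_del,
    /* invoked once when dirty logging starts/stops */
    .log_global_start = vfio_log_global_start,
    .log_global_stop = vfio_log_global_stop,
    /* invoked repeatedly during pre-copy to pull dirty bits */
    .log_sync = vfio_log_sync,
};

/* e.g. from vfio_connect_container(): */
memory_listener_register(&vfio_memory_listener, &address_space_memory);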

Regards,
-Gonglei



Re: [Qemu-devel] [PATCH 0/5] QEMU VFIO live migration

2019-02-20 Thread Gonglei (Arei)



> -Original Message-
> From: Zhao Yan [mailto:yan.y.z...@intel.com]
> Sent: Thursday, February 21, 2019 8:25 AM
> To: Gonglei (Arei) 
> Cc: alex.william...@redhat.com; qemu-devel@nongnu.org;
> intel-gvt-...@lists.freedesktop.org; zhengxiao...@alibaba-inc.com;
> yi.l@intel.com; eskul...@redhat.com; ziye.y...@intel.com;
> coh...@redhat.com; shuangtai@alibaba-inc.com; dgilb...@redhat.com;
> zhi.a.w...@intel.com; mlevi...@redhat.com; pa...@linux.ibm.com;
> a...@ozlabs.ru; eau...@redhat.com; fel...@nutanix.com;
> jonathan.dav...@nutanix.com; changpeng@intel.com; ken@amd.com;
> kwankh...@nvidia.com; kevin.t...@intel.com; c...@nvidia.com;
> k...@vger.kernel.org
> Subject: Re: [PATCH 0/5] QEMU VFIO live migration
> 
> On Wed, Feb 20, 2019 at 11:56:01AM +, Gonglei (Arei) wrote:
> > Hi yan,
> >
> > Thanks for your work.
> >
> > I have some suggestions or questions:
> >
> > 1) Would you add MSI-X mode support? if not, pls add a check in
> vfio_pci_save_config(), like Nvidia's solution.
> ok.
> 
> > 2) We should start vfio devices before vcpu resumes, so we can't rely on vm
> start change handler completely.
> vfio devices are by default set to the running state.
> In the target machine, its state transition flow is running->stop->running.

That's confusing. We should start vfio devices after vfio_load_state, otherwise
how can you keep the devices' information the same between source side
and destination side?

> so, maybe you can ignore the stop notification in kernel?
> > 3) We'd better support live migration rollback since there are many failure
> scenarios,
> >  registering a migration notifier is a good choice.
> I think this patchset can also handle the failure case well.
> if migration failure or cancelling happens,
> in the cleanup handler, the LOGGING state is cleared. device state (running or
> stopped) stays as it is.

IIRC there are many failure paths that don't call the cleanup handler.

> then,
> if vm switches back to running, device state will be set to running;
> if vm stays in stopped state, device state is also stopped (it has no
> meaning to let it in running state).
> Do you think so ?
> 
If the underlying state machine is complicated,
we should tell the canceling state to the vendor driver proactively.

> > 4) Four memory regions for live migration are too complicated IMHO.
> one big region requires the sub-regions to be well padded.
> like for the first control fields, they have to be padded to 4K.
> the same for other data fields.
> Otherwise, mmap simply fails, because the start-offset and size for mmap
> both need to be PAGE aligned.
> 
But we don't need to use mmap for the control field and device state; they are
basically small.
pread/pwrite performance is sufficient for them.

> Also, 4 regions are clearer in my view :)
> 
> > 5) About log sync, why not register log_global_start/stop in
> vfio_memory_listener?
> >
> >
> seems log_global_start/stop cannot be iteratively called in pre-copy phase?
> for dirty pages in system memory, it's better to transfer dirty data
> iteratively to reduce downtime, right?
> 

We just need to invoke them once to start and stop logging. Why would we need to
call them iteratively? See the memory_listener of vhost.

Regards,
-Gonglei



Re: [Qemu-devel] [PATCH 0/5] QEMU VFIO live migration

2019-02-20 Thread Gonglei (Arei)



> -Original Message-
> From: Cornelia Huck [mailto:coh...@redhat.com]
> Sent: Wednesday, February 20, 2019 7:43 PM
> To: Gonglei (Arei) 
> Cc: Dr. David Alan Gilbert ; Zhao Yan
> ; c...@nvidia.com; k...@vger.kernel.org;
> a...@ozlabs.ru; zhengxiao...@alibaba-inc.com; shuangtai@alibaba-inc.com;
> qemu-devel@nongnu.org; kwankh...@nvidia.com; eau...@redhat.com;
> yi.l@intel.com; eskul...@redhat.com; ziye.y...@intel.com;
> mlevi...@redhat.com; pa...@linux.ibm.com; fel...@nutanix.com;
> ken@amd.com; kevin.t...@intel.com; alex.william...@redhat.com;
> intel-gvt-...@lists.freedesktop.org; changpeng@intel.com;
> zhi.a.w...@intel.com; jonathan.dav...@nutanix.com
> Subject: Re: [PATCH 0/5] QEMU VFIO live migration
> 
> On Wed, 20 Feb 2019 11:28:46 +
> "Gonglei (Arei)"  wrote:
> 
> > > -Original Message-
> > > From: Dr. David Alan Gilbert [mailto:dgilb...@redhat.com]
> > > Sent: Wednesday, February 20, 2019 7:02 PM
> > > To: Zhao Yan 
> > > Cc: c...@nvidia.com; k...@vger.kernel.org; a...@ozlabs.ru;
> > > zhengxiao...@alibaba-inc.com; shuangtai@alibaba-inc.com;
> > > qemu-devel@nongnu.org; kwankh...@nvidia.com; eau...@redhat.com;
> > > yi.l@intel.com; eskul...@redhat.com; ziye.y...@intel.com;
> > > mlevi...@redhat.com; pa...@linux.ibm.com; Gonglei (Arei)
> > > ; fel...@nutanix.com; ken@amd.com;
> > > kevin.t...@intel.com; alex.william...@redhat.com;
> > > intel-gvt-...@lists.freedesktop.org; changpeng@intel.com;
> > > coh...@redhat.com; zhi.a.w...@intel.com;
> jonathan.dav...@nutanix.com
> > > Subject: Re: [PATCH 0/5] QEMU VFIO live migration
> > >
> > > * Zhao Yan (yan.y.z...@intel.com) wrote:
> > > > On Tue, Feb 19, 2019 at 11:32:13AM +, Dr. David Alan Gilbert wrote:
> > > > > * Yan Zhao (yan.y.z...@intel.com) wrote:
> > > > > > This patchset enables VFIO devices to have live migration 
> > > > > > capability.
> > > > > > Currently it does not support post-copy phase.
> > > > > >
> > > > > > It follows Alex's comments on last version of VFIO live migration
> patches,
> > > > > > including device states, VFIO device state region layout, dirty 
> > > > > > bitmap's
> > > > > > query.
> 
> > > > >   b) How do we detect if we're migrating from/to the wrong device or
> > > > > version of device?  Or say to a device with older firmware or perhaps
> > > > > a device that has less device memory ?
> > > > Actually it's still an open question for VFIO migration. Need to think about
> > > > whether it's better to check that in libvirt or qemu (like a device magic
> > > > along with version?).
> > > > along with verion ?).
> >
> > We must keep the hardware generation the same within one pod of a public
> > cloud provider. But we are still thinking about live migration from a
> > lower generation of hardware to a higher generation.
> 
> Agreed, lower->higher is the one direction that might make sense to
> support.
> 
> But regardless of that, I think we need to make sure that incompatible
> devices/versions fail directly instead of failing in a subtle, hard to
> debug way. Might be useful to do some initial sanity checks in libvirt
> as well.
> 
> How easy is it to obtain that information in a form that can be
> consumed by higher layers? Can we find out the device type at least?
> What about some kind of revision?

We can provide an interface to query whether the VM supports live migration
in the prepare phase of libvirt.

Can we get the revision_id from the vendor driver before invoking

register_savevm_live(NULL, TYPE_VFIO_PCI, -1,
revision_id,
&savevm_vfio_handlers,
vdev);

and then limit live migration from higher gens to lower gens?
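
For what it's worth, a rough sketch of that wiring (the revision query helper
is hypothetical; the register_savevm_live() signature is the one quoted above):

/* use the vendor-reported revision as the section version_id */
uint32_t revision_id = vfio_get_device_revision(vdev);   /* hypothetical */

register_savevm_live(NULL, TYPE_VFIO_PCI, -1, revision_id,
                     &savevm_vfio_handlers, vdev);

/* On the incoming side, savevm already rejects a section whose version_id
 * is greater than the one registered locally, so registering the device
 * revision here would block higher-gen -> lower-gen migration. */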

Regards,
-Gonglei



Re: [Qemu-devel] [PATCH 0/5] QEMU VFIO live migration

2019-02-20 Thread Gonglei (Arei)
Hi yan,

Thanks for your work.

I have some suggestions or questions:

1) Would you add MSI-X mode support? If not, pls add a check in
vfio_pci_save_config(), like Nvidia's solution.
2) We should start vfio devices before the vcpus resume, so we can't rely on
the VM state change handler completely.
3) We'd better support live migration rollback since there are many failure
scenarios; registering a migration notifier is a good choice (see the sketch
after this list).
4) Four memory regions for live migration are too complicated IMHO.
5) About log sync, why not register log_global_start/stop in
vfio_memory_listener (also covered in the sketch below)?
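
A rough illustration of 3) and 5) follows. add_migration_state_change_notifier(),
migration_has_failed() and the MemoryListener callbacks are real QEMU
interfaces, while vfio_migration_set_state() and vfio_set_dirty_tracking()
are assumed helpers into the vendor driver, not existing functions:

static VFIODevice *vbasedev;    /* assumed per-device handle */

static void vfio_migration_state_notifier(Notifier *notifier, void *data)
{
    MigrationState *s = data;

    if (migration_has_failed(s)) {
        /* 3) rollback: put the device back into the running state */
        vfio_migration_set_state(vbasedev, VFIO_DEVICE_STATE_RUNNING);
    }
}

static void vfio_log_global_start(MemoryListener *listener)
{
    /* 5) start device dirty-page tracking when RAM logging starts */
    vfio_set_dirty_tracking(vbasedev, true);
}

static void vfio_log_global_stop(MemoryListener *listener)
{
    vfio_set_dirty_tracking(vbasedev, false);
}

static MemoryListener vfio_memory_listener = {
    .log_global_start = vfio_log_global_start,
    .log_global_stop  = vfio_log_global_stop,
};

static Notifier migration_state = { .notify = vfio_migration_state_notifier };

static void vfio_migration_setup(void)
{
    add_migration_state_change_notifier(&migration_state);
    memory_listener_register(&vfio_memory_listener, &address_space_memory);
}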


Regards,
-Gonglei


> -Original Message-
> From: Yan Zhao [mailto:yan.y.z...@intel.com]
> Sent: Tuesday, February 19, 2019 4:51 PM
> To: alex.william...@redhat.com; qemu-devel@nongnu.org
> Cc: intel-gvt-...@lists.freedesktop.org; zhengxiao...@alibaba-inc.com;
> yi.l@intel.com; eskul...@redhat.com; ziye.y...@intel.com;
> coh...@redhat.com; shuangtai@alibaba-inc.com; dgilb...@redhat.com;
> zhi.a.w...@intel.com; mlevi...@redhat.com; pa...@linux.ibm.com;
> a...@ozlabs.ru; eau...@redhat.com; fel...@nutanix.com;
> jonathan.dav...@nutanix.com; changpeng@intel.com; ken@amd.com;
> kwankh...@nvidia.com; kevin.t...@intel.com; c...@nvidia.com; Gonglei (Arei)
> ; k...@vger.kernel.org; Yan Zhao
> 
> Subject: [PATCH 0/5] QEMU VFIO live migration
> 
> This patchset enables VFIO devices to have live migration capability.
> Currently it does not support post-copy phase.
> 
> It follows Alex's comments on last version of VFIO live migration patches,
> including device states, VFIO device state region layout, dirty bitmap's
> query.
> 
> Device Data
> ---
> Device data is divided into three types: device memory, device config,
> and system memory dirty pages produced by device.
> 
> Device config: data like MMIOs, page tables...
> Every device is supposed to possess device config data.
>   Usually device config's size is small (no bigger than 10M), and it
> needs to be loaded in certain strict order.
> Therefore, device config only needs to be saved/loaded in
> stop-and-copy phase.
> The data of device config is held in device config region.
> Size of device config data is smaller than or equal to that of
> device config region.
> 
> Device Memory: device's internal memory, standalone and outside system
> memory. It is usually very big.
> This kind of data needs to be saved / loaded in pre-copy and
> stop-and-copy phase.
> The data of device memory is held in device memory region.
> Size of device memory is usually larger than that of device
> memory region. qemu needs to save/load it in chunks of size of
> device memory region.
> Not all devices have device memory. For example, IGD only uses system memory.
> 
> System memory dirty pages: If a device produces dirty pages in system
> memory, it is able to get dirty bitmap for certain range of system
> memory. This dirty bitmap is queried in pre-copy and stop-and-copy
> phase in .log_sync callback. By setting dirty bitmap in .log_sync
> callback, dirty pages in system memory will be saved/loaded by ram's
> live migration code.
> The dirty bitmap of system memory is held in dirty bitmap region.
> If system memory range is larger than that dirty bitmap region can
> hold, qemu will cut it into several chunks and get dirty bitmap in
> succession.
> 
> 
> Device State Regions
> 
> Vendor driver is required to expose two mandatory regions and another two
> optional regions if it plans to support device state management.
> 
> So, there are up to four regions in total.
> One control region: mandatory.
> Get access via read/write system call.
> Its layout is defined in struct vfio_device_state_ctl
> Three data regions: mmaped into qemu.
> device config region: mandatory, holding data of device config
> device memory region: optional, holding data of device memory
> dirty bitmap region: optional, holding bitmap of system memory
> dirty pages
> 
> (The reason why four separate regions are defined is that the unit of mmap
> system call is PAGE_SIZE, i.e. 4k bytes. So one read/write region for
> control and three mmaped regions for data seems better than one big region
> padded and sparse mmaped).
> 
> 
> kernel device state interface [1]
> --
> #define VFIO_DEVICE_STATE_INTERFACE_VERSION 1
> #define VFIO_DEVICE_DATA_CAP_DEVICE_MEMORY 1
> #define VFIO_DEVICE_DATA_CAP_SYSTEM_MEMORY 2
> 
> #define VFIO_DEVICE_STATE_RUNNING 0
> 

Re: [Qemu-devel] [PATCH 0/5] QEMU VFIO live migration

2019-02-20 Thread Gonglei (Arei)


> -Original Message-
> From: Dr. David Alan Gilbert [mailto:dgilb...@redhat.com]
> Sent: Wednesday, February 20, 2019 7:02 PM
> To: Zhao Yan 
> Cc: c...@nvidia.com; k...@vger.kernel.org; a...@ozlabs.ru;
> zhengxiao...@alibaba-inc.com; shuangtai@alibaba-inc.com;
> qemu-devel@nongnu.org; kwankh...@nvidia.com; eau...@redhat.com;
> yi.l@intel.com; eskul...@redhat.com; ziye.y...@intel.com;
> mlevi...@redhat.com; pa...@linux.ibm.com; Gonglei (Arei)
> ; fel...@nutanix.com; ken@amd.com;
> kevin.t...@intel.com; alex.william...@redhat.com;
> intel-gvt-...@lists.freedesktop.org; changpeng@intel.com;
> coh...@redhat.com; zhi.a.w...@intel.com; jonathan.dav...@nutanix.com
> Subject: Re: [PATCH 0/5] QEMU VFIO live migration
> 
> * Zhao Yan (yan.y.z...@intel.com) wrote:
> > On Tue, Feb 19, 2019 at 11:32:13AM +, Dr. David Alan Gilbert wrote:
> > > * Yan Zhao (yan.y.z...@intel.com) wrote:
> > > > This patchset enables VFIO devices to have live migration capability.
> > > > Currently it does not support post-copy phase.
> > > >
> > > > It follows Alex's comments on last version of VFIO live migration 
> > > > patches,
> > > > including device states, VFIO device state region layout, dirty bitmap's
> > > > query.
> > >
> > > Hi,
> > >   I've sent minor comments to later patches; but some minor general
> > > comments:
> > >
> > >   a) Never trust the incoming migrations stream - it might be corrupt,
> > > so check when you can.
> > hi Dave
> > Thanks for this suggestion. I'll add more checks for migration streams.
> >
> >
> > >   b) How do we detect if we're migrating from/to the wrong device or
> > > version of device?  Or say to a device with older firmware or perhaps
> > > a device that has less device memory ?
> > Actually it's still an open question for VFIO migration. We need to think
> > about whether it's better to check that in libvirt or qemu (like a device
> > magic along with a version?).

We must keep the hardware generation the same within one POD of a public cloud
provider. But we still need to consider live migration from a lower generation
of hardware to a higher generation.

> > This patchset is intended to settle down the main device state interfaces
> > for VFIO migration. So that we can work on that and improve it.
> >
> >
> > >   c) Consider using the trace_ mechanism - it's really useful to
> > > add to loops writing/reading data so that you can see when it fails.
> > >
> > > Dave
> > >
> > Got it. many thanks~~
> >
> >
> > > (P.S. You have a few typo's grep your code for 'devcie', 'devie' and
> > > 'migrtion'
> >
> > sorry :)
> 
> No problem.
> 
> Given the mails, I'm guessing you've mostly tested this on graphics
> devices?  Have you also checked with VFIO network cards?
> 
> Also see the mail I sent in reply to Kirti's series; we need to boil
> these down to one solution.
> 
> Dave
> 
> > >
> > > > Device Data
> > > > ---
> > > > Device data is divided into three types: device memory, device config,
> > > > and system memory dirty pages produced by device.
> > > >
> > > > Device config: data like MMIOs, page tables...
> > > > Every device is supposed to possess device config data.
> > > > Usually device config's size is small (no bigger than 10M), and it
> > > > needs to be loaded in certain strict order.
> > > > Therefore, device config only needs to be saved/loaded in
> > > > stop-and-copy phase.
> > > > The data of device config is held in device config region.
> > > > Size of device config data is smaller than or equal to that of
> > > > device config region.
> > > >
> > > > Device Memory: device's internal memory, standalone and outside
> system
> > > > memory. It is usually very big.
> > > > This kind of data needs to be saved / loaded in pre-copy and
> > > > stop-and-copy phase.
> > > > The data of device memory is held in device memory region.
> > > > Size of device memory is usually larger than that of device
> > > > memory region. qemu needs to save/load it in chunks of size of
> > > > device memory region.
> > > > Not all devices have device memory. For example, IGD only uses system

Re: [Qemu-devel] [PATCH] vfio: assign idstr for VFIO's mmaped regions for migration

2019-02-20 Thread Gonglei (Arei)


> -Original Message-
> From: Qemu-devel
> [mailto:qemu-devel-bounces+arei.gonglei=huawei@nongnu.org] On
> Behalf Of Zhao Yan
> Sent: Thursday, January 10, 2019 9:19 AM
> To: Alex Williamson 
> Cc: pbonz...@redhat.com; qemu-devel@nongnu.org
> Subject: Re: [Qemu-devel] [PATCH] vfio: assign idstr for VFIO's mmaped regions
> for migration
> 
> On Tue, Jan 08, 2019 at 10:09:11AM -0700, Alex Williamson wrote:
> > On Tue,  8 Jan 2019 01:03:48 -0500
> > Zhao Yan  wrote:
> >
> > > if multiple regions in vfio are mmaped, their corresponding ramblocks
> > > are like below, i.e. their idstrs are "".
> > >
> > > (qemu) info ramblock
> > > Block Name  PSize   Offset      Used        Total
> > > pc.ram      4 KiB   0x          0x2000      0x2000
> > >             4 KiB   0x2110      0x2000      0x2000
> > >             4 KiB   0x2090      0x0080      0x0080
> > >             4 KiB   0x2024      0x00687000  0x00687000
> > >             4 KiB   0x200c      0x00178000  0x00178000
> > > pc.bios     4 KiB   0x2000      0x0004      0x0004
> > > pc.rom      4 KiB   0x2004      0x0002      0x0002
> > >
> > > This is because ramblocks' idstr are assigned by calling
> > > vmstate_register_ram(), but memory region of type ram device ptr does
> not
> > > call vmstate_register_ram().
> > > vfio_region_mmap
> > > |->memory_region_init_ram_device_ptr
> > >|-> memory_region_init_ram_ptr
> > >
> > > Ramblocks with empty idstrs will cause problems for snapshot copying during
> > > migration, because it uses ramblocks' idstrs to identify ramblocks.
> > > ram_save_setup {
> > >   …
> > >   RAMBLOCK_FOREACH(block) {
> > >   qemu_put_byte(f, strlen(block->idstr));
> > >   qemu_put_buffer(f, (uint8_t *)block->idstr,strlen(block->idstr));
> > >   qemu_put_be64(f, block->used_length);
> > >   }
> > >   …
> > > }
> > > ram_load() {
> > > block = qemu_ram_block_by_name(id);
> > > if (block) {
> > > if (length != block->used_length) {
> > > qemu_ram_resize(block, length, &local_err);
> > > }
> > >  ….
> > >}
> > > }
> > >
> > > Therefore, in this patch,
> > > vmstate_register_ram() is called for memory region of type ram ptr,
> > > also a unique vfioid is assigned to vfio devices across source
> > > and target vms.
> > > e.g. in source vm, use qemu parameter
> > > -device
> > > vfio-pci,sysfsdev=/sys/bus/pci/devices/:00:02.0/
> > > 882cc4da-dede-11e7-9180-078a62063ab1,vfioid=igd
> > >
> > > and in target vm, use qemu paramter
> > > -device
> > > vfio-pci,sysfsdev=/sys/bus/pci/devices/:00:02.0/
> > > 5ac1fb20-2bbf-4842-bb7e-36c58c3be9cd,vfioid=igd
> >
> > Why wouldn't we just use the id= (DeviceState.id) value instead of
> > adding yet another one?  I can't imagine anyone, especially libvirt,
> > wants to deal with a vfio specific id for a device.
> >
> hi Alex
> You are right! DeviceState.id can be used here. Thanks for your suggestion.
> 
Then Libvirt and/or Nova need to keep the device id unchanged.
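
For illustration, with Alex's suggestion the same DeviceState.id would simply
be passed on both sides instead of a new vfio-specific property; the sysfs
paths and the "igd" id value below are hypothetical:

# source VM
-device vfio-pci,sysfsdev=/sys/bus/pci/devices/<src-addr>/<src-uuid>,id=igd

# target VM: management keeps DeviceState.id identical to the source
-device vfio-pci,sysfsdev=/sys/bus/pci/devices/<tgt-addr>/<tgt-uuid>,id=igd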

> 
> > > Signed-off-by: Zhao Yan 
> > > ---
> > >  hw/vfio/pci.c | 8 +++-
> > >  include/hw/vfio/vfio-common.h | 1 +
> > >  memory.c  | 4 
> > >  3 files changed, 12 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
> > > index c0cb1ec289..7bc2ed0752 100644
> > > --- a/hw/vfio/pci.c
> > > +++ b/hw/vfio/pci.c
> > > @@ -2533,7 +2533,12 @@ static void vfio_populate_device(VFIOPCIDevice
> *vdev, Error **errp)
> > >  }
> > >
> > >  for (i = VFIO_PCI_BAR0_REGION_INDEX; i <
> VFIO_PCI_ROM_REGION_INDEX; i++) {
> > > -char *name = g_strdup_printf("%s BAR %d", vbasedev->name, i);
> > > +char *name;
> > > +if (vbasedev->vfioid) {
> > > +name = g_strdup_printf("%s BAR %d", vbasedev->vfioid, i);
> > > +} else {
> > > +name = g_strdup_printf("%s BAR %d", vbasedev->name, i);
> > > +}
> > >
> > >  ret = vfio_region_setup(OBJECT(vdev), vbasedev,
> > >  &vdev->bars[i].region, i, name);
> > > @@ -3180,6 +3185,7 @@ static void vfio_instance_init(Object *obj)
> > >  static Property vfio_pci_dev_properties[] = {
> > >  DEFINE_PROP_PCI_HOST_DEVADDR("host", VFIOPCIDevice, host),
> > >  DEFINE_PROP_STRING("sysfsdev", VFIOPCIDevice,
> vbasedev.sysfsdev),
> > > +DEFINE_PROP_STRING("vfioid", VFIOPCIDevice, vbasedev.vfioid),
> > >  DEFINE_PROP_ON_OFF_AUTO("display", VFIOPCIDevice,
> > >  display, ON_OFF_AUTO_OFF),
> > >  DEFINE_PROP_UINT32("x-intx-mmap-timeout-ms", VFIOPCIDevice,
> > > diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
> > > index 1b434d02f6..84bab94f52 100644
> > > --- a/include/hw/vfio/vfio-common.h
> > > +++ 

Re: [Qemu-devel] About live migration rollback

2019-01-02 Thread Gonglei (Arei)
Hi,

> 
> * Gonglei (Arei) (arei.gong...@huawei.com) wrote:
> > Hi Dave,
> >
> > We discussed some live migration fallback scenarios at this year's KVM
> > forum, and now I can provide another scenario; perhaps upstream should
> > consider supporting rollback for this situation.
> >
> > Environments information:
> >
> > host A: cpu E5620(model WestmereEP without flag xsave)
> > host B: cpu E5-2643(model SandyBridgeEP with flag xsave)
> >
> > The reproduction steps are:
> > 1. Start a Windows 2008 vm with -cpu host (which means host-passthrough).
> 
> Well we don't guarantee migration across -cpu host - does this problem
> go away if both qemu's are started with matching CPU flags
> (corresponding to the Westmere) ?
> 
Sorry, we didn't test other CPU model scenarios, since we must ensure that
live migration works from lower generation CPUs to higher generation
CPUs. :(


> > 2. Migrate the vm to host B when cr4.OSXSAVE=0.
> > 3. Vm runs on host B for a while so that cr4.OSXSAVE changes to 1.
> > 4. Then migrate the vm to host A successfully, but the vm was paused, and
> > qemu printed the following log:
> >
> > KVM: entry failed, hardware error 0x8021
> >
> > If you're running a guest on an Intel machine without unrestricted mode
> > support, the failure can be most likely due to the guest entering an invalid
> > state for Intel VT. For example, the guest maybe running in big real mode
> > which is not supported on less recent Intel processors.
> >
> > EAX=019b3bb0 EBX=01a3ae80 ECX=01a61ce8 EDX=
> > ESI=01a62000 EDI= EBP= ESP=01718b20
> > EIP=0185d982 EFL=0286 [--S--P-] CPL=0 II=0 A20=1 SMM=0 HLT=0
> > ES =   9300
> > CS =f000   9b00
> > SS =   9300
> > DS =   9300
> > FS =   9300
> > GS =   9300
> > LDT=   8200
> > TR =   8b00
> > GDT=  
> > IDT=  
> > CR0=6010 CR2= CR3= CR4=
> > DR0= DR1= DR2=
> DR3=
> > DR6=0ff0 DR7=0400
> > EFER=
> > Code=00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 <00> 00 00
> 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> 00
> >
> > The problem happened when kvm_put_sregs returned err -22 (called by
> > kvm_arch_put_registers in qemu).
> >
> > This is because kvm_arch_vcpu_ioctl_set_sregs (in the kvm module) checked
> > that guest_cpuid_has reports no X86_FEATURE_XSAVE while cr4.OSXSAVE=1.
> > We should cancel the migration if kvm_arch_put_registers returns an error.
> 
> Do you have a backtrace of when the kvm_arch_put_registers is called
> when it fails?

The main backtrace is below:

 qemu_loadvm_state
   cpu_synchronize_all_post_init            --> w/o return value
     cpu_synchronize_post_init              --> w/o return value
       kvm_cpu_synchronize_post_init        --> w/o return value
         run_on_cpu                         --> w/o return value
           do_kvm_cpu_synchronize_post_init --> w/o return value
             kvm_arch_put_registers         --> w/ return value

The root cause is that some functions have no return values, so the migration
thread can't detect those failures. Paolo?
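
A minimal sketch of one way to surface the failure (the kvm_put_failed field
is an assumption for illustration; everything else matches the existing call
chain):

static void do_kvm_cpu_synchronize_post_init(CPUState *cpu,
                                             run_on_cpu_data arg)
{
    int ret = kvm_arch_put_registers(cpu, KVM_PUT_FULL_STATE);

    if (ret < 0) {
        error_report("kvm_arch_put_registers failed: %d", ret);
        /* record the failure where the migration thread can see it,
         * so qemu_loadvm_state() can fail instead of resuming the VM */
        atomic_set(&cpu->kvm_put_failed, true);    /* assumed field */
    }
    cpu->vcpu_dirty = false;
}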

> If it's called during the loading of the device state then we should be
> able to detect it and fail the migration; however if it's only failing
> after the CPU is restarted after the migration then it's a bit too late.
> 
Actually the CPUs haven't started in this scenario.

Thanks,
-Gonglei



[Qemu-devel] About live migration rollback

2018-12-18 Thread Gonglei (Arei)
Hi Dave,

We discussed some live migration fallback scenarios at this year's KVM forum,
and now I can provide another scenario; perhaps upstream should consider
supporting rollback for this situation.

Environments information:

host A: cpu E5620(model WestmereEP without flag xsave)
host B: cpu E5-2643(model SandyBridgeEP with flag xsave)

The reproduction steps are:
1. Start a Windows 2008 vm with -cpu host (which means host-passthrough).
2. Migrate the vm to host B when cr4.OSXSAVE=0.
3. Vm runs on host B for a while so that cr4.OSXSAVE changes to 1.
4. Then migrate the vm to host A successfully, but the vm was paused, and qemu
printed the following log:

KVM: entry failed, hardware error 0x8021

If you're running a guest on an Intel machine without unrestricted mode
support, the failure can be most likely due to the guest entering an invalid
state for Intel VT. For example, the guest maybe running in big real mode
which is not supported on less recent Intel processors.

EAX=019b3bb0 EBX=01a3ae80 ECX=01a61ce8 EDX=
ESI=01a62000 EDI= EBP= ESP=01718b20
EIP=0185d982 EFL=0286 [--S--P-] CPL=0 II=0 A20=1 SMM=0 HLT=0
ES =   9300
CS =f000   9b00
SS =   9300
DS =   9300
FS =   9300
GS =   9300
LDT=   8200
TR =   8b00
GDT=  
IDT=  
CR0=6010 CR2= CR3= CR4=
DR0= DR1= DR2= 
DR3=
DR6=0ff0 DR7=0400
EFER=
Code=00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 <00> 00 00 00 
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00

The problem happened when kvm_put_sregs returned err -22 (called by
kvm_arch_put_registers in qemu).

This is because kvm_arch_vcpu_ioctl_set_sregs (in the kvm module) checked that
guest_cpuid_has reports no X86_FEATURE_XSAVE while cr4.OSXSAVE=1.
We should cancel the migration if kvm_arch_put_registers returns an error.

Thanks,
-Gonglei



Re: [Qemu-devel] [PATCH 3/5] Add migration functions for VFIO devices

2018-12-17 Thread Gonglei (Arei)
Hi,

It's great to see this patch series, which is a very important step, although
it currently only considers GPU mdev devices for live migration.

However, this is based on the VFIO framework after all, so we expect
that we can make this live migration framework more general.

For example, the vfio_save_pending() callback is used to obtain device
memory (such as GPU memory), but what if the device (such as a network card)
has no special proprietary memory, only system memory?
It is wasteful to perform a null operation for this kind of device by writing
to the vendor driver in kernel space.

I think we can acquire the capability from the vendor driver before using this.
If there is device memory that needs iterative copying, the vendor driver
returns true, otherwise false. Then QEMU implements the specific logic, or
returns directly, just like getting the capability list of the KVM module
(see the sketch below). Can we?
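
A sketch of the idea, using QEMU's save_live_pending callback signature; the
has_device_memory flag and vfio_query_pending_bytes() are assumptions for
illustration, not part of the posted patches:

static void vfio_save_pending(QEMUFile *f, void *opaque,
                              uint64_t threshold_size,
                              uint64_t *res_precopy_only,
                              uint64_t *res_compatible,
                              uint64_t *res_postcopy_only)
{
    VFIODevice *vbasedev = opaque;

    /* capability cached once from the vendor driver, assumed field */
    if (!vbasedev->has_device_memory) {
        return;    /* no proprietary memory: nothing to iterate in pre-copy */
    }

    *res_precopy_only += vfio_query_pending_bytes(vbasedev);  /* assumed */
}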


Regards,
-Gonglei


> -Original Message-
> From: Qemu-devel
> [mailto:qemu-devel-bounces+arei.gonglei=huawei@nongnu.org] On
> Behalf Of Kirti Wankhede
> Sent: Wednesday, November 21, 2018 4:40 AM
> To: alex.william...@redhat.com; c...@nvidia.com
> Cc: zhengxiao...@alibaba-inc.com; kevin.t...@intel.com; yi.l@intel.com;
> eskul...@redhat.com; ziye.y...@intel.com; qemu-devel@nongnu.org;
> coh...@redhat.com; shuangtai@alibaba-inc.com; dgilb...@redhat.com;
> zhi.a.w...@intel.com; mlevi...@redhat.com; pa...@linux.ibm.com;
> a...@ozlabs.ru; Kirti Wankhede ;
> eau...@redhat.com; fel...@nutanix.com; jonathan.dav...@nutanix.com;
> changpeng@intel.com; ken@amd.com
> Subject: [Qemu-devel] [PATCH 3/5] Add migration functions for VFIO devices
> 
> - Migration function are implemented for VFIO_DEVICE_TYPE_PCI device.
> - Added SaveVMHandlers and implemented all basic functions required for live
>   migration.
> - Added VM state change handler to know running or stopped state of VM.
> - Added migration state change notifier to get notification on migration state
>   change. This state is translated to VFIO device state and conveyed to vendor
>   driver.
> - Whether a VFIO device supports migration or not is decided based on the
>   migration region query. If the migration region query is successful then
>   migration is supported, else migration is blocked.
> - Structure vfio_device_migration_info is mapped at 0th offset of migration
>   region and should always trapped by VFIO device's driver. Added both type of
>   access support, trapped or mmapped, for data section of the region.
> - To save device state, read data offset and size using structure
>   vfio_device_migration_info.data, accordingly copy data from the region.
> - To restore device state, write data offset and size in the structure and 
> write
>   data in the region.
> - To get dirty page bitmap, write start address and pfn count then read count 
> of
>   pfns copied and accordingly read those from the rest of the region or
> mmaped
>   part of the region. This copy is iterated till page bitmap for all requested
>   pfns are copied.
> 
> Signed-off-by: Kirti Wankhede 
> Reviewed-by: Neo Jia 
> ---
>  hw/vfio/Makefile.objs |   2 +-
>  hw/vfio/migration.c   | 729
> ++
>  include/hw/vfio/vfio-common.h |  23 ++
>  3 files changed, 753 insertions(+), 1 deletion(-)
>  create mode 100644 hw/vfio/migration.c
> 
[skip]

> +
> +static SaveVMHandlers savevm_vfio_handlers = {
> +.save_setup = vfio_save_setup,
> +.save_live_iterate = vfio_save_iterate,
> +.save_live_complete_precopy = vfio_save_complete_precopy,
> +.save_live_pending = vfio_save_pending,
> +.save_cleanup = vfio_save_cleanup,
> +.load_state = vfio_load_state,
> +.load_setup = vfio_load_setup,
> +.load_cleanup = vfio_load_cleanup,
> +.is_active_iterate = vfio_is_active_iterate,
> +};
> +

 



Re: [Qemu-devel] [PATCH v3 00/16] Virtio devices split from virtio-pci

2018-12-14 Thread Gonglei (Arei)
> -Original Message-
> From: Michael S. Tsirkin [mailto:m...@redhat.com]
> Sent: Friday, December 14, 2018 8:53 PM
> To: Gonglei (Arei) 
> Cc: Juan Quintela ; qemu-devel@nongnu.org; Thomas
> Huth ; Gerd Hoffmann 
> Subject: Re: [PATCH v3 00/16] Virtio devices split from virtio-pci
> 
> On Fri, Dec 14, 2018 at 07:07:44AM +, Gonglei (Arei) wrote:
> >
> > > -Original Message-
> > > From: Juan Quintela [mailto:quint...@redhat.com]
> > > Sent: Friday, December 14, 2018 5:01 AM
> > > To: qemu-devel@nongnu.org
> > > Cc: Michael S. Tsirkin ; Thomas Huth
> ;
> > > Gerd Hoffmann ; Gonglei (Arei)
> > > ; Juan Quintela 
> > > Subject: [PATCH v3 00/16] Virtio devices split from virtio-pci
> > >
> > > Hi
> > >
> > > v3:
> > > - rebase to master
> > > - only compile them if CONFIG_PCI is set (thomas)
> > >
> > > Please review.
> > >
> > > Later, Juan.
> > >
> > > V2:
> > >
> > > - Rebase on top of master
> > >
> > > Please review.
> > >
> > > Later, Juan.
> > >
> > > [v1]
> > > From previous verision (in the middle of make check tests):
> > > - split also the bits of virtio-pci.h (mst suggestion)
> > > - add gpu, crypt and gpg bits
> > > - more cleanups
> > > - fix all the copyrights (the ones not changed have been there
> > >   foverever)
> > > - be consistent with naming, vhost-* or virtio-*
> > >
> > > Please review, Juan.
> > >
> > > Juan Quintela (16):
> > >   virtio: split vhost vsock bits from virtio-pci
> > >   virtio: split virtio input host bits from virtio-pci
> > >   virtio: split virtio input bits from virtio-pci
> > >   virtio: split virtio rng bits from virtio-pci
> > >   virtio: split virtio balloon bits from virtio-pci
> > >   virtio: split virtio 9p bits from virtio-pci
> > >   virtio: split vhost user blk bits from virtio-pci
> > >   virtio: split vhost user scsi bits from virtio-pci
> > >   virtio: split vhost scsi bits from virtio-pci
> > >   virtio: split virtio scsi bits from virtio-pci
> > >   virtio: split virtio blk bits rom virtio-pci
> > >   virtio: split virtio net bits rom virtio-pci
> > >   virtio: split virtio serial bits rom virtio-pci
> > >   virtio: split virtio gpu bits rom virtio-pci.h
> > >   virtio: split virtio crypto bits rom virtio-pci.h
> > >   virtio: virtio 9p really requires CONFIG_VIRTFS to work
> > >
> > >  default-configs/virtio.mak|   3 +-
> > >  hw/display/virtio-gpu-pci.c   |  14 +
> > >  hw/display/virtio-vga.c   |   1 +
> > >  hw/virtio/Makefile.objs   |  15 +
> > >  hw/virtio/vhost-scsi-pci.c|  95 
> > >  hw/virtio/vhost-user-blk-pci.c| 101 
> > >  hw/virtio/vhost-user-scsi-pci.c   | 101 
> > >  hw/virtio/vhost-vsock-pci.c   |  82 
> > >  hw/virtio/virtio-9p-pci.c |  86 
> > >  hw/virtio/virtio-balloon-pci.c|  94 
> > >  hw/virtio/virtio-blk-pci.c|  97 
> > >  hw/virtio/virtio-crypto-pci.c |  14 +
> > >  hw/virtio/virtio-input-host-pci.c |  45 ++
> > >  hw/virtio/virtio-input-pci.c  | 154 ++
> > >  hw/virtio/virtio-net-pci.c|  96 
> > >  hw/virtio/virtio-pci.c| 783 --
> > >  hw/virtio/virtio-pci.h| 234 -
> > >  hw/virtio/virtio-rng-pci.c|  86 
> > >  hw/virtio/virtio-scsi-pci.c   | 106 
> > >  hw/virtio/virtio-serial-pci.c | 112 +
> > >  tests/Makefile.include|  20 +-
> > >  21 files changed, 1311 insertions(+), 1028 deletions(-)
> > >  create mode 100644 hw/virtio/vhost-scsi-pci.c
> > >  create mode 100644 hw/virtio/vhost-user-blk-pci.c
> > >  create mode 100644 hw/virtio/vhost-user-scsi-pci.c
> > >  create mode 100644 hw/virtio/vhost-vsock-pci.c
> > >  create mode 100644 hw/virtio/virtio-9p-pci.c
> > >  create mode 100644 hw/virtio/virtio-balloon-pci.c
> > >  create mode 100644 hw/virtio/virtio-blk-pci.c
> > >  create mode 100644 hw/virtio/virtio-input-host-pci.c
> > >  create mode 100644 hw/virtio/virtio-input-pci.c
> > >  create mode 100644 hw/virtio/virtio-net-pci.c
> > >  create mode 100644 hw/virtio/virtio-rng-pci.c
> > >  create mode 100644 hw/virtio/virtio-scsi-pci.c
> > >  create mode 100644 hw/virtio/virtio-serial-pci.c
> > >
> > > --
> > > 2.19.2
> >
> > For series:
> > Reviewed-by: Gonglei 
> >
> >
> > Thanks,
> > -Gonglei
> 
> Thanks!
> Can you pls align Reviewed-by: tag at the 1st column in the future?
> Makes it easier to apply the tag.

OK, I will, thanks :)

Thanks,
-Gonglei



Re: [Qemu-devel] [PATCH v3 00/16] Virtio devices split from virtio-pci

2018-12-13 Thread Gonglei (Arei)


> -Original Message-
> From: Juan Quintela [mailto:quint...@redhat.com]
> Sent: Friday, December 14, 2018 5:01 AM
> To: qemu-devel@nongnu.org
> Cc: Michael S. Tsirkin ; Thomas Huth ;
> Gerd Hoffmann ; Gonglei (Arei)
> ; Juan Quintela 
> Subject: [PATCH v3 00/16] Virtio devices split from virtio-pci
> 
> Hi
> 
> v3:
> - rebase to master
> - only compile them if CONFIG_PCI is set (thomas)
> 
> Please review.
> 
> Later, Juan.
> 
> V2:
> 
> - Rebase on top of master
> 
> Please review.
> 
> Later, Juan.
> 
> [v1]
> From previous verision (in the middle of make check tests):
> - split also the bits of virtio-pci.h (mst suggestion)
> - add gpu, crypt and gpg bits
> - more cleanups
> - fix all the copyrights (the ones not changed have been there
>   foverever)
> - be consistent with naming, vhost-* or virtio-*
> 
> Please review, Juan.
> 
> Juan Quintela (16):
>   virtio: split vhost vsock bits from virtio-pci
>   virtio: split virtio input host bits from virtio-pci
>   virtio: split virtio input bits from virtio-pci
>   virtio: split virtio rng bits from virtio-pci
>   virtio: split virtio balloon bits from virtio-pci
>   virtio: split virtio 9p bits from virtio-pci
>   virtio: split vhost user blk bits from virtio-pci
>   virtio: split vhost user scsi bits from virtio-pci
>   virtio: split vhost scsi bits from virtio-pci
>   virtio: split virtio scsi bits from virtio-pci
>   virtio: split virtio blk bits rom virtio-pci
>   virtio: split virtio net bits rom virtio-pci
>   virtio: split virtio serial bits rom virtio-pci
>   virtio: split virtio gpu bits rom virtio-pci.h
>   virtio: split virtio crypto bits rom virtio-pci.h
>   virtio: virtio 9p really requires CONFIG_VIRTFS to work
> 
>  default-configs/virtio.mak|   3 +-
>  hw/display/virtio-gpu-pci.c   |  14 +
>  hw/display/virtio-vga.c   |   1 +
>  hw/virtio/Makefile.objs   |  15 +
>  hw/virtio/vhost-scsi-pci.c|  95 
>  hw/virtio/vhost-user-blk-pci.c| 101 
>  hw/virtio/vhost-user-scsi-pci.c   | 101 
>  hw/virtio/vhost-vsock-pci.c   |  82 
>  hw/virtio/virtio-9p-pci.c |  86 
>  hw/virtio/virtio-balloon-pci.c|  94 
>  hw/virtio/virtio-blk-pci.c|  97 
>  hw/virtio/virtio-crypto-pci.c |  14 +
>  hw/virtio/virtio-input-host-pci.c |  45 ++
>  hw/virtio/virtio-input-pci.c  | 154 ++
>  hw/virtio/virtio-net-pci.c|  96 
>  hw/virtio/virtio-pci.c| 783 --
>  hw/virtio/virtio-pci.h| 234 -
>  hw/virtio/virtio-rng-pci.c|  86 
>  hw/virtio/virtio-scsi-pci.c   | 106 
>  hw/virtio/virtio-serial-pci.c | 112 +
>  tests/Makefile.include|  20 +-
>  21 files changed, 1311 insertions(+), 1028 deletions(-)
>  create mode 100644 hw/virtio/vhost-scsi-pci.c
>  create mode 100644 hw/virtio/vhost-user-blk-pci.c
>  create mode 100644 hw/virtio/vhost-user-scsi-pci.c
>  create mode 100644 hw/virtio/vhost-vsock-pci.c
>  create mode 100644 hw/virtio/virtio-9p-pci.c
>  create mode 100644 hw/virtio/virtio-balloon-pci.c
>  create mode 100644 hw/virtio/virtio-blk-pci.c
>  create mode 100644 hw/virtio/virtio-input-host-pci.c
>  create mode 100644 hw/virtio/virtio-input-pci.c
>  create mode 100644 hw/virtio/virtio-net-pci.c
>  create mode 100644 hw/virtio/virtio-rng-pci.c
>  create mode 100644 hw/virtio/virtio-scsi-pci.c
>  create mode 100644 hw/virtio/virtio-serial-pci.c
> 
> --
> 2.19.2

For series:
Reviewed-by: Gonglei 

 
Thanks,
-Gonglei



Re: [Qemu-devel] [RFC PATCH v1 1/4] VFIO KABI for migration interface

2018-10-17 Thread Gonglei (Arei)

> -Original Message-
> From: kvm-ow...@vger.kernel.org [mailto:kvm-ow...@vger.kernel.org] On
> Behalf Of Kirti Wankhede
> Sent: Thursday, October 18, 2018 4:47 AM
> To: Alex Williamson 
> Cc: c...@nvidia.com; qemu-devel@nongnu.org; k...@vger.kernel.org; Yulei
> Zhang ; Dr. David Alan Gilbert
> ; Juan Quintela ; Wang, Zhi A
> 
> Subject: Re: [RFC PATCH v1 1/4] VFIO KABI for migration interface
> 
> 
> On 10/17/2018 4:04 AM, Alex Williamson wrote:
> > On Tue, 16 Oct 2018 23:42:35 +0530
> > Kirti Wankhede  wrote:
> >
> >> - Added vfio_device_migration_info structure to interact with the vendor
> >>   driver.
> >> - Different flags are used to get or set migration related information
> >>   from/to vendor driver.
> >> Flag VFIO_MIGRATION_PROBE: To query if feature is supported
> >> Flag VFIO_MIGRATION_GET_REGION: To get migration region info
> >> Flag VFIO_MIGRATION_SET_STATE: To convey device state in vendor driver
> >> Flag VFIO_MIGRATION_GET_PENDING: To get pending bytes yet to be
> migrated
> >>   from vendor driver
> >> Flag VFIO_MIGRATION_GET_BUFFER: On this flag, vendor driver should
> write
> >>   data to migration region and return number of bytes written in the
> region
> >> Flag VFIO_MIGRATION_SET_BUFFER: In migration resume path, user space
> app
> >>   writes to migration region and communicates it to vendor driver with
> >>   this ioctl with this flag.
> >> Flag VFIO_MIGRATION_GET_DIRTY_PFNS: Get bitmap of dirty pages from
> vendor
> >>   driver from given start address
> >>
> >> - Added enum for possible device states.
> >>
> >> Signed-off-by: Kirti Wankhede 
> >> Reviewed-by: Neo Jia 
> >> ---
> >>  linux-headers/linux/vfio.h | 91
> ++
> >>  1 file changed, 91 insertions(+)
> >>
> >> diff --git a/linux-headers/linux/vfio.h b/linux-headers/linux/vfio.h
> >> index 3615a269d378..8e9045ed9aa8 100644
> >> --- a/linux-headers/linux/vfio.h
> >> +++ b/linux-headers/linux/vfio.h
> >> @@ -602,6 +602,97 @@ struct vfio_device_ioeventfd {
> >>
> >>  #define VFIO_DEVICE_IOEVENTFD _IO(VFIO_TYPE, VFIO_BASE +
> 16)
> >>
> >> +/**
> >> + * VFIO_DEVICE_MIGRATION_INFO - _IOW(VFIO_TYPE, VFIO_BASE + 17,
> >> + *  struct vfio_device_migration_info)
> >
> > This is quite a bit more than an "INFO" ioctl.
> >
> >> + * Flag VFIO_MIGRATION_PROBE:
> >> + *  To query if feature is supported
> >> + *
> >> + * Flag VFIO_MIGRATION_GET_REGION:
> >> + *  To get migration region info
> >> + *  region_index [output] : region index to be used for migration
> region
> >> + *  size [output]: size of migration region
> >
> > Of course the region migration region can describe itself as being used
> > for migration, so this is unnecessary.  The presence of that region
> > could also negate the need for a probe.
> >
> 
> Yes, that can be done.
> 
> 
> >> + *
> >> + * Flag VFIO_MIGRATION_SET_STATE:
> >> + *  To set device state in vendor driver
> >> + *  device_state [input] : User space app sends device state to
> vendor
> >> + *   driver on state change
> >
> > Valid states are the enum defined below, correct?
> >
> 
> Yes, that's correct.
> 
> > Does setting STOPNCOPY_ACTIVE stop any state change of the device or is
> > that expected to happen through other means?
> >
> 
> _PRECOPY_ACTIVE means vCPUs are still running, so VFIO device should
> still remain active.
> _STOPNCOPY_ACTIVE means vCPUs are not running and device should also be
> stopped and copy device's state.
> 
> > What are the allowable state transitions?
> >
> 
> Normal VM running case:
> _NONE -> _RUNNING
> 
> In case of live migration, at source:
> _RUNNING -> _SETUP -> _PRECOPY_ACTIVE -> _STOPNCOPY_ACTIVE ->
> _SAVE_COMPLETED
> 
> at destination:
> _NONE -> _SETUP -> _RESUME -> _RESUME_COMPLETE -> _RUNNING
> 
> In save VM case:
> _RUNNING -> _SETUP -> _STOPNCOPY_ACTIVE -> _SAVE_COMPLETED
> 
> In case of resuming VM from saved state:
> _NONE -> _SETUP -> _RESUME -> _RESUME_COMPLETE -> _RUNNING
> 
> _FAILED or _CANCELLED can happen in any state.
> 
> > How many bits in flags is a user allowed to set at once?
> >
> 
> One bit at a time. Probably, I should use enum for flags rather than bits.
> 
> >> + * Flag VFIO_MIGRATION_GET_PENDING:
> >> + *  To get pending bytes yet to be migrated from vendor driver
> >> + *  threshold_size [Input] : threshold of buffer in User space app.
> >> + *  pending_precopy_only [output] : pending data which must be
> migrated in
> >> + *  precopy phase or in stopped state, in other words - before
> target
> >> + *  vm start
> >> + *  pending_compatible [output] : pending data which may be
> migrated in any
> >> + *   phase
> >> + *  pending_postcopy_only [output] : pending data which must be
> migrated in
> >> + *   postcopy phase or in stopped state, in other words - after
> source
> >> + *   vm stop
> >> + *  Sum of pending_precopy_only, 

Re: [Qemu-devel] [virtio-dev] Re: [PATCH v25 0/2] virtio-crypto: virtio crypto device specification

2018-08-28 Thread Gonglei (Arei)
> 
> On Tue, Aug 28, 2018 at 03:31:02AM +, Gonglei (Arei) wrote:
> >
> > > -Original Message-
> > > From: Michael S. Tsirkin [mailto:m...@redhat.com]
> > > Sent: Friday, August 24, 2018 8:54 PM
> > >
> > > On Fri, Aug 24, 2018 at 12:07:44PM +, Gonglei (Arei) wrote:
> > > > Hi Michael,
> > > >
> > > > > -Original Message-
> > > > > From: virtio-...@lists.oasis-open.org
> > > [mailto:virtio-...@lists.oasis-open.org]
> > > > > On Behalf Of Michael S. Tsirkin
> > > > > Sent: Friday, August 24, 2018 7:23 PM
> > > > > To: longpeng 
> > > > > Cc: xin.z...@intel.com; Gonglei (Arei) ;
> > > > > pa...@linux.vnet.ibm.com; qemu-devel@nongnu.org;
> > > > > virtio-...@lists.oasis-open.org; coh...@redhat.com;
> > > stefa...@redhat.com;
> > > > > denglin...@chinamobile.com; Jani Kokkonen
> > > ;
> > > > > ola.liljed...@arm.com; varun.se...@freescale.com;
> > > > > brian.a.keat...@intel.com; liang.j...@intel.com;
> john.grif...@intel.com;
> > > > > ag...@suse.de; jasow...@redhat.com; vincent.jar...@6wind.com;
> > > > > Huangweidong (C) ; wangxin (U)
> > > > > ; Zhoujian (jay)
> > > 
> > > > > Subject: [virtio-dev] Re: [PATCH v25 0/2] virtio-crypto: virtio crypto
> device
> > > > > specification
> > > > >
> > > > > Is there a github issue? If not pls create one.
> > > > >
> > > >
> > > > I just created one issue:
> > > >
> > > > https://github.com/oasis-tcs/virtio-spec/issues/19
> > >
> > > All set to start voting whenever you request it.
> > >
> >
> > Hi Michael,
> >
> > Since no comments currently, pls help to start a ballot for virtio crypto
> > spec if you can. :)
> >
> >
> > Thanks,
> > -Gonglei
> 
> Done. In the future please add a link to mailing list archives.
> 

Sure. Ballot created at URL: 
https://www.oasis-open.org/committees/ballot.php?id=3242


Thanks,
-Gonglei



Re: [Qemu-devel] [virtio-dev] Re: [PATCH v25 0/2] virtio-crypto: virtio crypto device specification

2018-08-27 Thread Gonglei (Arei)


> -Original Message-
> From: Michael S. Tsirkin [mailto:m...@redhat.com]
> Sent: Friday, August 24, 2018 8:54 PM
> 
> On Fri, Aug 24, 2018 at 12:07:44PM +, Gonglei (Arei) wrote:
> > Hi Michael,
> >
> > > -Original Message-
> > > From: virtio-...@lists.oasis-open.org
> [mailto:virtio-...@lists.oasis-open.org]
> > > On Behalf Of Michael S. Tsirkin
> > > Sent: Friday, August 24, 2018 7:23 PM
> > > To: longpeng 
> > > Cc: xin.z...@intel.com; Gonglei (Arei) ;
> > > pa...@linux.vnet.ibm.com; qemu-devel@nongnu.org;
> > > virtio-...@lists.oasis-open.org; coh...@redhat.com;
> stefa...@redhat.com;
> > > denglin...@chinamobile.com; Jani Kokkonen
> ;
> > > ola.liljed...@arm.com; varun.se...@freescale.com;
> > > brian.a.keat...@intel.com; liang.j...@intel.com; john.grif...@intel.com;
> > > ag...@suse.de; jasow...@redhat.com; vincent.jar...@6wind.com;
> > > Huangweidong (C) ; wangxin (U)
> > > ; Zhoujian (jay)
> 
> > > Subject: [virtio-dev] Re: [PATCH v25 0/2] virtio-crypto: virtio crypto 
> > > device
> > > specification
> > >
> > > Is there a github issue? If not pls create one.
> > >
> >
> > I just created one issue:
> >
> > https://github.com/oasis-tcs/virtio-spec/issues/19
> 
> All set to start voting whenever you request it.
> 

Hi Michael,

Since there are no comments currently, pls help to start a ballot for the
virtio crypto spec if you can. :)


Thanks,
-Gonglei



Re: [Qemu-devel] [virtio-dev] Re: [PATCH v25 0/2] virtio-crypto: virtio crypto device specification

2018-08-24 Thread Gonglei (Arei)
Hi Michael,

> -Original Message-
> From: virtio-...@lists.oasis-open.org [mailto:virtio-...@lists.oasis-open.org]
> On Behalf Of Michael S. Tsirkin
> Sent: Friday, August 24, 2018 7:23 PM
> To: longpeng 
> Cc: xin.z...@intel.com; Gonglei (Arei) ;
> pa...@linux.vnet.ibm.com; qemu-devel@nongnu.org;
> virtio-...@lists.oasis-open.org; coh...@redhat.com; stefa...@redhat.com;
> denglin...@chinamobile.com; Jani Kokkonen ;
> ola.liljed...@arm.com; varun.se...@freescale.com;
> brian.a.keat...@intel.com; liang.j...@intel.com; john.grif...@intel.com;
> ag...@suse.de; jasow...@redhat.com; vincent.jar...@6wind.com;
> Huangweidong (C) ; wangxin (U)
> ; Zhoujian (jay) 
> Subject: [virtio-dev] Re: [PATCH v25 0/2] virtio-crypto: virtio crypto device
> specification
> 
> Is there a github issue? If not pls create one.
> 

I just created one issue:

https://github.com/oasis-tcs/virtio-spec/issues/19


Thanks,
-Gonglei



Re: [Qemu-devel] [PATCH] cryptodev: remove dead code

2018-07-30 Thread Gonglei (Arei)


> -Original Message-
> From: Peter Maydell [mailto:peter.mayd...@linaro.org]
> Sent: Monday, July 30, 2018 6:49 PM
> To: Paolo Bonzini 
> Cc: QEMU Developers ; Gonglei (Arei)
> 
> Subject: Re: [Qemu-devel] [PATCH] cryptodev: remove dead code
> 
> On 30 July 2018 at 09:51, Paolo Bonzini  wrote:
> > Reported by Coverity as CID 1390600.
> >
> > Signed-off-by: Paolo Bonzini 
> > ---
> 
> This already has a reviewed patch on-list for this from
> back in April:
> 
> https://patchwork.ozlabs.org/patch/906041/
> 
> so I think we should just apply that.
> 
Oh, yes. Would you pick it up directly? Or via qemu-trivial?

Thanks,
-Gonglei


Re: [Qemu-devel] [PATCH] cryptodev: remove dead code

2018-07-30 Thread Gonglei (Arei)


> -Original Message-
> From: Qemu-devel
> [mailto:qemu-devel-bounces+arei.gonglei=huawei@nongnu.org] On
> Behalf Of Paolo Bonzini
> Sent: Monday, July 30, 2018 4:51 PM
> To: qemu-devel@nongnu.org
> Subject: [Qemu-devel] [PATCH] cryptodev: remove dead code
> 
> Reported by Coverity as CID 1390600.
> 
> Signed-off-by: Paolo Bonzini 
> ---
>  backends/cryptodev-vhost-user.c | 5 -
>  1 file changed, 5 deletions(-)
> 
> diff --git a/backends/cryptodev-vhost-user.c b/backends/cryptodev-vhost-user.c
> index d52daccfcd..d539f14d59 100644
> --- a/backends/cryptodev-vhost-user.c
> +++ b/backends/cryptodev-vhost-user.c
> @@ -157,7 +157,6 @@ static void cryptodev_vhost_user_event(void *opaque,
> int event)
>  {
>  CryptoDevBackendVhostUser *s = opaque;
>  CryptoDevBackend *b = CRYPTODEV_BACKEND(s);
> -Error *err = NULL;
>  int queues = b->conf.peers.queues;
> 
>  assert(queues < MAX_CRYPTO_QUEUE_NUM);
> @@ -174,10 +173,6 @@ static void cryptodev_vhost_user_event(void
> *opaque, int event)
>  cryptodev_vhost_user_stop(queues, s);
>  break;
>  }
> -
> -if (err) {
> -error_report_err(err);
> -}
>  }
> 
>  static void cryptodev_vhost_user_init(
> --
> 2.17.1
> 

Reviewed-by: Gonglei 

Thanks,
-Gonglei



[Qemu-devel] about live memory snapshot

2018-06-29 Thread Gonglei (Arei)
Hi Peter,

As we discussed at LC3 China, the current scheme of "migration to file"
doesn't fit the production environment: it makes the snapshot file grow bigger
and bigger while the guest is under enough memory pressure, so we can't bound
the size of the snapshot file.

Pls have a look at whether there is a simple method to resolve the problem. :)

PS: the below link is zhanghailiang's scheme based on userfaultfd.

https://lists.gnu.org/archive/html/qemu-devel/2016-01/msg00664.html


Thanks,
-Gonglei
 



Re: [Qemu-devel] [RFC v1 1/1] virtio-crypto: Allow disabling of cipher algorithms for virtio-crypto device

2018-06-14 Thread Gonglei (Arei)


> -Original Message-
> From: Daniel P. Berrangé [mailto:berra...@redhat.com]
> Sent: Thursday, June 14, 2018 11:11 PM
> To: Farhan Ali 
> Cc: Halil Pasic ; qemu-devel@nongnu.org;
> fran...@linux.ibm.com; m...@redhat.com; borntrae...@de.ibm.com; Gonglei
> (Arei) ; longpeng ;
> Viktor Mihajlovski ;
> mjros...@linux.vnet.ibm.com
> Subject: Re: [Qemu-devel] [RFC v1 1/1] virtio-crypto: Allow disabling of 
> cipher
> algorithms for virtio-crypto device
> 
> On Thu, Jun 14, 2018 at 10:50:40AM -0400, Farhan Ali wrote:
> >
> >
> > On 06/14/2018 04:21 AM, Daniel P. Berrangé wrote:
> > > On Wed, Jun 13, 2018 at 07:28:08PM +0200, Halil Pasic wrote:
> > > >
> > > >
> > > > On 06/13/2018 05:05 PM, Daniel P. Berrangé wrote:
> > > > > On Wed, Jun 13, 2018 at 11:01:05AM -0400, Farhan Ali wrote:
> > > > > > Hi Daniel
> > > > > >
> > > > > > On 06/13/2018 05:37 AM, Daniel P. Berrangé wrote:
> > > > > > > On Tue, Jun 12, 2018 at 03:48:34PM -0400, Farhan Ali wrote:
> > > > > > > > The virtio-crypto driver currently propagates to the guest
> > > > > > > > all the cipher algorithms that the backend cryptodev can
> > > > > > > > support. But in certain cases where the guest has more
> > > > > > > > performant mechanism to handle some algorithms, it would be
> > > > > > > > useful to propagate only a subset of the algorithms.
> > > > > > >
> > > > > > > I'm not really convinced by this.
> > > > > > >
> > > > > > > The performance of crypto algorithms has many influencing
> > > > > > > factors, making it pretty hard to decide which is best
> > > > > > > without actively testing specific impls and comparing
> > > > > > > them in a manner which matches the application usage
> > > > > > > pattern. eg in theory the kernel crypto impl of an alg
> > > > > > > is faster than a userspace impl, if the kernel uses
> > > > > > > hardware accel and userspace does not. This, however,
> > > > > > > ignores the overhead of the kernel/userspace switch.
> > > > > > > The real world performance winner, thus depends on the
> > > > > > > amount of data being processed in each operation. Some
> > > > > > > times userspace can win & sometimes kernel space can
> > > > > > > win. This is even more relevant to virtio-crypto as
> > > > > > > it has more expensive context switches.
> > > > > >
> > > > > > True. But what if the guest can perform some crypto algorithms
> without a
> > > > > > incurring a VM exit? For example in s390 we have the cpacf
> instructions to
> > > > > > perform crypto and this instruction is implemented for us by our
> hardware
> > > > > > virtualization technology. In such a case it would be better not to 
> > > > > > use
> > > > > > virtio-crypto's implementation of such a crypto algorithm.
> > > > > >
> > > > > > At the same time we would like to take advantage of virtio-crypto's
> > > > > > acceleration capabilities for certain crypto algorithms for which 
> > > > > > there
> is
> > > > > > no hardware assistance.
> > > > >
> > > > > IIUC, the kernel's crypto layer can support multiple implementations 
> > > > > of
> > > > > any algorithm. Providers can report a priority against implementations
> > > > > which influences which impl is used in practice. So if there's a 
> > > > > native
> > > > > instruction for a partiuclar algorithm I would expect the impl 
> > > > > registered
> > > > > for that to be designated higher priority than other impls, so that 
> > > > > it is
> > > > > used in preference to other impls.
> > > > >
> > > >
> > > > AFAIR the problem here is that in (the guest) kernel the virtio-crypto
> > > > driver has to register it's crypto algo implementations with a priority
> > > > (single number), which dictates if it's going to be the preferred (used)
> > > > implementation of the algorithm or not. The virtio-crypto driver does 
> > > > this
> > > > without having information

Re: [Qemu-devel] [RFC v1 1/1] virtio-crypto: Allow disabling of cipher algorithms for virtio-crypto device

2018-06-12 Thread Gonglei (Arei)


> -Original Message-
> From: Farhan Ali [mailto:al...@linux.ibm.com]
> Sent: Wednesday, June 13, 2018 3:49 AM
> To: qemu-devel@nongnu.org
> Cc: m...@redhat.com; Gonglei (Arei) ; longpeng
> ; pa...@linux.ibm.com; borntrae...@de.ibm.com;
> fran...@linux.ibm.com; al...@linux.ibm.com
> Subject: [RFC v1 1/1] virtio-crypto: Allow disabling of cipher algorithms for
> virtio-crypto device
> 
> The virtio-crypto driver currently propagates to the guest
> all the cipher algorithms that the backend cryptodev can
> support. But in certain cases where the guest has more
> performant mechanism to handle some algorithms, it would be
> useful to propagate only a subset of the algorithms.
> 

It makes sense to me. E.g. current Intel CPUs have the AES-NI instructions for
accelerating AES, so we don't need to propagate the AES algorithms.
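
A tiny host-side illustration of that point; the cpuid check below is just one
possible way to detect AES-NI, and the cipher-aes-* property names come from
Farhan's patch further down:

#include <cpuid.h>
#include <stdbool.h>

/* Returns true when the host CPU advertises AES-NI (CPUID.1:ECX bit 25),
 * in which case management could pass cipher-aes-cbc=off etc. to the
 * virtio-crypto device so the guest prefers its native implementation. */
static bool host_has_aesni(void)
{
    unsigned int eax, ebx, ecx, edx;

    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx)) {
        return false;
    }
    return ecx & (1u << 25);
}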

> This patch adds support for disabling the cipher
> algorithms of the backend cryptodev.
> 
> eg:
>  -object cryptodev-backend-builtin,id=cryptodev0
>  -device virtio-crypto-ccw,id=crypto0,cryptodev=cryptodev0,cipher-aes-cbc=off
> 
> Signed-off-by: Farhan Ali 
> ---
> 
> Please note this patch is not complete, and there are TODOs to handle
> for other types of algorithms such Hash, AEAD and MAC algorithms.
> 
> This is mainly intended to get some feedback on the design approach
> from the community.
> 
> 
>  hw/virtio/virtio-crypto.c | 46
> ---
>  include/hw/virtio/virtio-crypto.h |  3 +++
>  2 files changed, 46 insertions(+), 3 deletions(-)
> 
> diff --git a/hw/virtio/virtio-crypto.c b/hw/virtio/virtio-crypto.c
> index 9a9fa49..4aed9ca 100644
> --- a/hw/virtio/virtio-crypto.c
> +++ b/hw/virtio/virtio-crypto.c
> @@ -754,12 +754,22 @@ static void virtio_crypto_reset(VirtIODevice *vdev)
>  static void virtio_crypto_init_config(VirtIODevice *vdev)
>  {
>  VirtIOCrypto *vcrypto = VIRTIO_CRYPTO(vdev);
> +uint32_t user_crypto_services = (1u << VIRTIO_CRYPTO_SERVICE_CIPHER) |
> +                                (1u << VIRTIO_CRYPTO_SERVICE_HASH) |
> +                                (1u << VIRTIO_CRYPTO_SERVICE_AEAD) |
> +                                (1u << VIRTIO_CRYPTO_SERVICE_MAC);
> +
> +if (vcrypto->user_cipher_algo_l & (1u << VIRTIO_CRYPTO_NO_CIPHER)) {
> +    vcrypto->user_cipher_algo_l = 1u << VIRTIO_CRYPTO_NO_CIPHER;
> +    vcrypto->user_cipher_algo_h = 0;
> +    user_crypto_services &= ~(1u << VIRTIO_CRYPTO_SERVICE_CIPHER);
> +}
> 
> -vcrypto->conf.crypto_services =
> +vcrypto->conf.crypto_services = user_crypto_services &
>   vcrypto->conf.cryptodev->conf.crypto_services;
> -vcrypto->conf.cipher_algo_l =
> +vcrypto->conf.cipher_algo_l = vcrypto->user_cipher_algo_l &
>   vcrypto->conf.cryptodev->conf.cipher_algo_l;
> -vcrypto->conf.cipher_algo_h =
> +vcrypto->conf.cipher_algo_h = vcrypto->user_cipher_algo_h &
>   vcrypto->conf.cryptodev->conf.cipher_algo_h;
>  vcrypto->conf.hash_algo = vcrypto->conf.cryptodev->conf.hash_algo;
>  vcrypto->conf.mac_algo_l = vcrypto->conf.cryptodev->conf.mac_algo_l;
> @@ -853,6 +863,34 @@ static const VMStateDescription
> vmstate_virtio_crypto = {
>  static Property virtio_crypto_properties[] = {
>  DEFINE_PROP_LINK("cryptodev", VirtIOCrypto, conf.cryptodev,
>   TYPE_CRYPTODEV_BACKEND, CryptoDevBackend
> *),
> +DEFINE_PROP_BIT("no-cipher", VirtIOCrypto, user_cipher_algo_l,
> +VIRTIO_CRYPTO_CIPHER_ARC4, false),

s/ VIRTIO_CRYPTO_CIPHER_ARC4/VIRTIO_CRYPTO_NO_CIPHER/

> +DEFINE_PROP_BIT("cipher-arc4", VirtIOCrypto, user_cipher_algo_l,
> +VIRTIO_CRYPTO_CIPHER_ARC4, false),
> +DEFINE_PROP_BIT("cipher-aes-ecb", VirtIOCrypto, user_cipher_algo_l,
> +VIRTIO_CRYPTO_CIPHER_AES_ECB, false),
> +DEFINE_PROP_BIT("cipher-aes-cbc", VirtIOCrypto, user_cipher_algo_l,
> +VIRTIO_CRYPTO_CIPHER_AES_CBC, false),
> +DEFINE_PROP_BIT("cipher-aes-ctr", VirtIOCrypto, user_cipher_algo_l,
> +VIRTIO_CRYPTO_CIPHER_AES_CTR, false),
> +DEFINE_PROP_BIT("cipher-des-ecb", VirtIOCrypto, user_cipher_algo_l,
> +VIRTIO_CRYPTO_CIPHER_DES_ECB, false),
> +DEFINE_PROP_BIT("cipher-3des-ecb", VirtIOCrypto, user_cipher_algo_l,
> +VIRTIO_CRYPTO_CIPHER_3DES_ECB, false),
> +DEFINE_PROP_BIT("cipher-3des-cbc", VirtIOCrypto, 

Re: [Qemu-devel] [RFC v1 2/2] crypto/virtio-crypto: Register an algo only if it's supported

2018-06-12 Thread Gonglei (Arei)


> -Original Message-
> From: Farhan Ali [mailto:al...@linux.ibm.com]
> Sent: Wednesday, June 13, 2018 1:08 AM
> To: Gonglei (Arei) ; linux-ker...@vger.kernel.org;
> k...@vger.kernel.org
> Cc: m...@redhat.com; qemu-devel@nongnu.org; longpeng
> ; pa...@linux.ibm.com; fran...@linux.ibm.com;
> borntrae...@de.ibm.com
> Subject: Re: [RFC v1 2/2] crypto/virtio-crypto: Register an algo only if it's
> supported
> 
> 
> 
> On 06/11/2018 04:48 AM, Gonglei (Arei) wrote:
> >
> >
> >> -Original Message-
> >> From: Farhan Ali [mailto:al...@linux.ibm.com]
> >> Sent: Saturday, June 09, 2018 3:09 AM
> >> To: linux-ker...@vger.kernel.org; k...@vger.kernel.org
> >> Cc: m...@redhat.com; qemu-devel@nongnu.org; Gonglei (Arei)
> >> ; longpeng ;
> >> pa...@linux.ibm.com; fran...@linux.ibm.com; borntrae...@de.ibm.com;
> >> al...@linux.ibm.com
> >> Subject: [RFC v1 2/2] crypto/virtio-crypto: Register an algo only if it's
> supported
> >>
> >> From: Farhan Ali 
> >>
> >> Register a crypto algo with the Linux crypto layer only if
> >> the algorithm is supported by the backend virtio-crypto
> >> device.
> >>
> >> Also route crypto requests to a virtio-crypto
> >> device, only if it can support the requested service and
> >> algorithm.
> >>
> >> Signed-off-by: Farhan Ali 
> >> ---
> >>   drivers/crypto/virtio/virtio_crypto_algs.c   | 110
> >> ++-
> >>   drivers/crypto/virtio/virtio_crypto_common.h |  11 ++-
> >>   drivers/crypto/virtio/virtio_crypto_mgr.c|  81
> ++--
> >>   3 files changed, 158 insertions(+), 44 deletions(-)
> >>
> >> diff --git a/drivers/crypto/virtio/virtio_crypto_algs.c
> >> b/drivers/crypto/virtio/virtio_crypto_algs.c
> >> index ba190cf..fef112a 100644
> >> --- a/drivers/crypto/virtio/virtio_crypto_algs.c
> >> +++ b/drivers/crypto/virtio/virtio_crypto_algs.c
> >> @@ -49,12 +49,18 @@ struct virtio_crypto_sym_request {
> >>bool encrypt;
> >>   };
> >>
> >> +struct virtio_crypto_algo {
> >> +  uint32_t algonum;
> >> +  uint32_t service;
> >> +  unsigned int active_devs;
> >> +  struct crypto_alg algo;
> >> +};
> >> +
> >>   /*
> >>* The algs_lock protects the below global virtio_crypto_active_devs
> >>* and crypto algorithms registion.
> >>*/
> >>   static DEFINE_MUTEX(algs_lock);
> >> -static unsigned int virtio_crypto_active_devs;
> >>   static void virtio_crypto_ablkcipher_finalize_req(
> >>struct virtio_crypto_sym_request *vc_sym_req,
> >>struct ablkcipher_request *req,
> >> @@ -312,13 +318,19 @@ static int virtio_crypto_ablkcipher_setkey(struct
> >> crypto_ablkcipher *tfm,
> >> unsigned int keylen)
> >>   {
> >>struct virtio_crypto_ablkcipher_ctx *ctx =
> crypto_ablkcipher_ctx(tfm);
> >> +  uint32_t alg;
> >>int ret;
> >>
> >> +  ret = virtio_crypto_alg_validate_key(keylen, &alg);
> >> +  if (ret)
> >> +  return ret;
> >> +
> >>if (!ctx->vcrypto) {
> >>/* New key */
> >>int node = virtio_crypto_get_current_node();
> >>struct virtio_crypto *vcrypto =
> >> -virtcrypto_get_dev_node(node);
> >> +virtcrypto_get_dev_node(node,
> >> +VIRTIO_CRYPTO_SERVICE_CIPHER, alg);
> >>if (!vcrypto) {
> >>pr_err("virtio_crypto: Could not find a virtio device 
> >> in the
> >> system\n");
> >
> > We'd better change the above error message now. What about:
> >   " virtio_crypto: Could not find a virtio device in the system or 
> > unsupported
> algo" ?
> >
> > Regards,
> > -Gonglei
> 
> 
> Sure, I will update the error message. But other than that does the rest
> of the code looks good to you?
> 
Yes, good work. You can add my ack in v2:

Acked-by: Gonglei 

Regards,
-Gonglei





Re: [Qemu-devel] [RFC v1 1/2] crypto/virtio-crypto: Read crypto services and algorithm masks

2018-06-12 Thread Gonglei (Arei)

> -Original Message-
> From: Farhan Ali [mailto:al...@linux.ibm.com]
> Sent: Wednesday, June 13, 2018 1:07 AM
> To: Gonglei (Arei) ; linux-ker...@vger.kernel.org;
> k...@vger.kernel.org
> Cc: m...@redhat.com; qemu-devel@nongnu.org; longpeng
> ; pa...@linux.ibm.com; fran...@linux.ibm.com;
> borntrae...@de.ibm.com
> Subject: Re: [RFC v1 1/2] crypto/virtio-crypto: Read crypto services and
> algorithm masks
> 
> Hi Arei
> 
> On 06/11/2018 02:43 AM, Gonglei (Arei) wrote:
> >
> >> -Original Message-
> >> From: Farhan Ali [mailto:al...@linux.ibm.com]
> >> Sent: Saturday, June 09, 2018 3:09 AM
> >> To: linux-ker...@vger.kernel.org; k...@vger.kernel.org
> >> Cc: m...@redhat.com; qemu-devel@nongnu.org; Gonglei (Arei)
> >> ; longpeng ;
> >> pa...@linux.ibm.com; fran...@linux.ibm.com; borntrae...@de.ibm.com;
> >> al...@linux.ibm.com
> >> Subject: [RFC v1 1/2] crypto/virtio-crypto: Read crypto services and
> algorithm
> >> masks
> >>
> >> Read the crypto services and algorithm masks which provides
> >> information about the services and algorithms supported by
> >> virtio-crypto backend.
> >>
> >> Signed-off-by: Farhan Ali 
> >> ---
> >>   drivers/crypto/virtio/virtio_crypto_common.h | 14 ++
> >>   drivers/crypto/virtio/virtio_crypto_core.c   | 29
> >> 
> >>   2 files changed, 43 insertions(+)
> >>
> >> diff --git a/drivers/crypto/virtio/virtio_crypto_common.h
> >> b/drivers/crypto/virtio/virtio_crypto_common.h
> >> index 66501a5..05eca12e 100644
> >> --- a/drivers/crypto/virtio/virtio_crypto_common.h
> >> +++ b/drivers/crypto/virtio/virtio_crypto_common.h
> >> @@ -55,6 +55,20 @@ struct virtio_crypto {
> >>/* Number of queue currently used by the driver */
> >>u32 curr_queue;
> >>
> >> +  /*
> >> +   * Specifies the services mask which the device support,
> >> +   * see VIRTIO_CRYPTO_SERVICE_* above
> >> +   */
> >
> > Pls update the above comments. Except that:
> >
> > Acked-by: Gonglei 
> >
> 
> Sure will update the comment. How about " Specifies the services mask
> which the device support, * see VIRTIO_CRYPTO_SERVICE_*" ?
> 
It makes sense IMHO :)

Regards,
-Gonglei

> or should I specify the file where the VIRTIO_CRYPTO_SERVICE_* are defined?
> 
> Thanks
> Farhan
> 
> >> +  u32 crypto_services;
> >> +
> >> +  /* Detailed algorithms mask */
> >> +  u32 cipher_algo_l;
> >> +  u32 cipher_algo_h;
> >> +  u32 hash_algo;
> >> +  u32 mac_algo_l;
> >> +  u32 mac_algo_h;
> >> +  u32 aead_algo;
> >> +
> >>/* Maximum length of cipher key */
> >>u32 max_cipher_key_len;
> >>/* Maximum length of authenticated key */
> >> diff --git a/drivers/crypto/virtio/virtio_crypto_core.c
> >> b/drivers/crypto/virtio/virtio_crypto_core.c
> >> index 8332698..8f745f2 100644
> >> --- a/drivers/crypto/virtio/virtio_crypto_core.c
> >> +++ b/drivers/crypto/virtio/virtio_crypto_core.c
> >> @@ -303,6 +303,13 @@ static int virtcrypto_probe(struct virtio_device
> *vdev)
> >>u32 max_data_queues = 0, max_cipher_key_len = 0;
> >>u32 max_auth_key_len = 0;
> >>u64 max_size = 0;
> >> +  u32 cipher_algo_l = 0;
> >> +  u32 cipher_algo_h = 0;
> >> +  u32 hash_algo = 0;
> >> +  u32 mac_algo_l = 0;
> >> +  u32 mac_algo_h = 0;
> >> +  u32 aead_algo = 0;
> >> +  u32 crypto_services = 0;
> >>
> >>if (!virtio_has_feature(vdev, VIRTIO_F_VERSION_1))
> >>return -ENODEV;
> >> @@ -339,6 +346,20 @@ static int virtcrypto_probe(struct virtio_device
> *vdev)
> >>max_auth_key_len, &max_auth_key_len);
> >>virtio_cread(vdev, struct virtio_crypto_config,
> >>max_size, &max_size);
> >> +  virtio_cread(vdev, struct virtio_crypto_config,
> >> +  crypto_services, &crypto_services);
> >> +  virtio_cread(vdev, struct virtio_crypto_config,
> >> +  cipher_algo_l, &cipher_algo_l);
> >> +  virtio_cread(vdev, struct virtio_crypto_config,
> >> +  cipher_algo_h, &cipher_algo_h);
> >> +  virtio_cread(vdev, struct virtio_crypto_config,
> >> +  hash_algo, &hash_algo);
> >> +  virtio_cread(vdev, struct virtio_crypto_config,
> >> +  mac_algo_l, &mac_algo_l);
> >> +  virtio_cread(vdev, struct virtio_cryp

Re: [Qemu-devel] An emulation failure occurs, if I hotplug vcpus immediately after the VM start

2018-06-11 Thread Gonglei (Arei)

> -Original Message-
> From: kvm-ow...@vger.kernel.org [mailto:kvm-ow...@vger.kernel.org] On
> Behalf Of David Hildenbrand
> Sent: Monday, June 11, 2018 8:36 PM
> To: Gonglei (Arei) ; 浙大邮箱 
> Subject: Re: An emulation failure occurs,if I hotplug vcpus immediately after 
> the
> VM start
> 
> On 11.06.2018 14:25, Gonglei (Arei) wrote:
> >
> > Hi David and Paolo,
> >
> >> -Original Message-
> >> From: David Hildenbrand [mailto:da...@redhat.com]
> >> Sent: Monday, June 11, 2018 6:44 PM
> >> To: 浙大邮箱 
> >> Cc: Paolo Bonzini ; Gonglei (Arei)
> >> ; Igor Mammedov ;
> >> xuyandong ; Zhanghailiang
> >> ; wangxin (U)
> >> ; lidonglin ;
> >> k...@vger.kernel.org; qemu-devel@nongnu.org; Huangweidong (C)
> >> 
> >> Subject: Re: An emulation failure occurs,if I hotplug vcpus immediately 
> >> after
> the
> >> VM start
> >>
> >> On 07.06.2018 18:03, 浙大邮箱 wrote:
> >>> Hi,all
> >>> I still have a question after reading your discussion: Will seabios 
> >>> detect the
> >> change of address space even if we add_region and del_region
> automatically? I
> >> guess that seabios may not take this change into consideration.
> >>
> >> Hi,
> >>
> >> We would just change the way how KVM memory slots are updated. This is
> >> right now not atomic, but would be later on. It should not have any
> >> other effect.
> >>
> > Yes. Do you have any plans to do that?
> 
> Well, I have plans to work on atomically resizable memory regions
> (atomic del + add), and what Paolo described could also work for that
> use case. However, I won't have time to look into that in the near
> future. So if somebody else wants to jump it, perfect. If not, it will
> have to wait unfortunately.
> 
Got it. :)

Thanks,
-Gonglei


Re: [Qemu-devel] An emulation failure occurs, if I hotplug vcpus immediately after the VM start

2018-06-11 Thread Gonglei (Arei)

Hi David and Paolo,

> -Original Message-
> From: David Hildenbrand [mailto:da...@redhat.com]
> Sent: Monday, June 11, 2018 6:44 PM
> To: 浙大邮箱 
> Cc: Paolo Bonzini ; Gonglei (Arei)
> ; Igor Mammedov ;
> xuyandong ; Zhanghailiang
> ; wangxin (U)
> ; lidonglin ;
> k...@vger.kernel.org; qemu-devel@nongnu.org; Huangweidong (C)
> 
> Subject: Re: An emulation failure occurs,if I hotplug vcpus immediately after 
> the
> VM start
> 
> On 07.06.2018 18:03, 浙大邮箱 wrote:
> > Hi,all
> > I still have a question after reading your discussion: Will seabios detect 
> > the
> change of address space even if we add_region and del_region automatically? I
> guess that seabios may not take this change into consideration.
> 
> Hi,
> 
> We would just change the way how KVM memory slots are updated. This is
> right now not atomic, but would be later on. It should not have any
> other effect.
> 
Yes. Do you have any plans to do that? 

Thanks,
-Gonglei


Re: [Qemu-devel] [RFC v1 2/2] crypto/virtio-crypto: Register an algo only if it's supported

2018-06-11 Thread Gonglei (Arei)



> -Original Message-
> From: Farhan Ali [mailto:al...@linux.ibm.com]
> Sent: Saturday, June 09, 2018 3:09 AM
> To: linux-ker...@vger.kernel.org; k...@vger.kernel.org
> Cc: m...@redhat.com; qemu-devel@nongnu.org; Gonglei (Arei)
> ; longpeng ;
> pa...@linux.ibm.com; fran...@linux.ibm.com; borntrae...@de.ibm.com;
> al...@linux.ibm.com
> Subject: [RFC v1 2/2] crypto/virtio-crypto: Register an algo only if it's 
> supported
> 
> From: Farhan Ali 
> 
> Register a crypto algo with the Linux crypto layer only if
> the algorithm is supported by the backend virtio-crypto
> device.
> 
> Also route crypto requests to a virtio-crypto
> device, only if it can support the requested service and
> algorithm.
> 
> Signed-off-by: Farhan Ali 
> ---
>  drivers/crypto/virtio/virtio_crypto_algs.c   | 110
> ++-
>  drivers/crypto/virtio/virtio_crypto_common.h |  11 ++-
>  drivers/crypto/virtio/virtio_crypto_mgr.c|  81 ++--
>  3 files changed, 158 insertions(+), 44 deletions(-)
> 
> diff --git a/drivers/crypto/virtio/virtio_crypto_algs.c
> b/drivers/crypto/virtio/virtio_crypto_algs.c
> index ba190cf..fef112a 100644
> --- a/drivers/crypto/virtio/virtio_crypto_algs.c
> +++ b/drivers/crypto/virtio/virtio_crypto_algs.c
> @@ -49,12 +49,18 @@ struct virtio_crypto_sym_request {
>   bool encrypt;
>  };
> 
> +struct virtio_crypto_algo {
> + uint32_t algonum;
> + uint32_t service;
> + unsigned int active_devs;
> + struct crypto_alg algo;
> +};
> +
>  /*
>   * The algs_lock protects the below global virtio_crypto_active_devs
>   * and crypto algorithms registion.
>   */
>  static DEFINE_MUTEX(algs_lock);
> -static unsigned int virtio_crypto_active_devs;
>  static void virtio_crypto_ablkcipher_finalize_req(
>   struct virtio_crypto_sym_request *vc_sym_req,
>   struct ablkcipher_request *req,
> @@ -312,13 +318,19 @@ static int virtio_crypto_ablkcipher_setkey(struct
> crypto_ablkcipher *tfm,
>unsigned int keylen)
>  {
>   struct virtio_crypto_ablkcipher_ctx *ctx = crypto_ablkcipher_ctx(tfm);
> + uint32_t alg;
>   int ret;
> 
> + ret = virtio_crypto_alg_validate_key(keylen, &alg);
> + if (ret)
> + return ret;
> +
>   if (!ctx->vcrypto) {
>   /* New key */
>   int node = virtio_crypto_get_current_node();
>   struct virtio_crypto *vcrypto =
> -   virtcrypto_get_dev_node(node);
> +   virtcrypto_get_dev_node(node,
> +   VIRTIO_CRYPTO_SERVICE_CIPHER, alg);
>   if (!vcrypto) {
>   pr_err("virtio_crypto: Could not find a virtio device in the
> system\n");

We'd better change the above error message now. What about:
"virtio_crypto: Could not find a virtio device in the system or unsupported algo"?

Regards,
-Gonglei






Re: [Qemu-devel] [RFC v1 1/2] crypto/virtio-crypto: Read crypto services and algorithm masks

2018-06-11 Thread Gonglei (Arei)


> -Original Message-
> From: Farhan Ali [mailto:al...@linux.ibm.com]
> Sent: Saturday, June 09, 2018 3:09 AM
> To: linux-ker...@vger.kernel.org; k...@vger.kernel.org
> Cc: m...@redhat.com; qemu-devel@nongnu.org; Gonglei (Arei)
> ; longpeng ;
> pa...@linux.ibm.com; fran...@linux.ibm.com; borntrae...@de.ibm.com;
> al...@linux.ibm.com
> Subject: [RFC v1 1/2] crypto/virtio-crypto: Read crypto services and algorithm
> masks
> 
> Read the crypto services and algorithm masks which provides
> information about the services and algorithms supported by
> virtio-crypto backend.
> 
> Signed-off-by: Farhan Ali 
> ---
>  drivers/crypto/virtio/virtio_crypto_common.h | 14 ++
>  drivers/crypto/virtio/virtio_crypto_core.c   | 29
> 
>  2 files changed, 43 insertions(+)
> 
> diff --git a/drivers/crypto/virtio/virtio_crypto_common.h
> b/drivers/crypto/virtio/virtio_crypto_common.h
> index 66501a5..05eca12e 100644
> --- a/drivers/crypto/virtio/virtio_crypto_common.h
> +++ b/drivers/crypto/virtio/virtio_crypto_common.h
> @@ -55,6 +55,20 @@ struct virtio_crypto {
>   /* Number of queue currently used by the driver */
>   u32 curr_queue;
> 
> + /*
> +  * Specifies the services mask which the device support,
> +  * see VIRTIO_CRYPTO_SERVICE_* above
> +  */

Please update the above comments. Apart from that:

Acked-by: Gonglei 

> + u32 crypto_services;
> +
> + /* Detailed algorithms mask */
> + u32 cipher_algo_l;
> + u32 cipher_algo_h;
> + u32 hash_algo;
> + u32 mac_algo_l;
> + u32 mac_algo_h;
> + u32 aead_algo;
> +
>   /* Maximum length of cipher key */
>   u32 max_cipher_key_len;
>   /* Maximum length of authenticated key */
> diff --git a/drivers/crypto/virtio/virtio_crypto_core.c
> b/drivers/crypto/virtio/virtio_crypto_core.c
> index 8332698..8f745f2 100644
> --- a/drivers/crypto/virtio/virtio_crypto_core.c
> +++ b/drivers/crypto/virtio/virtio_crypto_core.c
> @@ -303,6 +303,13 @@ static int virtcrypto_probe(struct virtio_device *vdev)
>   u32 max_data_queues = 0, max_cipher_key_len = 0;
>   u32 max_auth_key_len = 0;
>   u64 max_size = 0;
> + u32 cipher_algo_l = 0;
> + u32 cipher_algo_h = 0;
> + u32 hash_algo = 0;
> + u32 mac_algo_l = 0;
> + u32 mac_algo_h = 0;
> + u32 aead_algo = 0;
> + u32 crypto_services = 0;
> 
>   if (!virtio_has_feature(vdev, VIRTIO_F_VERSION_1))
>   return -ENODEV;
> @@ -339,6 +346,20 @@ static int virtcrypto_probe(struct virtio_device *vdev)
>   max_auth_key_len, &max_auth_key_len);
>   virtio_cread(vdev, struct virtio_crypto_config,
>   max_size, &max_size);
> + virtio_cread(vdev, struct virtio_crypto_config,
> + crypto_services, &crypto_services);
> + virtio_cread(vdev, struct virtio_crypto_config,
> + cipher_algo_l, &cipher_algo_l);
> + virtio_cread(vdev, struct virtio_crypto_config,
> + cipher_algo_h, &cipher_algo_h);
> + virtio_cread(vdev, struct virtio_crypto_config,
> + hash_algo, &hash_algo);
> + virtio_cread(vdev, struct virtio_crypto_config,
> + mac_algo_l, &mac_algo_l);
> + virtio_cread(vdev, struct virtio_crypto_config,
> + mac_algo_h, &mac_algo_h);
> + virtio_cread(vdev, struct virtio_crypto_config,
> + aead_algo, &aead_algo);
> 
>   /* Add virtio crypto device to global table */
>   err = virtcrypto_devmgr_add_dev(vcrypto);
> @@ -358,6 +379,14 @@ static int virtcrypto_probe(struct virtio_device *vdev)
>   vcrypto->max_cipher_key_len = max_cipher_key_len;
>   vcrypto->max_auth_key_len = max_auth_key_len;
>   vcrypto->max_size = max_size;
> + vcrypto->crypto_services = crypto_services;
> + vcrypto->cipher_algo_l = cipher_algo_l;
> + vcrypto->cipher_algo_h = cipher_algo_h;
> + vcrypto->mac_algo_l = mac_algo_l;
> + vcrypto->mac_algo_h = mac_algo_h;
> + vcrypto->hash_algo = hash_algo;
> + vcrypto->aead_algo = aead_algo;
> +
> 
>   dev_info(&vdev->dev,
>   "max_queues: %u, max_cipher_key_len: %u, max_auth_key_len: %u,
> max_size 0x%llx\n",
> --
> 2.7.4
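
For context, a minimal sketch of how the masks read above can be consulted
before registering an algorithm, which is the direction patch 2/2 takes. The
function name here is made up, and it assumes (per the virtio-crypto spec)
that crypto_services is a bitmask of (1 << VIRTIO_CRYPTO_SERVICE_*) and that
cipher_algo_l/cipher_algo_h cover algorithm IDs 0-31 and 32-63:

static bool vcrypto_cipher_supported(struct virtio_crypto *vcrypto, u32 algo)
{
    /* service must be advertised first */
    if (!(vcrypto->crypto_services & (1u << VIRTIO_CRYPTO_SERVICE_CIPHER)))
        return false;

    /* then the specific algorithm bit in the low or high mask */
    if (algo < 32)
        return vcrypto->cipher_algo_l & (1u << algo);
    return vcrypto->cipher_algo_h & (1u << (algo - 32));
}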




Re: [Qemu-devel] An emulation failure occurs, if I hotplug vcpus immediately after the VM start

2018-06-07 Thread Gonglei (Arei)

> -Original Message-
> From: David Hildenbrand [mailto:da...@redhat.com]
> Sent: Thursday, June 07, 2018 6:40 PM
> Subject: Re: An emulation failure occurs,if I hotplug vcpus immediately after 
> the
> VM start
> 
> On 06.06.2018 15:57, Paolo Bonzini wrote:
> > On 06/06/2018 15:28, Gonglei (Arei) wrote:
> >> gonglei: mem.slot: 3, mem.guest_phys_addr=0xc0000,
> >> mem.userspace_addr=0x7fc343ec0000, mem.flags=0, memory_size=0x0
> >> gonglei: mem.slot: 3, mem.guest_phys_addr=0xc0000,
> >> mem.userspace_addr=0x7fc343ec0000, mem.flags=0,
> memory_size=0x9000
> >>
> >> When the memory region is cleared, KVM marks the slot invalid
> >> (it is set to KVM_MEMSLOT_INVALID).
> >>
> >> If SeaBIOS accesses this memory and causes a page fault, the lookup
> >> by gfn (via __gfn_to_pfn_memslot) finds an invalid value, and the
> >> fault handling finally returns a failure.
> >>
> >> So, My questions are:
> >>
> >> 1) Why don't we hold kvm->slots_lock during page fault processing?
> >
> > Because it's protected by SRCU.  We don't need kvm->slots_lock on the
> > read side.
> >
> >> 2) How do we assure that vcpus will not access the corresponding
> >> region when deleting a memory slot?
> >
> > We don't.  It's generally a guest bug if they do, but the problem here
> > is that QEMU is splitting a memory region in two parts and that is not
> > atomic.
> 
> BTW, one ugly (but QEMU-only) fix would be to temporarily pause all
> VCPUs, do the change and then unpause all VCPUs.
> 

The memory region update is triggered by a vCPU thread, though, not by
the main thread.

Thanks,
-Gonglei

> >
> > One fix could be to add a KVM_SET_USER_MEMORY_REGIONS ioctl that
> > replaces the entire memory map atomically.
> >
> > Paolo
> >
> 
> 
> --
> 
> Thanks,
> 
> David / dhildenb
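
To make the race concrete, a sketch of the non-atomic sequence being
discussed (vm_fd and the literal values are illustrative, taken from the
kvm_set_user_memory_region() log in the original report; this is not the
actual QEMU code path):

#include <sys/ioctl.h>
#include <linux/kvm.h>

static void split_slot_non_atomically(int vm_fd)
{
    struct kvm_userspace_memory_region mem = {
        .slot            = 3,
        .guest_phys_addr = 0xc0000,
        .userspace_addr  = 0x7fc343ec0000,
        .memory_size     = 0,                 /* 1) delete the old slot */
    };

    ioctl(vm_fd, KVM_SET_USER_MEMORY_REGION, &mem);
    /* window: a vCPU page fault here sees KVM_MEMSLOT_INVALID and fails */
    mem.memory_size = 0x9000;                 /* 2) install the resized slot */
    ioctl(vm_fd, KVM_SET_USER_MEMORY_REGION, &mem);
}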


Re: [Qemu-devel] [PATCH] ps2: check PS2Queue wptr pointer in post_load routine

2018-06-07 Thread Gonglei (Arei)



> -Original Message-
> From: liujunjie (A)
> Sent: Thursday, June 07, 2018 4:03 PM
> To: kra...@redhat.com; berra...@redhat.com
> Cc: Gonglei (Arei) ; wangxin (U)
> ; Huangweidong (C)
> ; fangying ;
> qemu-devel@nongnu.org; liujunjie (A) 
> Subject: [PATCH] ps2: check PS2Queue wptr pointer in post_load routine
> 
> Commit 802cbcb7300 fixed most of the issues seen during qemu guest
> migration. But we still need to check whether the queue size equals
> PS2_QUEUE_SIZE; if so, wptr should be set to 0. Otherwise wptr would be
> larger than PS2_QUEUE_SIZE and never wrap back when ps2_queue_noirq is
> called. This could lead to an OOB access; add a check to avoid it.
> 
> Signed-off-by: liujunjie 
> ---
>  hw/input/ps2.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/hw/input/ps2.c b/hw/input/ps2.c
> index eeec618..fdfcadf 100644
> --- a/hw/input/ps2.c
> +++ b/hw/input/ps2.c
> @@ -927,7 +927,7 @@ static void ps2_common_post_load(PS2State *s)
> 
>  /* reset rptr/wptr/count */
>  q->rptr = 0;
> -q->wptr = size;
> +q->wptr = (size == PS2_QUEUE_SIZE) ? 0 : size;
>  q->count = size;
>  s->update_irq(s->update_arg, q->count != 0);
>  }
> --

Reviewed-by: Gonglei 
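
For readers following along, a self-contained sketch of the overflow being
fixed (this paraphrases the enqueue logic in hw/input/ps2.c; it is not the
verbatim code):

#include <stdint.h>

#define PS2_QUEUE_SIZE 256

typedef struct {
    uint8_t data[PS2_QUEUE_SIZE];
    int rptr, wptr, count;
} PS2Queue;

static void ps2_enqueue_sketch(PS2Queue *q, uint8_t b)
{
    /* if post_load restored wptr == PS2_QUEUE_SIZE, this writes one
     * byte past the end of data[] ... */
    q->data[q->wptr] = b;
    /* ... and this '==' wrap test never fires again, so wptr keeps
     * growing past the array */
    if (++q->wptr == PS2_QUEUE_SIZE) {
        q->wptr = 0;
    }
    q->count++;
}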




Re: [Qemu-devel] An emulation failure occurs, if I hotplug vcpus immediately after the VM start

2018-06-06 Thread Gonglei (Arei)
Hi Igor,

Thanks for your response firstly. :)

> -Original Message-
> From: Igor Mammedov [mailto:imamm...@redhat.com]
> Sent: Friday, June 01, 2018 6:23 PM
> 
> On Fri, 1 Jun 2018 08:17:12 +
> xuyandong  wrote:
> 
> > Hi there,
> >
> > I am doing some test on qemu vcpu hotplug and I run into some trouble.
> > An emulation failure occurs and qemu prints the following msg:
> >
> > KVM internal error. Suberror: 1
> > emulation failure
> > EAX=00000000 EBX=00000000 ECX=00000000 EDX=00000600
> > ESI=00000000 EDI=00000000 EBP=00000000 ESP=0000fff8
> > EIP=0000ff53 EFL=00010082 [--S] CPL=0 II=0 A20=1 SMM=0 HLT=0
> > ES =0000 00000000 0000ffff 00009300
> > CS =f000 000f0000 0000ffff 00009b00
> > SS =0000 00000000 0000ffff 00009300
> > DS =0000 00000000 0000ffff 00009300
> > FS =0000 00000000 0000ffff 00009300
> > GS =0000 00000000 0000ffff 00009300
> > LDT=0000 00000000 0000ffff 00008200
> > TR =0000 00000000 0000ffff 00008b00
> > GDT=     00000000 0000ffff
> > IDT=     00000000 0000ffff
> > CR0=60000010 CR2=00000000 CR3=00000000 CR4=00000000
> > DR0=0000000000000000 DR1=0000000000000000 DR2=0000000000000000
> > DR3=0000000000000000
> > DR6=00000000ffff0ff0 DR7=0000000000000400
> > EFER=0000000000000000
> > Code=31 d2 eb 04 66 83 ca ff 66 89 d0 66 5b 66 c3 66 89 d0 66 c3  66 68
> 21 8a 00 00 e9 08 d7 66 56 66 53 66 83 ec 0c 66 89 c3 66 e8 ce 7b ff ff 66 89 
> c6
> >
> > I notice that the guest is still running SeaBIOS in real mode when the vcpu has
> just been plugged.
> > This emulation failure can be steadily reproduced if I do vcpu hotplug
> during the VM launch process.
> > After some digging, I find this KVM internal error shows up because KVM
> cannot emulate some MMIO (gpa 0xfff53 ).
> >
> > So I am confused,
> > (1) does qemu support vcpu hotplug even if guest is running seabios ?
> There is no code that forbids it, and I would expect it not to trigger error
> and be NOP.
> 
> > (2) the gpa (0xfff53) is an address of BIOS ROM section, why does kvm
> confirm it as a mmio address incorrectly?
> KVM trace and bios debug log might give more information to guess where to
> look
> or even better would be to debug Seabios and find out what exactly
> goes wrong if you could do it.

This issue can't be reproduced once we enable the SeaBIOS debug log or KVM tracing. :(

After a few days of debugging, we found that this problem occurs every time
the memory region is cleared (memory_size is 0) while a VFIO device is
hot-plugged.

The key function is kvm_set_user_memory_region(); I added some logs to it.

gonglei: mem.slot: 3, mem.guest_phys_addr=0xc0000,
mem.userspace_addr=0x7fc751e00000, mem.flags=0, memory_size=0x20000
gonglei: mem.slot: 3, mem.guest_phys_addr=0xc0000,
mem.userspace_addr=0x7fc751e00000, mem.flags=0, memory_size=0x0
gonglei: mem.slot: 3, mem.guest_phys_addr=0xc0000,
mem.userspace_addr=0x7fc343ec0000, mem.flags=0, memory_size=0x10000
gonglei: mem.slot: 3, mem.guest_phys_addr=0xc0000,
mem.userspace_addr=0x7fc343ec0000, mem.flags=0, memory_size=0x0
gonglei: mem.slot: 3, mem.guest_phys_addr=0xc0000,
mem.userspace_addr=0x7fc343ec0000, mem.flags=0, memory_size=0xbff40000
gonglei: mem.slot: 3, mem.guest_phys_addr=0xc0000,
mem.userspace_addr=0x7fc343ec0000, mem.flags=0, memory_size=0x0
gonglei: mem.slot: 3, mem.guest_phys_addr=0xc0000,
mem.userspace_addr=0x7fc343ec0000, mem.flags=0, memory_size=0xbff40000
gonglei: mem.slot: 3, mem.guest_phys_addr=0xc0000,
mem.userspace_addr=0x7fc343ec0000, mem.flags=0, memory_size=0x0
gonglei: mem.slot: 3, mem.guest_phys_addr=0xc0000,
mem.userspace_addr=0x7fc343ec0000, mem.flags=0, memory_size=0x9000

When the memory region is cleared, KVM marks the slot invalid
(it is set to KVM_MEMSLOT_INVALID).

If SeaBIOS accesses this memory and causes a page fault, the lookup by
gfn (via __gfn_to_pfn_memslot) finds an invalid value, and the fault
handling finally returns a failure.

The function call chain in KVM is as follows:

kvm_mmu_page_fault
tdp_page_fault
try_async_pf
__gfn_to_pfn_memslot
__direct_map // return true;
x86_emulate_instruction
handle_emulation_failure

The function call chain in QEMU is as follows:

Breakpoint 1, kvm_set_user_memory_region (kml=0x564aa1e2c890, 
slot=0x564aa1e2d230) at /mnt/sdb/gonglei/qemu/kvm-all.c:261
(gdb) bt
#0  kvm_set_user_memory_region (kml=0x564aa1e2c890, slot=0x564aa1e2d230) at 
/mnt/sdb/gonglei/qemu/kvm-all.c:261
#1  0x564a9e7e3096 in kvm_set_phys_mem (kml=0x564aa1e2c890, 
section=0x7febeb296500, add=false) at /mnt/sdb/gonglei/qemu/kvm-all.c:887
#2  0x564a9e7e34c7 in kvm_region_del (listener=0x564aa1e2c890, 
section=0x7febeb296500) at /mnt/sdb/gonglei/qemu/kvm-all.c:999
#3  0x564a9e7ea884 in address_space_update_topology_pass (as=0x564a9f2b2640 
, old_view=0x7febdc3449c0, 

Re: [Qemu-devel] [PATCH] socket: dont't free msgfds if error equals EAGAIN

2018-05-30 Thread Gonglei (Arei)


> -Original Message-
> From: Eric Blake [mailto:ebl...@redhat.com]
> Sent: Wednesday, May 30, 2018 3:33 AM
> To: linzhecheng ; Marc-André Lureau
> 
> Cc: QEMU ; Paolo Bonzini ;
> wangxin (U) ; Gonglei (Arei)
> ; pet...@redhat.com; berra...@redhat.com
> Subject: Re: [Qemu-devel] [PATCH] socket: dont't free msgfds if error equals
> EAGAIN
> 
> On 05/29/2018 04:33 AM, linzhecheng wrote:
> > I think this patch doesn't fix my issue. For more details, please see 
> > Gonglei's
> reply.
> > https://lists.gnu.org/archive/html/qemu-devel/2018-05/msg06296.html
> 
> Your mailer is not honoring threading (it failed to include
> 'In-Reply-To:' and 'References:' headers that refer to the message you
> are replying to), and you are top-posting, both of which make it
> difficult to follow your comments on a technical list.
> 
> 
Agree.

@Zhecheng, please resend the patch with a commit message, CCing these guys.

Regards,
-Gonglei


Re: [Qemu-devel] [PATCH] socket: dont't free msgfds if error equals EAGAIN

2018-05-29 Thread Gonglei (Arei)
Hi all,

The issue is easy to reproduce when we configure the multi-queue function for
vhost-user NICs.

The main backtrace is as follows:

vhost_user_write                      ==> 0) sets s->write_msgfds_num to 8
  qemu_chr_fe_write_all
    qemu_chr_fe_write_buffer          ==> 4) rewrites because (ret < 0 && errno == EAGAIN)
      tcp_chr_write                   ==> 3) frees s->write_msgfds and sets s->write_msgfds_num to 0
        io_channel_send_full          ==> 2) sets errno = EAGAIN and returns -1
          qio_channel_socket_writev   ==> 1) returns QIO_CHANNEL_ERR_BLOCK when ret < 0 && errno == EAGAIN

Then step 4) above may cause undefined behavior on the vhost-user server
side, because a null control message is sent.

So we submitted a patch to fix it. What's your opinion?

Regards,
-Gonglei

> -Original Message-
> From: linzhecheng
> Sent: Tuesday, May 29, 2018 4:20 PM
> To: qemu-devel@nongnu.org
> Cc: pbonz...@redhat.com; wangxin (U) ;
> berra...@redhat.com; pet...@redhat.com; marcandre.lur...@redhat.com;
> ebl...@redhat.com; Gonglei (Arei) 
> Subject: RE: [PATCH] socket: dont't free msgfds if error equals EAGAIN
> 
> CC'ing Daniel P. Berrangé , Peter Xu, Marc-André Lureau, Eric Blake, Gonglei
> 
> > -Original Message-
> > From: linzhecheng
> > Sent: May 29, 2018 10:53
> > To: qemu-devel@nongnu.org
> > Cc: pbonz...@redhat.com; wangxin (U)
> ;
> > linzhecheng 
> > Subject: [PATCH] socket: dont't free msgfds if error equals EAGAIN
> >
> > Signed-off-by: linzhecheng 
> >
> > diff --git a/chardev/char-socket.c b/chardev/char-socket.c index
> > 159e69c3b1..17519ec589 100644
> > --- a/chardev/char-socket.c
> > +++ b/chardev/char-socket.c
> > @@ -134,8 +134,8 @@ static int tcp_chr_write(Chardev *chr, const uint8_t
> > *buf, int len)
> >  s->write_msgfds,
> >  s->write_msgfds_num);
> >
> > -/* free the written msgfds, no matter what */
> > -if (s->write_msgfds_num) {
> > +/* free the written msgfds in any cases other than errno==EAGAIN
> */
> > +if (EAGAIN != errno && s->write_msgfds_num) {
> >  g_free(s->write_msgfds);
> >  s->write_msgfds = 0;
> >  s->write_msgfds_num = 0;
> > --
> > 2.12.2.windows.2
> >
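
For clarity, the same fix written with the condition stated positively; this
is an equivalent sketch of the tcp_chr_write() body, not a tested change
(fields as in chardev/char-socket.c):

ret = io_channel_send_full(s->ioc, (void *)buf, len,
                           s->write_msgfds, s->write_msgfds_num);

/* keep the fds when the send would block, so the retry resends them */
if (s->write_msgfds_num && !(ret < 0 && errno == EAGAIN)) {
    g_free(s->write_msgfds);
    s->write_msgfds = NULL;
    s->write_msgfds_num = 0;
}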



Re: [Qemu-devel] [PATCH] i386: Allow monitor / mwait cpuid override

2018-02-27 Thread Gonglei (Arei)
Hi all,

Guests could achieve good performance in 'Message Passing Workloads'
scenarios when they see the X86_FEATURE_MWAIT feature presented by qemu.
The reason is that, once it knows about that feature, the guest can idle
using mwait, which saves VMEXITs and achieves high performance in
latency-sensitive scenarios.

Is there any plan for this patch? 

Or may I send an updated version based on yours, @Alex?

Thanks,
-Gonglei


> -Original Message-
> From: Qemu-devel
> [mailto:qemu-devel-bounces+arei.gonglei=huawei@nongnu.org] On
> Behalf Of Alexander Graf
> Sent: Monday, March 27, 2017 10:27 PM
> To: qemu-devel@nongnu.org
> Cc: Paolo Bonzini; Eduardo Habkost; Richard Henderson
> Subject: [Qemu-devel] [PATCH] i386: Allow monitor / mwait cpuid override
> 
> KVM allows trap and emulate (read: NOP) of the MONITOR and MWAIT
> instructions. There is work undergoing to enable actual execution
> of these inside of KVM, but nobody really wants to expose the feature
> to the guest by default, as it would eat up all of the host CPU.
> 
> So today there is no streamlined way to actually notify the guest that
> it's ok to execute MONITOR / MWAIT, even when we want to explicitly
> leave the guest in guest context.
> 
> This patch adds a new -cpu parameter called "mwait" which - when
> enabled - force enables the MONITOR / MWAIT CPUID flag, even when
> the underlying accel framework does not explicitly advertise support.
> 
> With that in place, we can explicitly allow users to specify that
> they want have the guest execute MONITOR / MWAIT in its idle loop.
> 
> Signed-off-by: Alexander Graf 
> ---
>  target/i386/cpu.c | 5 +
>  target/i386/cpu.h | 1 +
>  2 files changed, 6 insertions(+)
> 
> diff --git a/target/i386/cpu.c b/target/i386/cpu.c
> index 7aa7622..c44020b 100644
> --- a/target/i386/cpu.c
> +++ b/target/i386/cpu.c
> @@ -3460,6 +3460,10 @@ static int x86_cpu_filter_features(X86CPU *cpu)
>  x86_cpu_get_supported_feature_word(w, false);
>  uint32_t requested_features = env->features[w];
>  env->features[w] &= host_feat;
> +if (cpu->expose_monitor && (w == FEAT_1_ECX)) {
> +/* Force monitor feature in */
> +env->features[w] |= CPUID_EXT_MONITOR;
> +}
>  cpu->filtered_features[w] = requested_features &
> ~env->features[w];
>  if (cpu->filtered_features[w]) {
>  rv = 1;
> @@ -3988,6 +3992,7 @@ static Property x86_cpu_properties[] = {
>  DEFINE_PROP_BOOL("check", X86CPU, check_cpuid, true),
>  DEFINE_PROP_BOOL("enforce", X86CPU, enforce_cpuid, false),
>  DEFINE_PROP_BOOL("kvm", X86CPU, expose_kvm, true),
> +DEFINE_PROP_BOOL("mwait", X86CPU, expose_monitor, false),
>  DEFINE_PROP_UINT32("phys-bits", X86CPU, phys_bits, 0),
>  DEFINE_PROP_BOOL("host-phys-bits", X86CPU, host_phys_bits, false),
>  DEFINE_PROP_BOOL("fill-mtrr-mask", X86CPU, fill_mtrr_mask, true),
> diff --git a/target/i386/cpu.h b/target/i386/cpu.h
> index 07401ad..7400d00 100644
> --- a/target/i386/cpu.h
> +++ b/target/i386/cpu.h
> @@ -1214,6 +1214,7 @@ struct X86CPU {
>  bool check_cpuid;
>  bool enforce_cpuid;
>  bool expose_kvm;
> +bool expose_monitor;
>  bool migratable;
>  bool max_features; /* Enable all supported features automatically */
>  uint32_t apic_id;
> --
> 1.8.5.6
> 
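
For reference, with the property added by this patch, the feature would
presumably be enabled on the command line like this (an illustrative
invocation; the "mwait" property name comes from the DEFINE_PROP_BOOL above):

    qemu-system-x86_64 -enable-kvm -cpu host,mwait=on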




Re: [Qemu-devel] [PATCH v3] rtc: placing RTC memory region outside BQL

2018-02-22 Thread Gonglei (Arei)
Ping...


Regards,
-Gonglei


> -Original Message-
> From: Gonglei (Arei)
> Sent: Monday, February 12, 2018 4:58 PM
> To: qemu-devel@nongnu.org
> Cc: pbonz...@redhat.com; Huangweidong (C); peter.mayd...@linaro.org;
> Gonglei (Arei)
> Subject: [PATCH v3] rtc: placing RTC memory region outside BQL
> 
> As Windows guests use the RTC as the clock source device
> and access it frequently, let's move the RTC memory
> region outside the BQL to decrease overhead for Windows guests.
> Meanwhile, add a new lock to prevent different vCPUs from
> accessing the RTC at the same time.
> 
> I tested PCMark 8 (https://www.futuremark.com/benchmarks/pcmark)
> in a win7 guest and got the results below:
> 
> Guest: 2U2G
> 
> Before applying the patch:
> 
> Your Work 2.0 score: 2000
> Web Browsing - JunglePin 0.334s
> Web Browsing - Amazonia  0.132s
> Writing  3.59s
> Spreadsheet  70.13s
> Video Chat v2/Video Chat playback 1 v2   22.8 fps
> Video Chat v2/Video Chat encoding v2 307.0 ms
> Benchmark duration   1h 35min 46s
> 
> After applying the patch:
> 
> Your Work 2.0 score: 2040
> Web Browsing - JunglePin 0.345s
> Web Browsing - Amazonia  0.132s
> Writing  3.56s
> Spreadsheet  67.83s
> Video Chat v2/Video Chat playback 1 v2   28.7 fps
> Video Chat v2/Video Chat encoding v2 324.7 ms
> Benchmark duration   1h 32min 5s
> 
> Test results show that the optimization is effective under
> stressful situations.
> 
> Signed-off-by: Gonglei <arei.gong...@huawei.com>
> ---
> v3->v2:
>  a) fix a typo, 's/rasie/raise/' [Peter]
>  b) change commit message [Peter]
> 
> v2->v1:
>  a)Adding a new lock to avoid different vCPUs
>access the RTC together. [Paolo]
>  b)Taking the BQL before raising the outbound IRQ line. [Peter]
>  c)Don't hold BQL if it was holden. [Peter]
> 
>  hw/timer/mc146818rtc.c | 55
> ++
>  1 file changed, 47 insertions(+), 8 deletions(-)
> 
> diff --git a/hw/timer/mc146818rtc.c b/hw/timer/mc146818rtc.c
> index 35a05a6..f0a2a62 100644
> --- a/hw/timer/mc146818rtc.c
> +++ b/hw/timer/mc146818rtc.c
> @@ -85,6 +85,7 @@ typedef struct RTCState {
>  uint16_t irq_reinject_on_ack_count;
>  uint32_t irq_coalesced;
>  uint32_t period;
> +QemuMutex rtc_lock;
>  QEMUTimer *coalesced_timer;
>  Notifier clock_reset_notifier;
>  LostTickPolicy lost_tick_policy;
> @@ -125,6 +126,36 @@ static void rtc_coalesced_timer_update(RTCState *s)
>  }
>  }
> 
> +static void rtc_raise_irq(RTCState *s)
> +{
> +bool unlocked = !qemu_mutex_iothread_locked();
> +
> +if (unlocked) {
> +qemu_mutex_lock_iothread();
> +}
> +
> +qemu_irq_raise(s->irq);
> +
> +if (unlocked) {
> +qemu_mutex_unlock_iothread();
> +}
> +}
> +
> +static void rtc_lower_irq(RTCState *s)
> +{
> +bool unlocked = !qemu_mutex_iothread_locked();
> +
> +if (unlocked) {
> +qemu_mutex_lock_iothread();
> +}
> +
> +qemu_irq_lower(s->irq);
> +
> +if (unlocked) {
> +qemu_mutex_unlock_iothread();
> +}
> +}
> +
>  static QLIST_HEAD(, RTCState) rtc_devices =
>  QLIST_HEAD_INITIALIZER(rtc_devices);
> 
> @@ -141,7 +172,7 @@ void qmp_rtc_reset_reinjection(Error **errp)
>  static bool rtc_policy_slew_deliver_irq(RTCState *s)
>  {
>  apic_reset_irq_delivered();
> -qemu_irq_raise(s->irq);
> +rtc_raise_irq(s);
>  return apic_get_irq_delivered();
>  }
> 
> @@ -277,8 +308,9 @@ static void rtc_periodic_timer(void *opaque)
>  DPRINTF_C("cmos: coalesced irqs increased to %d\n",
>s->irq_coalesced);
>  }
> -} else
> -qemu_irq_raise(s->irq);
> +} else {
> +rtc_raise_irq(s);
> +}
>  }
>  }
> 
> @@ -459,7 +491,7 @@ static void rtc_update_timer(void *opaque)
>  s->cmos_data[RTC_REG_C] |= irqs;
>  if ((new_irqs & s->cmos_data[RTC_REG_B]) != 0) {
>  s->cmos_data[RTC_REG_C] |= REG_C_IRQF;
> -qemu_irq_raise(s->irq);
> +rtc_raise_irq(s);
>  }
>  check_update_timer(s);
>  }
> @@ -471,6 +503,7 @@ static void cmos_ioport_write(void *opaque, hwaddr
> addr,
>  uint32_t old_period;
>  bool update_periodic_timer;
> 
> +qemu_mutex_lock(&s->rtc_lock);
>
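
The quoted diff is cut off here; per the commit message, a sketch of the
presumable shape of the rest of this hunk (not the verbatim patch) is a
matching unlock around the register update:

static void cmos_ioport_write(void *opaque, hwaddr addr,
                              uint64_t data, unsigned size)
{
    RTCState *s = opaque;

    qemu_mutex_lock(&s->rtc_lock);
    /* ... existing register update logic ... */
    qemu_mutex_unlock(&s->rtc_lock);
}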

Re: [Qemu-devel] [PULL 00/26] virtio, vhost, pci, pc: features, fixes and cleanups

2018-02-10 Thread Gonglei (Arei)
> -Original Message-
> From: Qemu-devel
> [mailto:qemu-devel-bounces+arei.gonglei=huawei@nongnu.org] On
> Behalf Of Peter Maydell
> Sent: Friday, February 09, 2018 6:07 PM
> To: Michael S. Tsirkin
> Cc: QEMU Developers
> Subject: Re: [Qemu-devel] [PULL 00/26] virtio, vhost, pci, pc: features, 
> fixes and
> cleanups
> 
> On 8 February 2018 at 19:08, Michael S. Tsirkin  wrote:
> > The following changes since commit
> 008a51bbb343972dd8cf09126da8c3b87f4e1c96:
> >
> >   Merge remote-tracking branch 'remotes/famz/tags/staging-pull-request'
> into staging (2018-02-08 14:31:51 +0000)
> >
> > are available in the git repository at:
> >
> >   git://git.kernel.org/pub/scm/virt/kvm/mst/qemu.git tags/for_upstream
> >
> > for you to fetch changes up to
> f4ac9b2e04e8d98854a97bc473353207765aa9e7:
> >
> >   virtio-balloon: include statistics of disk/file caches (2018-02-08 
> > 21:06:42
> +0200)
> >
> > 
> > virtio,vhost,pci,pc: features, fixes and cleanups
> >
> > - a new vhost crypto device
> > - new stats in virtio balloon
> > - virtio eventfd rework for boot speedup
> > - vhost memory rework for boot speedup
> > - fixes and cleanups all over the place
> >
> > Signed-off-by: Michael S. Tsirkin 
> >
> 
> Hi. This has some format-string issues:
> 
> /home/peter.maydell/qemu/backends/cryptodev-vhost-user.c: In function
> 'cryptodev_vhost_user_start':
> /home/peter.maydell/qemu/backends/cryptodev-vhost-user.c:112:26:
> error: format '%lu' expects argument of type 'long unsigned int', but
> argument 2 has type 'size_t {aka unsigned int}' [-Werror=format=]
>  error_report("failed to init vhost_crypto for queue %lu", i);
>   ^
> /home/peter.maydell/qemu/backends/cryptodev-vhost-user.c: In function
> 'cryptodev_vhost_user_init':
> /home/peter.maydell/qemu/backends/cryptodev-vhost-user.c:205:40:
> error: format '%lu' expects argument of type 'long unsigned int', but
> argument 2 has type 'size_t {aka unsigned int}' [-Werror=format=]
>  cc->info_str = g_strdup_printf("cryptodev-vhost-user%lu to %s ",
> ^
> 
Using %zu instead of %lu would be correct. Michael, could you please fix it
directly?

Very sorry for the inconvenience. :(

Thanks,
-Gonglei
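
The requested fix in miniature (size_t takes %zu; %lu is wrong on 32-bit
mingw, where size_t is unsigned int):

size_t i = 0;
error_report("failed to init vhost_crypto for queue %zu", i);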



Re: [Qemu-devel] [PATCH v2] rtc: placing RTC memory region outside BQL

2018-02-09 Thread Gonglei (Arei)
>
> > >
> > > $ cat strace_c.sh
> > > strace -tt -p $1 -c -o result_$1.log &
> > > sleep $2
> > > pid=$(pidof strace)
> > > kill $pid
> > > cat result_$1.log
> > >
> > > Before applying this change:
> > > $ ./strace_c.sh 10528 30
> > > % time     seconds  usecs/call     calls    errors syscall
> > > ------ ----------- ----------- --------- --------- ------------
> > >  93.87    0.119070          30      4000           ppoll
> > >   3.27    0.004148           2      2038           ioctl
> > >   2.66    0.003370           2      2014           futex
> > >   0.09    0.000113           1       106           read
> > >   0.09    0.000109           1       104           io_getevents
> > >   0.02    0.000029           1        30           poll
> > >   0.00    0.000000           0         1           write
> > > ------ ----------- ----------- --------- --------- ------------
> > > 100.00    0.126839                  8293           total
> > >
> > > After applying the change:
> > > $ ./strace_c.sh 23829 30
> > > % time     seconds  usecs/call     calls    errors syscall
> > > ------ ----------- ----------- --------- --------- ------------
> > >  92.86    0.067441          16      4094           ppoll
> > >   4.85    0.003522           2      2136           ioctl
> > >   1.17    0.000850           4       189           futex
> > >   0.54    0.000395           2       202           read
> > >   0.52    0.000379           2       202           io_getevents
> > >   0.05    0.000037           1        30           poll
> > > ------ ----------- ----------- --------- --------- ------------
> > > 100.00    0.072624                  6853           total
> > >
> > > The futex call number decreases ~90.6% on an idle windows 7 guest.
> >
> > These are the same figures as from v1 -- it would be interesting
> > to check whether the additional locking that v2 adds has affected
> > the results.
> >
> Oh, yes. The futex count of v2 doesn't decline as much compared to v1,
> because it now takes the BQL before raising the outbound IRQ line.
> 
> Before applying v2:
> # ./strace_c.sh 8776 30
> % time     seconds  usecs/call     calls    errors syscall
> ------ ----------- ----------- --------- --------- ------------
>  78.01    0.164188          26      6436           ppoll
>   8.39    0.017650           5      3700        39 futex
>   7.68    0.016157           6      2758           ioctl
>   5.48    0.011530           3      4586      1113 read
>   0.30    0.000640          20        32           io_submit
>   0.15    0.000317           4        89           write
> ------ ----------- ----------- --------- --------- ------------
> 100.00    0.210482                 17601      1152 total
> 
> After applying v2:
> # ./strace_c.sh 15968 30
> % time     seconds  usecs/call     calls    errors syscall
> ------ ----------- ----------- --------- --------- ------------
>  78.28    0.171117          27      6272           ppoll
>   8.50    0.018571           5      3663        21 futex
>   7.76    0.016973           6      2732           ioctl
>   4.85    0.010597           3      4115       853 read
>   0.31    0.000672          11        63           io_submit
>   0.30    0.000659           4       180           write
> ------ ----------- ----------- --------- --------- ------------
> 100.00    0.218589                 17025       874 total
> 
> > Does the patch improve performance in a more interesting use
> > case than "the guest is just idle" ?
> >
> I think so; after all, the scope of the locking is reduced.
> Besides this, can we optimize the rtc timers to avoid holding the BQL,
> by using separate threads?
> 
Hi Peter, Paolo

I tested PCMark 8 (https://www.futuremark.com/benchmarks/pcmark)
in a win7 guest and got the results below:

Guest: 2U2G

Before applying v2:

Your Work 2.0 score:                     2000
Web Browsing - JunglePin                 0.334s
Web Browsing - Amazonia                  0.132s
Writing                                  3.59s
Spreadsheet                              70.13s
Video Chat v2/Video Chat playback 1 v2   22.8 fps
Video Chat v2/Video Chat encoding v2     307.0 ms
Benchmark duration                       1h 35min 46s

After applying v2:

Your Work 2.0 score:                     2040
Web Browsing - JunglePin                 0.345s
Web Browsing - Amazonia                  0.132s
Writing                                  3.56s
Spreadsheet                              67.83s
Video Chat v2/Video Chat playback 1 v2   28.7 fps
Video Chat v2/Video Chat encoding v2     324.7 ms
Benchmark duration                       1h 32min 5s

Test results show that the optimization is very effective in stressful situations.

Thanks,
-Gonglei



Re: [Qemu-devel] [PATCH] vl: fix possible int overflow for qemu_timedate_diff()

2018-02-07 Thread Gonglei (Arei)
> -Original Message-
> From: Paolo Bonzini [mailto:pbonz...@redhat.com]
> Sent: Tuesday, February 06, 2018 11:52 PM
> To: Gonglei (Arei); qemu-devel@nongnu.org
> Cc: shenghualong
> Subject: Re: [PATCH] vl: fix possible int overflow for qemu_timedate_diff()
> 
> On 01/02/2018 12:59, Gonglei wrote:
> > From: shenghualong <shenghual...@huawei.com>
> >
> > When Windows guest users set the time to year 2099,
> > the return value of qemu_timedate_diff() will overflow
> > in variable clock mode, as shown below:
> >
> >  
> >
> > Let's change the return value of qemu_timedate_diff() from
> > int to time_t to fix the possible overflow problem.
> >
> > Signed-off-by: shenghualong <shenghual...@huawei.com>
> > Signed-off-by: Gonglei <arei.gong...@huawei.com>
> 
> Thanks, this makes sense.  However, looking at the users, you should
> also change the type of:
> 
> - the diff variable in hw/timer/m48t59.c function set_alarm;
> 
> - the offset argument of the RTC_CHANGE QAPI event (to int64)
> 
> - the sec_offset and alm_sec fields of MenelausState in hw/timer/twl92230.c
> 
> - the offset argument of qemu_get_timedate.
> 
OK, will do.

Thanks,
-Gonglei

> Thanks,
> 
> Paolo
> 
> > ---
> >  include/qemu-common.h | 2 +-
> >  vl.c  | 4 ++--
> >  2 files changed, 3 insertions(+), 3 deletions(-)
> >
> > diff --git a/include/qemu-common.h b/include/qemu-common.h
> > index 05319b9..6fb80aa 100644
> > --- a/include/qemu-common.h
> > +++ b/include/qemu-common.h
> > @@ -33,7 +33,7 @@ int qemu_main(int argc, char **argv, char **envp);
> >  #endif
> >
> >  void qemu_get_timedate(struct tm *tm, int offset);
> > -int qemu_timedate_diff(struct tm *tm);
> > +time_t qemu_timedate_diff(struct tm *tm);
> >
> >  #define qemu_isalnum(c)    isalnum((unsigned char)(c))
> >  #define qemu_isalpha(c)    isalpha((unsigned char)(c))
> > diff --git a/vl.c b/vl.c
> > index e517a8d..9d225da 100644
> > --- a/vl.c
> > +++ b/vl.c
> > @@ -146,7 +146,7 @@ int nb_nics;
> >  NICInfo nd_table[MAX_NICS];
> >  int autostart;
> >  static int rtc_utc = 1;
> > -static int rtc_date_offset = -1; /* -1 means no change */
> > +static time_t rtc_date_offset = -1; /* -1 means no change */
> >  QEMUClockType rtc_clock;
> >  int vga_interface_type = VGA_NONE;
> >  static int full_screen = 0;
> > @@ -812,7 +812,7 @@ void qemu_get_timedate(struct tm *tm, int offset)
> >  }
> >  }
> >
> > -int qemu_timedate_diff(struct tm *tm)
> > +time_t qemu_timedate_diff(struct tm *tm)
> >  {
> >  time_t seconds;
> >
> >
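
As a standalone illustration (plain C arithmetic, not QEMU code) of why an
int return value overflows for year 2099: the diff is roughly 81 years of
seconds, which no longer fits in a 32-bit int.

#include <limits.h>
#include <stdio.h>

int main(void)
{
    long long diff = 81LL * 365 * 86400;   /* ~2.55e9 seconds */

    printf("diff = %lld, INT_MAX = %d, overflows: %s\n",
           diff, INT_MAX, diff > INT_MAX ? "yes" : "no");
    return 0;
}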



Re: [Qemu-devel] [PATCH v2] rtc: placing RTC memory region outside BQL

2018-02-07 Thread Gonglei (Arei)
> -Original Message-
> From: Peter Maydell [mailto:peter.mayd...@linaro.org]
> Sent: Tuesday, February 06, 2018 10:36 PM
> To: Gonglei (Arei)
> Cc: QEMU Developers; Paolo Bonzini; Huangweidong (C)
> Subject: Re: [PATCH v2] rtc: placing RTC memory region outside BQL
> 
> On 6 February 2018 at 14:07, Gonglei <arei.gong...@huawei.com> wrote:
> > As Windows guests use the RTC as the clock source device
> > and access it frequently, let's move the RTC memory
> > region outside the BQL to decrease overhead for Windows guests.
> > Meanwhile, add a new lock to prevent different vCPUs from
> > accessing the RTC at the same time.
> >
> > $ cat strace_c.sh
> > strace -tt -p $1 -c -o result_$1.log &
> > sleep $2
> > pid=$(pidof strace)
> > kill $pid
> > cat result_$1.log
> >
> > Before applying this change:
> > $ ./strace_c.sh 10528 30
> > % time     seconds  usecs/call     calls    errors syscall
> > ------ ----------- ----------- --------- --------- ------------
> >  93.87    0.119070          30      4000           ppoll
> >   3.27    0.004148           2      2038           ioctl
> >   2.66    0.003370           2      2014           futex
> >   0.09    0.000113           1       106           read
> >   0.09    0.000109           1       104           io_getevents
> >   0.02    0.000029           1        30           poll
> >   0.00    0.000000           0         1           write
> > ------ ----------- ----------- --------- --------- ------------
> > 100.00    0.126839                  8293           total
> >
> > After applying the change:
> > $ ./strace_c.sh 23829 30
> > % time     seconds  usecs/call     calls    errors syscall
> > ------ ----------- ----------- --------- --------- ------------
> >  92.86    0.067441          16      4094           ppoll
> >   4.85    0.003522           2      2136           ioctl
> >   1.17    0.000850           4       189           futex
> >   0.54    0.000395           2       202           read
> >   0.52    0.000379           2       202           io_getevents
> >   0.05    0.000037           1        30           poll
> > ------ ----------- ----------- --------- --------- ------------
> > 100.00    0.072624                  6853           total
> >
> > The futex call number decreases ~90.6% on an idle windows 7 guest.
> 
> These are the same figures as from v1 -- it would be interesting
> to check whether the additional locking that v2 adds has affected
> the results.
> 
Oh, yes. The futex count of v2 doesn't decline as much compared to v1,
because it now takes the BQL before raising the outbound IRQ line.

Before applying v2:
# ./strace_c.sh 8776 30
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ------------
 78.01    0.164188          26      6436           ppoll
  8.39    0.017650           5      3700        39 futex
  7.68    0.016157           6      2758           ioctl
  5.48    0.011530           3      4586      1113 read
  0.30    0.000640          20        32           io_submit
  0.15    0.000317           4        89           write
------ ----------- ----------- --------- --------- ------------
100.00    0.210482                 17601      1152 total

After applying v2:
# ./strace_c.sh 15968 30
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ------------
 78.28    0.171117          27      6272           ppoll
  8.50    0.018571           5      3663        21 futex
  7.76    0.016973           6      2732           ioctl
  4.85    0.010597           3      4115       853 read
  0.31    0.000672          11        63           io_submit
  0.30    0.000659           4       180           write
------ ----------- ----------- --------- --------- ------------
100.00    0.218589                 17025       874 total

> Does the patch improve performance in a more interesting use
> case than "the guest is just idle" ?
> 
I think so; after all, the scope of the locking is reduced.
Besides this, can we optimize the rtc timers to avoid holding the BQL,
by using separate threads?

> > +static void rtc_rasie_irq(RTCState *s)
> 
> Typo: should be "raise".
> 
Good catch. :)

Thanks,
-Gonglei


Re: [Qemu-devel] [PATCH] rtc: placing RTC memory region outside BQL

2018-02-06 Thread Gonglei (Arei)

> -Original Message-
> From: Peter Maydell [mailto:peter.mayd...@linaro.org]
> Sent: Tuesday, February 06, 2018 5:49 PM
> To: Gonglei (Arei)
> Cc: Paolo Bonzini; QEMU Developers; Huangweidong (C)
> Subject: Re: [Qemu-devel] [PATCH] rtc: placing RTC memory region outside BQL
> 
> On 6 February 2018 at 08:24, Gonglei (Arei) <arei.gong...@huawei.com>
> wrote:
> > So, taking the BQL is necessary, and what we can do is try our best to
> > narrow the scope of the locking? For example, with the following wrappers:
> >
> > static void rtc_rasie_irq(RTCState *s)
> > {
> > qemu_mutex_lock_iothread();
> > qemu_irq_raise(s->irq);
> > qemu_mutex_unlock_iothread();
> > }
> >
> > static void rtc_lower_irq(RTCState *s)
> > {
> > qemu_mutex_lock_iothread();
> > qemu_irq_lower(s->irq);
> > qemu_mutex_unlock_iothread();
> > }
> 
> If you do that you'll also need to be careful about not calling
> those functions from contexts where you already hold the iothread
> mutex (eg timer callbacks), since you can't lock a mutex you
> already have locked.
> 
Exactly, all such contexts are invoked from the main thread. :)
Three timer callbacks, plus rtc_reset().

Thanks,
-Gonglei
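
(For the record, v3 of the patch resolves exactly this by checking
qemu_mutex_iothread_locked() before taking the BQL; in miniature:)

static void rtc_raise_irq(RTCState *s)
{
    bool unlocked = !qemu_mutex_iothread_locked();

    if (unlocked) {
        qemu_mutex_lock_iothread();   /* take the BQL only if not held */
    }
    qemu_irq_raise(s->irq);
    if (unlocked) {
        qemu_mutex_unlock_iothread();
    }
}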


Re: [Qemu-devel] [PATCH] rtc: placing RTC memory region outside BQL

2018-02-06 Thread Gonglei (Arei)

> -Original Message-
> From: Paolo Bonzini [mailto:pbonz...@redhat.com]
> Sent: Monday, February 05, 2018 10:04 PM
> To: Peter Maydell
> Cc: Gonglei (Arei); QEMU Developers; Huangweidong (C)
> Subject: Re: [Qemu-devel] [PATCH] rtc: placing RTC memory region outside BQL
> 
> On 04/02/2018 19:02, Peter Maydell wrote:
> > On 1 February 2018 at 14:23, Paolo Bonzini <pbonz...@redhat.com> wrote:
> >> On 01/02/2018 08:47, Gonglei wrote:
> >>> diff --git a/hw/timer/mc146818rtc.c b/hw/timer/mc146818rtc.c
> >>> index 35a05a6..d9d99c5 100644
> >>> --- a/hw/timer/mc146818rtc.c
> >>> +++ b/hw/timer/mc146818rtc.c
> >>> @@ -986,6 +986,7 @@ static void rtc_realizefn(DeviceState *dev, Error
> **errp)
> >>>  qemu_register_suspend_notifier(&s->suspend_notifier);
> >>>
> >>>  memory_region_init_io(&s->io, OBJECT(s), &cmos_ops, s, "rtc", 2);
> >>> +memory_region_clear_global_locking(&s->io);
> >>>  isa_register_ioport(isadev, &s->io, base);
> >>>
> >>>  qdev_set_legacy_instance_id(dev, base, 3);
> >>>
> >>
> >> This is not enough, you need to add a new lock or something like that.
> >> Otherwise two vCPUs can access the RTC together and make a mess.
> >
> > Do you also need to do something to take the global lock before
> > raising the outbound IRQ line (since it might be connected to a device
> > that does need the global lock), or am I confused ?
> 
> Yes, that's a good point.  Most of the time the IRQ line is raised in a
> timer, but not always.
> 
So, taking the BQL is necessary, and what we can do is try our best to
narrow the scope of the locking? For example, with the following wrappers:

static void rtc_rasie_irq(RTCState *s)
{
qemu_mutex_lock_iothread();
qemu_irq_raise(s->irq);
qemu_mutex_unlock_iothread();
}

static void rtc_lower_irq(RTCState *s)
{
qemu_mutex_lock_iothread();
qemu_irq_lower(s->irq);
qemu_mutex_unlock_iothread();
}

Thanks,
-Gonglei


Re: [Qemu-devel] [PATCH] rtc: placing RTC memory region outside BQL

2018-02-03 Thread Gonglei (Arei)
> -Original Message-
> From: Paolo Bonzini [mailto:pbonz...@redhat.com]
> Sent: Thursday, February 01, 2018 10:24 PM
> To: Gonglei (Arei); qemu-devel@nongnu.org
> Cc: Huangweidong (C)
> Subject: Re: [PATCH] rtc: placing RTC memory region outside BQL
> 
> On 01/02/2018 08:47, Gonglei wrote:
> > As Windows guests use the RTC as the clock source device
> > and access it frequently, let's move the RTC memory
> > region outside the BQL to decrease overhead for Windows guests.
> >
> > strace -tt -p $1 -c -o result_$1.log &
> > sleep $2
> > pid=$(pidof strace)
> > kill $pid
> > cat result_$1.log
> >
> > Before applying this change:
> >
> > % time     seconds  usecs/call     calls    errors syscall
> > ------ ----------- ----------- --------- --------- ------------
> >  93.87    0.119070          30      4000           ppoll
> >   3.27    0.004148           2      2038           ioctl
> >   2.66    0.003370           2      2014           futex
> >   0.09    0.000113           1       106           read
> >   0.09    0.000109           1       104           io_getevents
> >   0.02    0.000029           1        30           poll
> >   0.00    0.000000           0         1           write
> > ------ ----------- ----------- --------- --------- ------------
> > 100.00    0.126839                  8293           total
> >
> > After applying the change:
> >
> > % time     seconds  usecs/call     calls    errors syscall
> > ------ ----------- ----------- --------- --------- ------------
> >  92.86    0.067441          16      4094           ppoll
> >   4.85    0.003522           2      2136           ioctl
> >   1.17    0.000850           4       189           futex
> >   0.54    0.000395           2       202           read
> >   0.52    0.000379           2       202           io_getevents
> >   0.05    0.000037           1        30           poll
> > ------ ----------- ----------- --------- --------- ------------
> > 100.00    0.072624                  6853           total
> >
> > The futex call number decreases ~90.6% on an idle windows 7 guest.
> >
> > Signed-off-by: Gonglei <arei.gong...@huawei.com>
> > ---
> >  hw/timer/mc146818rtc.c | 1 +
> >  1 file changed, 1 insertion(+)
> >
> > diff --git a/hw/timer/mc146818rtc.c b/hw/timer/mc146818rtc.c
> > index 35a05a6..d9d99c5 100644
> > --- a/hw/timer/mc146818rtc.c
> > +++ b/hw/timer/mc146818rtc.c
> > @@ -986,6 +986,7 @@ static void rtc_realizefn(DeviceState *dev, Error
> **errp)
> >  qemu_register_suspend_notifier(&s->suspend_notifier);
> >
> >  memory_region_init_io(&s->io, OBJECT(s), &cmos_ops, s, "rtc", 2);
> > +memory_region_clear_global_locking(&s->io);
> >  isa_register_ioport(isadev, &s->io, base);
> >
> >  qdev_set_legacy_instance_id(dev, base, 3);
> >
> 
> This is not enough, you need to add a new lock or something like that.
> Otherwise two vCPUs can access the RTC together and make a mess.
> 

Hi Paolo,

Yes, that's true, although I have not encountered any problems yet.
Let me enhance it in v2.

Thanks,
-Gonglei


Re: [Qemu-devel] [PATCH v3 1/4] cryptodev: add vhost-user as a new cryptodev backend

2018-01-17 Thread Gonglei (Arei)


> -Original Message-
> From: Zhoujian (jay)
> Sent: Wednesday, January 17, 2018 1:01 PM
> To: Michael S. Tsirkin
> Cc: pa...@linux.vnet.ibm.com; Huangweidong (C); xin.z...@intel.com;
> qemu-devel@nongnu.org; Gonglei (Arei); roy.fan.zh...@intel.com;
> stefa...@redhat.com; pbonz...@redhat.com; longpeng
> Subject: RE: [Qemu-devel] [PATCH v3 1/4] cryptodev: add vhost-user as a new
> cryptodev backend
> 
> > -Original Message-
> > From: Qemu-devel [mailto:qemu-devel-
> > bounces+jianjay.zhou=huawei@nongnu.org] On Behalf Of Michael S.
> Tsirkin
> > Sent: Wednesday, January 17, 2018 12:41 AM
> > To: Zhoujian (jay) <jianjay.z...@huawei.com>
> > Cc: pa...@linux.vnet.ibm.com; Huangweidong (C)
> <weidong.hu...@huawei.com>;
> > xin.z...@intel.com; qemu-devel@nongnu.org; Gonglei (Arei)
> > <arei.gong...@huawei.com>; roy.fan.zh...@intel.com;
> stefa...@redhat.com;
> > pbonz...@redhat.com; longpeng <longpe...@huawei.com>
> > Subject: Re: [Qemu-devel] [PATCH v3 1/4] cryptodev: add vhost-user as a new
> > cryptodev backend
> >
> > On Tue, Jan 16, 2018 at 10:06:50PM +0800, Jay Zhou wrote:
> > > From: Gonglei <arei.gong...@huawei.com>
> > >
> > > Usage:
> > >  -chardev socket,id=charcrypto0,path=/path/to/your/socket
> > >  -object cryptodev-vhost-user,id=cryptodev0,chardev=charcrypto0
> > >  -device virtio-crypto-pci,id=crypto0,cryptodev=cryptodev0
> > >
> > > Signed-off-by: Gonglei <arei.gong...@huawei.com>
> > > Signed-off-by: Longpeng(Mike) <longpe...@huawei.com>
> > > Signed-off-by: Jay Zhou <jianjay.z...@huawei.com>
> > > ---
> > >  backends/Makefile.objs   |   4 +
> > >  backends/cryptodev-vhost-user.c  | 333
> > +++
> > >  backends/cryptodev-vhost.c   |  73 +
> > >  include/sysemu/cryptodev-vhost.h | 154 ++
> > >  qemu-options.hx  |  21 +++
> > >  vl.c |   4 +
> > >  6 files changed, 589 insertions(+)
> > >  create mode 100644 backends/cryptodev-vhost-user.c  create mode
> > > 100644 backends/cryptodev-vhost.c  create mode 100644
> > > include/sysemu/cryptodev-vhost.h
> > >
> > > diff --git a/backends/Makefile.objs b/backends/Makefile.objs index
> > > 0400799..9e1fb76 100644
> > > --- a/backends/Makefile.objs
> > > +++ b/backends/Makefile.objs
> > > @@ -8,3 +8,7 @@ common-obj-$(CONFIG_LINUX) += hostmem-file.o
> > >
> > >  common-obj-y += cryptodev.o
> > >  common-obj-y += cryptodev-builtin.o
> > > +
> > > +ifeq ($(CONFIG_VIRTIO),y)
> > > +common-obj-$(CONFIG_LINUX) += cryptodev-vhost.o
> > > +cryptodev-vhost-user.o endif
> >
> > Shouldn't this depend on CONFIG_VHOST_USER?
> 
> Yes, you're right. Will fix it soon.
> 
Hi Michael,

Can we apply this patch set first and then fix it up on top, together with
the other comments?

Thanks,
-Gonglei



Re: [Qemu-devel] [PATCH 3/7] i386: Add spec-ctrl CPUID bit

2018-01-16 Thread Gonglei (Arei)

> -Original Message-
> From: Eduardo Habkost [mailto:ehabk...@redhat.com]
> Sent: Monday, January 15, 2018 8:23 PM
> To: Gonglei (Arei)
> Cc: qemu-devel@nongnu.org; Paolo Bonzini
> Subject: Re: [Qemu-devel] [PATCH 3/7] i386: Add spec-ctrl CPUID bit
> 
> On Sat, Jan 13, 2018 at 03:04:44AM +, Gonglei (Arei) wrote:
> >
> > > -Original Message-
> > > From: Qemu-devel
> > > [mailto:qemu-devel-bounces+arei.gonglei=huawei@nongnu.org] On
> > > Behalf Of Eduardo Habkost
> > > Sent: Tuesday, January 09, 2018 11:45 PM
> > > To: qemu-devel@nongnu.org
> > > Cc: Paolo Bonzini
> > > Subject: [Qemu-devel] [PATCH 3/7] i386: Add spec-ctrl CPUID bit
> > >
> > > Add the feature name and a CPUID_7_0_EDX_SPEC_CTRL macro.
> > >
> > > Signed-off-by: Eduardo Habkost <ehabk...@redhat.com>
> > > ---
> > >  target/i386/cpu.h | 1 +
> > >  target/i386/cpu.c | 2 +-
> > >  2 files changed, 2 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/target/i386/cpu.h b/target/i386/cpu.h
> > > index 07f47997d6..de387c1311 100644
> > > --- a/target/i386/cpu.h
> > > +++ b/target/i386/cpu.h
> > > @@ -667,6 +667,7 @@ typedef uint32_t
> > > FeatureWordArray[FEATURE_WORDS];
> > >
> > >  #define CPUID_7_0_EDX_AVX512_4VNNIW (1U << 2) /* AVX512 Neural
> > > Network Instructions */
> > >  #define CPUID_7_0_EDX_AVX512_4FMAPS (1U << 3) /* AVX512 Multiply
> > > Accumulation Single Precision */
> > > +#define CPUID_7_0_EDX_SPEC_CTRL (1U << 26) /* Speculation
> Control
> > > */
> > >
> > >  #define CPUID_XSAVE_XSAVEOPT   (1U << 0)
> > >  #define CPUID_XSAVE_XSAVEC (1U << 1)
> > > diff --git a/target/i386/cpu.c b/target/i386/cpu.c
> > > index 9f4f949899..1be1642eb2 100644
> > > --- a/target/i386/cpu.c
> > > +++ b/target/i386/cpu.c
> > > @@ -459,7 +459,7 @@ static FeatureWordInfo
> > > feature_word_info[FEATURE_WORDS] = {
> > >  NULL, NULL, NULL, NULL,
> > >  NULL, NULL, NULL, NULL,
> > >  NULL, NULL, NULL, NULL,
> > > -NULL, NULL, NULL, NULL,
> > > +NULL, NULL, "spec-ctrl", NULL,
> > >  NULL, NULL, NULL, NULL,
> > >  },
> > >  .cpuid_eax = 7,
> > > --
> > > 2.14.3
> > >
> > Don't we need to pass through cpuid_7_edx to the guest when configuring
> > '-cpu host'? Otherwise how can guests use the IBPB/IBRS/STIBP capabilities?
> 
> We already do.  See the check for cpu->max_features at
> x86_cpu_expand_features().
> 
> Do you see something else missing?
> 
No, thank you. My bad. :(

Thanks,
-Gonglei




Re: [Qemu-devel] [PATCH 3/7] i386: Add spec-ctrl CPUID bit

2018-01-12 Thread Gonglei (Arei)

> -Original Message-
> From: Qemu-devel
> [mailto:qemu-devel-bounces+arei.gonglei=huawei@nongnu.org] On
> Behalf Of Eduardo Habkost
> Sent: Tuesday, January 09, 2018 11:45 PM
> To: qemu-devel@nongnu.org
> Cc: Paolo Bonzini
> Subject: [Qemu-devel] [PATCH 3/7] i386: Add spec-ctrl CPUID bit
> 
> Add the feature name and a CPUID_7_0_EDX_SPEC_CTRL macro.
> 
> Signed-off-by: Eduardo Habkost 
> ---
>  target/i386/cpu.h | 1 +
>  target/i386/cpu.c | 2 +-
>  2 files changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/target/i386/cpu.h b/target/i386/cpu.h
> index 07f47997d6..de387c1311 100644
> --- a/target/i386/cpu.h
> +++ b/target/i386/cpu.h
> @@ -667,6 +667,7 @@ typedef uint32_t
> FeatureWordArray[FEATURE_WORDS];
> 
>  #define CPUID_7_0_EDX_AVX512_4VNNIW (1U << 2) /* AVX512 Neural
> Network Instructions */
>  #define CPUID_7_0_EDX_AVX512_4FMAPS (1U << 3) /* AVX512 Multiply
> Accumulation Single Precision */
> +#define CPUID_7_0_EDX_SPEC_CTRL (1U << 26) /* Speculation Control
> */
> 
>  #define CPUID_XSAVE_XSAVEOPT   (1U << 0)
>  #define CPUID_XSAVE_XSAVEC (1U << 1)
> diff --git a/target/i386/cpu.c b/target/i386/cpu.c
> index 9f4f949899..1be1642eb2 100644
> --- a/target/i386/cpu.c
> +++ b/target/i386/cpu.c
> @@ -459,7 +459,7 @@ static FeatureWordInfo
> feature_word_info[FEATURE_WORDS] = {
>  NULL, NULL, NULL, NULL,
>  NULL, NULL, NULL, NULL,
>  NULL, NULL, NULL, NULL,
> -NULL, NULL, NULL, NULL,
> +NULL, NULL, "spec-ctrl", NULL,
>  NULL, NULL, NULL, NULL,
>  },
>  .cpuid_eax = 7,
> --
> 2.14.3
> 
Don't we need to pass through cpuid_7_edx to the guest when configuring '-cpu host'?
Otherwise how can guests use the IBPB/IBRS/STIBP capabilities?

Thanks,
-Gonglei




Re: [Qemu-devel] [PATCH 0/4] cryptodev: add vhost support

2017-12-21 Thread Gonglei (Arei)

> -Original Message-
> From: Michael S. Tsirkin [mailto:m...@redhat.com]
> Sent: Thursday, December 21, 2017 10:25 PM
> To: Gonglei (Arei)
> Cc: qemu-devel@nongnu.org; pbonz...@redhat.com; Huangweidong (C);
> stefa...@redhat.com; Zhoujian (jay); pa...@linux.vnet.ibm.com; longpeng;
> xin.z...@intel.com; roy.fan.zh...@intel.com
> Subject: Re: [PATCH 0/4] cryptodev: add vhost support
> 
> On Tue, Nov 28, 2017 at 05:03:05PM +0800, Gonglei wrote:
> > I posted the RFC verion five months ago for DPDK
> > vhost-crypto implmention, and now it's time to send
> > the formal version. Because we need an user space scheme
> > for better performance.
> >
> > The vhost user crypto server side patches had been
> > sent to DPDK community, pls see
> >
> > [RFC PATCH 0/6] lib/librte_vhost: introduce new vhost_user crypto
> backend support
> > http://dpdk.org/ml/archives/dev/2017-November/081048.html
> >
> > You also can get virtio-crypto polling mode driver from:
> >
> > [PATCH] virtio: add new driver for crypto devices
> > http://dpdk.org/ml/archives/dev/2017-November/081985.html
> >
> 
> This makes build on mingw break:
> 
>   CC  sparc64-softmmu/hw/scsi/virtio-scsi-dataplane.o
> hw/virtio/virtio-crypto.o: In function `virtio_crypto_vhost_status':
> /scm/qemu/hw/virtio/virtio-crypto.c:898: undefined reference to
> `cryptodev_get_vhost'
> /scm/qemu/hw/virtio/virtio-crypto.c:910: undefined reference to
> `cryptodev_vhost_start'
> /scm/qemu/hw/virtio/virtio-crypto.c:917: undefined reference to
> `cryptodev_vhost_stop'
> hw/virtio/virtio-crypto.o: In function `virtio_crypto_guest_notifier_pending':
> /scm/qemu/hw/virtio/virtio-crypto.c:947: undefined reference to
> `cryptodev_vhost_virtqueue_pending'
> hw/virtio/virtio-crypto.o: In function `virtio_crypto_guest_notifier_mask':
> /scm/qemu/hw/virtio/virtio-crypto.c:937: undefined reference to
> `cryptodev_vhost_virtqueue_mask'
> collect2: error: ld returned 1 exit status
> make[1]: *** [Makefile:193: qemu-system-i386.exe] Error 1
> make: *** [Makefile:383: subdir-i386-softmmu] Error 2
> 
> 
Sorry about that. We'll build it in a cross-compile environment.

Thanks,
-Gonglei



Re: [Qemu-devel] [PATCH 0/4] cryptodev: add vhost support

2017-12-20 Thread Gonglei (Arei)


> -Original Message-
> From: Michael S. Tsirkin [mailto:m...@redhat.com]
> Sent: Thursday, December 21, 2017 1:39 AM
> To: Gonglei (Arei)
> Cc: qemu-devel@nongnu.org; pbonz...@redhat.com; Huangweidong (C);
> stefa...@redhat.com; Zhoujian (jay); pa...@linux.vnet.ibm.com; longpeng;
> xin.z...@intel.com; roy.fan.zh...@intel.com
> Subject: Re: [PATCH 0/4] cryptodev: add vhost support
> 
> On Mon, Dec 18, 2017 at 09:03:16AM +, Gonglei (Arei) wrote:
> > Ping...
> >
> > Fan (working for DPDK parts) is waiting for those patches upstreamed. :)
> >
> > Thanks,
> > -Gonglei
> 
> As far as I am concerned, the main issue is that it says it assumes
> polling.  virtio does not work like this right now.  As long as spec
> does not support interrupt mode, I don't think we can merge this.
> 
Sorry, Michael, this confuses me, because the QEMU part of vhost-user crypto
doesn't make that assumption. The main point of contention, raised by Paolo,
is whether session operations should be added to the vhost-user protocol,
and we gave an explanation for that.

Thanks,
-Gonglei

> >
> > > -Original Message-
> > > From: Gonglei (Arei)
> > > Sent: Tuesday, November 28, 2017 5:03 PM
> > > To: qemu-devel@nongnu.org
> > > Cc: m...@redhat.com; pbonz...@redhat.com; Huangweidong (C);
> > > stefa...@redhat.com; Zhoujian (jay); pa...@linux.vnet.ibm.com;
> longpeng;
> > > xin.z...@intel.com; roy.fan.zh...@intel.com; Gonglei (Arei)
> > > Subject: [PATCH 0/4] cryptodev: add vhost support
> > >
> > > I posted the RFC version of the DPDK vhost-crypto
> > > implementation five months ago, and now it's time to send
> > > the formal version, because we need a userspace scheme
> > > for better performance.
> > >
> > > The vhost user crypto server side patches had been
> > > sent to DPDK community, pls see
> > >
> > > [RFC PATCH 0/6] lib/librte_vhost: introduce new   vhost_user crypto
> backend
> > > support
> > > http://dpdk.org/ml/archives/dev/2017-November/081048.html
> > >
> > > You also can get virtio-crypto polling mode driver from:
> > >
> > > [PATCH] virtio: add new driver for crypto devices
> > > http://dpdk.org/ml/archives/dev/2017-November/081985.html
> > >
> > >
> > > Gonglei (4):
> > >   cryptodev: add vhost-user as a new cryptodev backend
> > >   cryptodev: add vhost support
> > >   cryptodev-vhost-user: add crypto session handler
> > >   cryptodev-vhost-user: set the key length
> > >
> > >  backends/Makefile.objs|   4 +
> > >  backends/cryptodev-builtin.c  |   1 +
> > >  backends/cryptodev-vhost-user.c   | 381
> > > ++
> > >  backends/cryptodev-vhost.c| 297
> > > ++
> > >  docs/interop/vhost-user.txt   |  19 ++
> > >  hw/virtio/vhost-user.c|  89 
> > >  hw/virtio/virtio-crypto.c |  70 +++
> > >  include/hw/virtio/vhost-backend.h |   8 +
> > >  include/hw/virtio/virtio-crypto.h |   1 +
> > >  include/sysemu/cryptodev-vhost-user.h |  47 +
> > >  include/sysemu/cryptodev-vhost.h  | 154 ++
> > >  include/sysemu/cryptodev.h|   8 +
> > >  qemu-options.hx   |  21 ++
> > >  vl.c  |   4 +
> > >  14 files changed, 1104 insertions(+)
> > >  create mode 100644 backends/cryptodev-vhost-user.c
> > >  create mode 100644 backends/cryptodev-vhost.c
> > >  create mode 100644 include/sysemu/cryptodev-vhost-user.h
> > >  create mode 100644 include/sysemu/cryptodev-vhost.h
> > >
> > > --
> > > 1.8.3.1
> > >



Re: [Qemu-devel] [PATCH 0/4] cryptodev: add vhost support

2017-12-18 Thread Gonglei (Arei)
Ping...

Fan (working for DPDK parts) is waiting for those patches upstreamed. :)

Thanks,
-Gonglei


> -Original Message-
> From: Gonglei (Arei)
> Sent: Tuesday, November 28, 2017 5:03 PM
> To: qemu-devel@nongnu.org
> Cc: m...@redhat.com; pbonz...@redhat.com; Huangweidong (C);
> stefa...@redhat.com; Zhoujian (jay); pa...@linux.vnet.ibm.com; longpeng;
> xin.z...@intel.com; roy.fan.zh...@intel.com; Gonglei (Arei)
> Subject: [PATCH 0/4] cryptodev: add vhost support
> 
> I posted the RFC version five months ago for the DPDK
> vhost-crypto implementation, and now it's time to send
> the formal version, because we need a user-space scheme
> for better performance.
> 
> The vhost user crypto server side patches had been
> sent to DPDK community, pls see
> 
> [RFC PATCH 0/6] lib/librte_vhost: introduce new   vhost_user crypto 
> backend
> support
> http://dpdk.org/ml/archives/dev/2017-November/081048.html
> 
> You also can get virtio-crypto polling mode driver from:
> 
> [PATCH] virtio: add new driver for crypto devices
> http://dpdk.org/ml/archives/dev/2017-November/081985.html
> 
> 
> Gonglei (4):
>   cryptodev: add vhost-user as a new cryptodev backend
>   cryptodev: add vhost support
>   cryptodev-vhost-user: add crypto session handler
>   cryptodev-vhost-user: set the key length
> 
>  backends/Makefile.objs|   4 +
>  backends/cryptodev-builtin.c  |   1 +
>  backends/cryptodev-vhost-user.c   | 381
> ++
>  backends/cryptodev-vhost.c| 297
> ++
>  docs/interop/vhost-user.txt   |  19 ++
>  hw/virtio/vhost-user.c|  89 
>  hw/virtio/virtio-crypto.c |  70 +++
>  include/hw/virtio/vhost-backend.h |   8 +
>  include/hw/virtio/virtio-crypto.h |   1 +
>  include/sysemu/cryptodev-vhost-user.h |  47 +
>  include/sysemu/cryptodev-vhost.h  | 154 ++
>  include/sysemu/cryptodev.h|   8 +
>  qemu-options.hx   |  21 ++
>  vl.c  |   4 +
>  14 files changed, 1104 insertions(+)
>  create mode 100644 backends/cryptodev-vhost-user.c
>  create mode 100644 backends/cryptodev-vhost.c
>  create mode 100644 include/sysemu/cryptodev-vhost-user.h
>  create mode 100644 include/sysemu/cryptodev-vhost.h
> 
> --
> 1.8.3.1
> 




Re: [Qemu-devel] [v22 1/2] virtio-crypto: Add virtio crypto device specification

2017-12-07 Thread Gonglei (Arei)
> 
> On 12/06/2017 08:37 AM, Longpeng(Mike) wrote:
> > +\field{outcome_len} is the size of struct virtio_crypto_session_input or
> > +ZERO for the session-destroy operation.
> 
> This ain't correct. It should have been something like
> virtio_crypto_destroy_session_input.
> 
Right, will fix it.

> > +
> > +
> > +\paragraph{Session operation}\label{sec:Device Types / Crypto Device /
> Device
> > +Operation / Control Virtqueue / Session operation}
> > +
> > +The session is a handle which describes the cryptographic parameters to be
> > +applied to a number of buffers.
> > +
> > +The following structure stores the result of session creation set by the
> device:
> > +
> > +\begin{lstlisting}
> > +struct virtio_crypto_session_input {
> > +/* Device write only portion */
> > +le64 session_id;
> > +le32 status;
> > +le32 padding;
> > +};
> > +\end{lstlisting}
> > +
> > +A request to destroy a session includes the following information:
> > +
> > +\begin{lstlisting}
> > +struct virtio_crypto_destroy_session_flf {
> > +/* Device read only portion */
> > +le64  session_id;
> > +/* Device write only portion */
> 
> This is the device writable portion and thus what we call op_outcome above.
> So it should have been
> };
> 
> 
> struct virtio_crypto_destroy_session_input {
> > +le32  status;
> > +le32  padding;
> > +};
> 
> If we aren't consistent about the dividing into parts (like op-specific
> fixed and variable-length (output) fields, operation outcome (input)),
> it isn't really helpful.
> 
It's OK with us; we can do it. Any other comments?

Thanks,
-Gonglei
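
The split being requested would look roughly like this (a sketch of the
proposed structures, reconstructed from the review comments, not the final
spec text):

    struct virtio_crypto_destroy_session_flf {
        /* Device read only portion */
        le64 session_id;
    };

    struct virtio_crypto_destroy_session_input {
        /* Device write only portion */
        le32 status;
        le32 padding;
    };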



Re: [Qemu-devel] About the light VM solution!

2017-12-06 Thread Gonglei (Arei)
> -Original Message-
> From: Stefan Hajnoczi [mailto:stefa...@redhat.com]
> Sent: Wednesday, December 06, 2017 11:10 PM
> To: Gonglei (Arei)
> Cc: Paolo Bonzini; Yang Zhong; Stefan Hajnoczi; qemu-devel
> Subject: Re: [Qemu-devel] About the light VM solution!
> 
> On Wed, Dec 06, 2017 at 09:21:55AM +, Gonglei (Arei) wrote:
> >
> > > -Original Message-
> > > From: Qemu-devel
> > > [mailto:qemu-devel-bounces+arei.gonglei=huawei@nongnu.org] On
> > > Behalf Of Stefan Hajnoczi
> > > Sent: Wednesday, December 06, 2017 12:31 AM
> > > To: Paolo Bonzini
> > > Cc: Yang Zhong; Stefan Hajnoczi; qemu-devel
> > > Subject: Re: [Qemu-devel] About the light VM solution!
> > >
> > > On Tue, Dec 05, 2017 at 03:00:10PM +0100, Paolo Bonzini wrote:
> > > > On 05/12/2017 14:47, Stefan Hajnoczi wrote:
> > > > > On Tue, Dec 5, 2017 at 1:35 PM, Paolo Bonzini <pbonz...@redhat.com>
> > > wrote:
> > > > >> On 05/12/2017 13:06, Stefan Hajnoczi wrote:
> > > > >>> On Tue, Dec 05, 2017 at 02:33:13PM +0800, Yang Zhong wrote:
> > > > >>>> As you know, AWS has decided to switch to KVM in their clouds. This
> > > > >>>> news makes almost all China CSPs (cloud service providers) pay more
> > > > >>>> attention to KVM/QEMU, especially light VM solutions.
> > > > >>>>
> > > > >>>> Below is Intel's solution for light VM, qemu-lite.
> > > > >>>>
> > >
> http://events.linuxfoundation.org/sites/events/files/slides/Light%20weight%2
> > > 0virtualization%20with%20QEMU%26KVM_0.pdf
> > > > >>>>
> > > > >>>> My question is whether the community has a plan to implement light
> > > > >>>> VM or alternative solutions. If not, is our qemu-lite solution
> > > > >>>> suitable for upstream again? Many thanks!
> > > > >>>
> > > > >>> What caused a lot of discussion and held back progress was the
> approach
> > > > >>> that was taken.  The basic philosophy seems to be bypassing or
> > > > >>> special-casing components in order to avoid slow operations.  This
> > > > >>> requires special QEMU, firmware, and/or guest kernel binaries and
> > > causes
> > > > >>> extra work for the management stack, distributions, and testers.
> > > > >>
> > > > >> I think having a special firmware (be it qboot or a special-purpose
> > > > >> SeaBIOS) is acceptable.
> > > > >
> > > > > The work Marc Mari Barcelo did in 2015 showed that SeaBIOS can boot
> > > > > guests quickly.  The guest kernel was entered in <35 milliseconds
> > > > > IIRC.  Why is special firmware necessary?
> > > >
> > > > I thought that wasn't the "conventional" SeaBIOS, but rather one with
> > > > reduced configuration options, but I may be remembering wrong.
> > >
> > > Marc didn't spend much time on optimizing SeaBIOS, he used the build
> > > options that were suggested.  An extra flag can be added in
> > > qemu_preinit() to skip slow init that's unnecessary on optimized
> > > machines.  That would allow a single SeaBIOS binary to run both full and
> > > lite systems.
> > >
> > Which options do you remember, Stefan? Or any links to that
> > thread? I'm interested in this topic.
> 
> Here is what I found:
> 
> Marc Mari's fastest SeaBIOS build took 8 ms from the first guest CPU
> instruction to entering the guest kernel.  CBFS was used instead of a
> normal boot device (e.g. virtio-blk).  Most hardware support was
> disabled.
> 
> https://mail.coreboot.org/pipermail/seabios/2015-July/009554.html
> 
> The SeaBIOS configuration file is here:
> 
> https://mail.coreboot.org/pipermail/seabios/2015-July/009548.html
> 
Thanks for your information. :)
 
Thanks,
-Gonglei



Re: [Qemu-devel] About the light VM solution!

2017-12-06 Thread Gonglei (Arei)

> -Original Message-
> From: Qemu-devel
> [mailto:qemu-devel-bounces+arei.gonglei=huawei@nongnu.org] On
> Behalf Of Stefan Hajnoczi
> Sent: Wednesday, December 06, 2017 12:31 AM
> To: Paolo Bonzini
> Cc: Yang Zhong; Stefan Hajnoczi; qemu-devel
> Subject: Re: [Qemu-devel] About the light VM solution!
> 
> On Tue, Dec 05, 2017 at 03:00:10PM +0100, Paolo Bonzini wrote:
> > On 05/12/2017 14:47, Stefan Hajnoczi wrote:
> > > On Tue, Dec 5, 2017 at 1:35 PM, Paolo Bonzini 
> wrote:
> > >> On 05/12/2017 13:06, Stefan Hajnoczi wrote:
> > >>> On Tue, Dec 05, 2017 at 02:33:13PM +0800, Yang Zhong wrote:
> >  As you know, AWS has decided to switch to KVM in their clouds. This
> >  news makes almost all China CSPs (cloud service providers) pay more
> >  attention to KVM/QEMU, especially light VM solutions.
> > 
> >  Below is Intel's solution for light VM, qemu-lite.
> > 
> http://events.linuxfoundation.org/sites/events/files/slides/Light%20weight%2
> 0virtualization%20with%20QEMU%26KVM_0.pdf
> > 
> >  My question is whether the community has a plan to implement light
> >  VM or alternative solutions. If not, is our qemu-lite solution
> >  suitable for upstream again? Many thanks!
> > >>>
> > >>> What caused a lot of discussion and held back progress was the approach
> > >>> that was taken.  The basic philosophy seems to be bypassing or
> > >>> special-casing components in order to avoid slow operations.  This
> > >>> requires special QEMU, firmware, and/or guest kernel binaries and
> causes
> > >>> extra work for the management stack, distributions, and testers.
> > >>
> > >> I think having a special firmware (be it qboot or a special-purpose
> > >> SeaBIOS) is acceptable.
> > >
> > > The work Marc Mari Barcelo did in 2015 showed that SeaBIOS can boot
> > > guests quickly.  The guest kernel was entered in <35 milliseconds
> > > IIRC.  Why is special firmware necessary?
> >
> > I thought that wasn't the "conventional" SeaBIOS, but rather one with
> > reduced configuration options, but I may be remembering wrong.
> 
> Marc didn't spend much time on optimizing SeaBIOS, he used the build
> options that were suggested.  An extra flag can be added in
> qemu_preinit() to skip slow init that's unnecessary on optimized
> machines.  That would allow a single SeaBIOS binary to run both full and
> lite systems.
> 
Which options do you remember, Stefan? Or any links to that
thread? I'm interested in this topic.

Thanks,
-Gonglei



[Qemu-devel] RE: [BUG] Windows 7 got stuck easily while run PCMark10 application

2017-12-01 Thread Gonglei (Arei)
I also think it's a Windows bug; the problem is that it doesn't occur on the
Xen platform. And there is some other work that needs to be done while reading
REG_C. So I wrote that patch.

Thanks,
Gonglei
From: Paolo Bonzini
To: Gonglei, Zhanghailiang, qemu-devel, Michael S. Tsirkin
Cc: Huangweidong, Wangxin, Xiexiangyou
Date: 2017-12-02 01:10:08
Subject: Re: [BUG] Windows 7 got stuck easily while run PCMark10 application

On 01/12/2017 08:08, Gonglei (Arei) wrote:
> First write to 0x70, cmos_index = 0xc & 0x7f = 0xc
>    CPU 0/KVM-15566 kvm_pio: pio_write at 0x70 size 1 count 1 val 0xc
> Second write to 0x70, cmos_index = 0x86 & 0x7f = 0x6
>    CPU 1/KVM-15567 kvm_pio: pio_write at 0x70 size 1 count 1 val 0x86
> vcpu0 read 0x6 because cmos_index is 0x6 now:
>    CPU 0/KVM-15566 kvm_pio: pio_read at 0x71 size 1 count 1 val 0x6
> vcpu1 read 0x6:
>    CPU 1/KVM-15567 kvm_pio: pio_read at 0x71 size 1 count 1 val 0x6
This seems to be a Windows bug.  The easiest workaround that I
can think of is to clear the interrupts already when 0xc is written,
without waiting for the read (because REG_C can only be read).

What do you think?

Thanks,

Paolo
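
A minimal sketch of this workaround, assuming the RTCState fields and
RTC_REG_C macro of hw/timer/mc146818rtc.c (hypothetical code, not the actual
patch):

    static void cmos_select_index(RTCState *s, uint8_t data)
    {
        s->cmos_index = data & 0x7f;
        if (s->cmos_index == RTC_REG_C) {
            /* Behave as if the guest had already read REG_C: lower the
             * IRQ line and clear the interrupt flags, so a racing index
             * write from another vCPU can no longer make the guest miss
             * the clearing read. */
            qemu_irq_lower(s->irq);
            s->cmos_data[RTC_REG_C] = 0x00;
        }
    }

This trades spec accuracy (REG_C is defined to clear on read) for robustness
against the guest's racy index/data accesses.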


Re: [Qemu-devel] [BUG] Windows 7 got stuck easily while run PCMark10 application

2017-12-01 Thread Gonglei (Arei)
Pls see the trace of kvm_pio:

   CPU 1/KVM-15567 [003]  209311.762579: kvm_pio: pio_read at 0x70 size 1 count 1 val 0xff
   CPU 1/KVM-15567 [003]  209311.762582: kvm_pio: pio_write at 0x70 size 1 count 1 val 0x89
   CPU 1/KVM-15567 [003]  209311.762590: kvm_pio: pio_read at 0x71 size 1 count 1 val 0x17
   CPU 0/KVM-15566 [005]  209311.762611: kvm_pio: pio_write at 0x70 size 1 count 1 val 0xc
   CPU 1/KVM-15567 [003]  209311.762615: kvm_pio: pio_read at 0x70 size 1 count 1 val 0xff
   CPU 1/KVM-15567 [003]  209311.762619: kvm_pio: pio_write at 0x70 size 1 count 1 val 0x88
   CPU 1/KVM-15567 [003]  209311.762627: kvm_pio: pio_read at 0x71 size 1 count 1 val 0x12
   CPU 0/KVM-15566 [005]  209311.762632: kvm_pio: pio_read at 0x71 size 1 count 1 val 0x12
   CPU 1/KVM-15567 [003]  209311.762633: kvm_pio: pio_read at 0x70 size 1 count 1 val 0xff
   CPU 0/KVM-15566 [005]  209311.762634: kvm_pio: pio_write at 0x70 size 1 count 1 val 0xc    <-- first write to 0x70, cmos_index = 0xc & 0x7f = 0xc
   CPU 1/KVM-15567 [003]  209311.762636: kvm_pio: pio_write at 0x70 size 1 count 1 val 0x86   <-- second write to 0x70, cmos_index = 0x86 & 0x7f = 0x6, overwriting the cmos_index set by the first write
   CPU 0/KVM-15566 [005]  209311.762641: kvm_pio: pio_read at 0x71 size 1 count 1 val 0x6     <-- vcpu0 reads 0x6 because cmos_index is 0x6 now
   CPU 1/KVM-15567 [003]  209311.762644: kvm_pio: pio_read at 0x71 size 1 count 1 val 0x6     <-- vcpu1 reads 0x6
   CPU 1/KVM-15567 [003]  209311.762649: kvm_pio: pio_read at 0x70 size 1 count 1 val 0xff
   CPU 1/KVM-15567 [003]  209311.762669: kvm_pio: pio_write at 0x70 size 1 count 1 val 0x87
   CPU 1/KVM-15567 [003]  209311.762678: kvm_pio: pio_read at 0x71 size 1 count 1 val 0x1
   CPU 1/KVM-15567 [003]  209311.762683: kvm_pio: pio_read at 0x70 size 1 count 1 val 0xff
   CPU 1/KVM-15567 [003]  209311.762686: kvm_pio: pio_write at 0x70 size 1 count 1 val 0x84
   CPU 1/KVM-15567 [003]  209311.762693: kvm_pio: pio_read at 0x71 size 1 count 1 val 0x10
   CPU 1/KVM-15567 [003]  209311.762699: kvm_pio: pio_read at 0x70 size 1 count 1 val 0xff
   CPU 1/KVM-15567 [003]  209311.762702: kvm_pio: pio_write at 0x70 size 1 count 1 val 0x82
   CPU 1/KVM-15567 [003]  209311.762709: kvm_pio: pio_read at 0x71 size 1 count 1 val 0x25
   CPU 1/KVM-15567 [003]  209311.762714: kvm_pio: pio_read at 0x70 size 1 count 1 val 0xff
   CPU 1/KVM-15567 [003]  209311.762717: kvm_pio: pio_write at 0x70 size 1 count 1 val 0x80


Regards,
-Gonglei

From: Zhanghailiang
Sent: Friday, December 01, 2017 3:03 AM
To: qemu-devel@nongnu.org; m...@redhat.com; Paolo Bonzini
Cc: Huangweidong (C); Gonglei (Arei); wangxin (U); Xiexiangyou
Subject: [BUG] Windows 7 got stuck easily while run PCMark10 application

Hi,

We hit a bug in our test while running PCMark 10 in a Windows 7 VM:
the VM got stuck and the wallclock hung after several minutes of running
PCMark 10 in it.
It is quite easy to reproduce the bug with upstream KVM and QEMU.

We found that KVM cannot inject any RTC irq into the VM after it hangs; it
fails to deliver the irq in ioapic_set_irq() because the RTC irq is still
pending in ioapic->irr.

static int ioapic_set_irq(struct kvm_ioapic *ioapic, unsigned int irq,
                          int irq_level, bool line_status)
{
    /* ... */
    if (!irq_level) {
        ioapic->irr &= ~mask;
        ret = 1;
        goto out;
    }
    /* ... */
    if ((edge && old_irr == ioapic->irr) ||
        (!edge && entry.fields.remote_irr)) {
        ret = 0;
        goto out;
    }

According to the RTC spec, after the RTC injects a high-level irq, the OS will
read CMOS register C to clear the irq flag and pull down the irq electric pin.

In QEMU, we emulate the read operation in cmos_ioport_read(), but the guest OS
fires a write operation first to tell which register will be read after this
write; we use s->cmos_index to record the register to read next.

But in our test, we found a possible situation in which a vCPU fails to read
RTC_REG_C to clear the irq. This can happen while two vCPUs are writing/reading
registers at the same time. For example, vcpu0 is trying to read RTC_REG_C, so
it writes RTC_REG_C first, setting s->cmos_index to RTC_REG_C; but before it
reads register C, vcpu1 starts to read RTC_YEAR and changes s->cmos_index to
RTC_YEAR with its own write. The next read of vcpu0 will therefore return
RTC_YEAR instead, and in this case we miss calling qemu_irq_lower(s->irq) to
clear the irq. After this, KVM will never inject an RTC irq again, and the
Windows VM hangs.
static void cmos_ioport_write(void *opaque, hwaddr addr,
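
For reference, the index/data port pair at the heart of this race looks
roughly like this (a simplified sketch in the style of QEMU's
hw/timer/mc146818rtc.c, not the verbatim code):

    static void cmos_ioport_write(void *opaque, hwaddr addr, uint64_t data,
                                  unsigned size)
    {
        RTCState *s = opaque;

        if ((addr & 1) == 0) {
            /* Port 0x70: select the register for the *next* data access.
             * s->cmos_index is shared by all vCPUs, so a second select can
             * overwrite the first before its matching read on port 0x71. */
            s->cmos_index = data & 0x7f;
        } else {
            /* Port 0x71: access s->cmos_data[s->cmos_index] ... */
        }
    }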
 

Re: [Qemu-devel] [PATCH v4] thread: move detach_thread from creating thread to created thread

2017-11-29 Thread Gonglei (Arei)

> -Original Message-
> From: Paolo Bonzini [mailto:pbonz...@redhat.com]
> Sent: Thursday, November 30, 2017 12:39 AM
> To: Gonglei (Arei); Eric Blake; linzhecheng; qemu-devel@nongnu.org
> Cc: f...@redhat.com; wangxin (U)
> Subject: Re: [Qemu-devel] [PATCH v4] thread: move detach_thread from
> creating thread to created thread
> 
> On 29/11/2017 17:28, Gonglei (Arei) wrote:
> >>> The root cause of this problem is a bug in glibc (version 2.17; the
> >>> latest version has the same bug). Let's see what happened in glibc's
> >>> code.
> >> Have you reported this bug to the glibc folks, and if so, can we include
> >> a URL to the glibc bugzilla?
> >>
> > No, we didn't do that yet. :(
> 
> It's here:
> 
> https://sourceware.org/bugzilla/show_bug.cgi?id=19951.
> 
> I've added a note to the commit message.
> 
Nice~ :)

Thanks,
-Gonglei


Re: [Qemu-devel] [PATCH v4] thread: move detach_thread from creating thread to created thread

2017-11-29 Thread Gonglei (Arei)


> -Original Message-
> From: Eric Blake [mailto:ebl...@redhat.com]
> Sent: Thursday, November 30, 2017 12:19 AM
> To: linzhecheng; qemu-devel@nongnu.org
> Cc: aligu...@us.ibm.com; f...@redhat.com; wangxin (U); Gonglei (Arei);
> pbonz...@redhat.com
> Subject: Re: [Qemu-devel] [PATCH v4] thread: move detach_thread from
> creating thread to created thread
> 
> On 11/27/2017 10:46 PM, linzhecheng wrote:
> > If we create a thread with QEMU_THREAD_DETACHED mode, QEMU may
> > get a segfault with low probability.
> >
> 
> >
> > The root cause of this problem is a bug in glibc (version 2.17; the latest
> > version has the same bug). Let's see what happened in glibc's code.
> 
> Have you reported this bug to the glibc folks, and if so, can we include
> a URL to the glibc bugzilla?
> 
No, we didn't do that yet. :(


> Working around the glibc bug is nice, but glibc should really be fixed
> so that other projects do not have to continue working around it.
> 
> 
Yes, agree.


Regards,
-Gonglei

> >
> > QEMU gets a segfault at line 50, because pd is an invalid address.
> > pd is still valid at line 38 when set pd->joinid = pd, at this moment,
> > created thread is just exiting(only keeps runing for a short time),
> 
> s/runing/running/
> 
> --
> Eric Blake, Principal Software Engineer
> Red Hat, Inc.   +1-919-301-3266
> Virtualization:  qemu.org | libvirt.org


Re: [Qemu-devel] [PATCH] i386: turn off l3-cache property by default

2017-11-28 Thread Gonglei (Arei)


> -Original Message-
> From: rka...@virtuozzo.com [mailto:rka...@virtuozzo.com]
> Sent: Wednesday, November 29, 2017 1:56 PM
> To: Gonglei (Arei)
> Cc: Eduardo Habkost; Denis V. Lunev; longpeng; Michael S. Tsirkin; Denis
> Plotnikov; pbonz...@redhat.com; r...@twiddle.net; qemu-devel@nongnu.org;
> huangpeng; Zhaoshenglong
> Subject: Re: [Qemu-devel] [PATCH] i386: turn off l3-cache property by default
> 
> On Wed, Nov 29, 2017 at 01:57:14AM +, Gonglei (Arei) wrote:
> > > On Tue, Nov 28, 2017 at 11:20:27PM +0300, Denis V. Lunev wrote:
> > > > On 11/28/2017 10:58 PM, Eduardo Habkost wrote:
> > > > > On Fri, Nov 24, 2017 at 04:26:50PM +0300, Denis Plotnikov wrote:
> > > > >> Commit 14c985cffa "target-i386: present virtual L3 cache info for
> vcpus"
> > > > >> introduced and set by default exposing l3 to the guest.
> > > > >>
> > > > >> The motivation behind it was that in the Linux scheduler, when waking
> up
> > > > >> a task on a sibling CPU, the task was put onto the target CPU's
> runqueue
> > > > >> directly, without sending a reschedule IPI.  Reduction in the IPI 
> > > > >> count
> > > > >> led to performance gain.
> > > > >>
> >
> > Yes, that's one thing.
> >
> > The other reason for enabling L3 cache is the performance of accessing
> memory.
> 
> I guess you're talking about the super-smart buffer size tuning glibc
> does in its memcpy and friends.  We try to control that with an atomic
> test for memcpy, and we didn't notice a difference.  We'll need to
> double-check...
> 
> > We tested it with the Stream benchmark; the performance is better with
> > l3-cache=on.
> 
> This one: https://www.cs.virginia.edu/stream/ ?  Thanks, we'll have a
> look, too.
> 
Yes. :)

Thanks,
-Gonglei



Re: [Qemu-devel] [PATCH 3/4] cryptodev-vhost-user: add crypto session handler

2017-11-28 Thread Gonglei (Arei)

> -Original Message-
> From: Paolo Bonzini [mailto:pbonz...@redhat.com]
> Sent: Tuesday, November 28, 2017 7:20 PM
> To: Gonglei (Arei); qemu-devel@nongnu.org
> Cc: m...@redhat.com; Huangweidong (C); stefa...@redhat.com; Zhoujian
> (jay); pa...@linux.vnet.ibm.com; longpeng; xin.z...@intel.com;
> roy.fan.zh...@intel.com
> Subject: Re: [PATCH 3/4] cryptodev-vhost-user: add crypto session handler
> 
> On 28/11/2017 12:06, Gonglei (Arei) wrote:
> >>> You mean we can share control virtqueue to DPDK as well? Like data
> queues?
> >> I don't know :) but why not?
> >>
> > Currently there are two main reasons for this design:
> >
> > 1) we would have to use another CPU to poll the control virtqueue, which
> > is expensive.
> 
> IIRC DPDK also supports interrupt mode, doesn't it?  Is it possible to
> do interrupt mode for some virtqueues and poll mode for others?
> 

The Intel guy Tan (CC'd) said to me:

" Interrupt mode for vhost-user is still not supported in current 
implementation. But we are evaluating the necessity now.

And yes, the mode (polling or interrupt) can be different for different 
queues."

> > 2) we would have to copy the control-message parsing logic into DPDK,
> > which breaks the current layered architecture.
> 
> But isn't it already a layering violation that you're adding *some*
> control messages to the vhost-user protocol?  I am not sure why only
> these two are necessary.
> 
Sorry, but I don't think this is a layering violation; it's just like
"vhost_net_set_mtu" for vhost-net and "vhost_vsock_set_guest_cid_op" for
vhost-vsock. They're all device-specific messages, aren't they?

Thanks,
-Gonglei

> Paolo
> 
> > I'm not sure if there are any other hidden issues for future scalability,
> > such as using QEMU to manage some control messages, avoiding DDoS
> > attacks, etc.
> >
> > Thanks,
> > -Gonglei



Re: [Qemu-devel] [PATCH] i386: turn off l3-cache property by default

2017-11-28 Thread Gonglei (Arei)

> -Original Message-
> From: Eduardo Habkost [mailto:ehabk...@redhat.com]
> Sent: Wednesday, November 29, 2017 5:13 AM
> To: Denis V. Lunev; longpeng; Michael S. Tsirkin
> Cc: Denis Plotnikov; pbonz...@redhat.com; r...@twiddle.net;
> qemu-devel@nongnu.org; rka...@virtuozzo.com; Gonglei (Arei); huangpeng;
> Zhaoshenglong; herongguang...@huawei.com
> Subject: Re: [Qemu-devel] [PATCH] i386: turn off l3-cache property by default
> 
> [CCing the people who were copied in the original patch that
> enabled l3cache]
> 
Thanks for Ccing.

> On Tue, Nov 28, 2017 at 11:20:27PM +0300, Denis V. Lunev wrote:
> > On 11/28/2017 10:58 PM, Eduardo Habkost wrote:
> > > Hi,
> > >
> > > On Fri, Nov 24, 2017 at 04:26:50PM +0300, Denis Plotnikov wrote:
> > >> Commit 14c985cffa "target-i386: present virtual L3 cache info for vcpus"
> > >> introduced and set by default exposing l3 to the guest.
> > >>
> > >> The motivation behind it was that in the Linux scheduler, when waking up
> > >> a task on a sibling CPU, the task was put onto the target CPU's runqueue
> > >> directly, without sending a reschedule IPI.  Reduction in the IPI count
> > >> led to performance gain.
> > >>

Yes, that's one thing.

The other reason for enabling L3 cache is the performance of accessing memory.
We tested it with the Stream benchmark; the performance is better with l3-cache=on.

> > >> However, this isn't the whole story.  Once the task is on the target
> > >> CPU's runqueue, it may have to preempt the current task on that CPU, be
> > >> it the idle task putting the CPU to sleep or just another running task.
> > >> For that a reschedule IPI will have to be issued, too.  Only when that
> > >> other CPU is running a normal task for too little time, the fairness
> > >> constraints will prevent the preemption and thus the IPI.
> > >>
> > >> This boils down to the improvement being only achievable in workloads
> > >> with many actively switching tasks.  We had no access to the
> > >> (proprietary?) SAP HANA benchmark the commit referred to, but the
> > >> pattern is also reproduced with "perf bench sched messaging -g 1"
> > >> on 1 socket, 8 cores vCPU topology, we see indeed:
> > >>
> > >> l3-cache    #res IPI /s    #time / 1 loops
> > >> off         560K           1.8 sec
> > >> on          40K            0.9 sec
> > >>
> > >> Now there's a downside: with L3 cache the Linux scheduler is more eager
> > >> to wake up tasks on sibling CPUs, resulting in unnecessary cross-vCPU
> > >> interactions and therefore exessive halts and IPIs.  E.g. "perf bench
> > >> sched pipe -i 10" gives
> > >>
> > >> l3-cache    #res IPI /s    #HLT /s    #time / 10 loops
> > >> off         200 (no K)     230        0.2 sec
> > >> on          400K           330K       0.5 sec
> > >>
> > >> In a more realistic test, we observe 15% degradation in VM density
> > >> (measured as the number of VMs, each running Drupal CMS serving 2 http
> > >> requests per second to its main page, with 95%-percentile response
> > >> latency under 100 ms) with l3-cache=on.
> > >>
> > >> We think that mostly-idle scenario is more common in cloud and personal
> > >> usage, and should be optimized for by default; users of highly loaded
> > >> VMs should be able to tune them up themselves.
> > >>

Current public cloud providers usually offer different instance types,
including shared instances and dedicated instances.

And public cloud tenants usually want the L3 cache; even bigger is better.

Basically, all performance tuning targets specific scenarios;
we only need to ensure a benefit in most of them.

Thanks,
-Gonglei

> > > There's one thing I don't understand in your test case: if you
> > > just found out that Linux will behave worse if it assumes that
> > > the VCPUs are sharing a L3 cache, why are you configuring a
> > > 8-core VCPU topology explicitly?
> > >
> > > Do you still see a difference in the numbers if you use "-smp 8"
> > > with no "cores" and "threads" options?
> > >
> > This is quite simple. A lot of software licenses are bound to the amount
> > of CPU __sockets__. Thus it is mandatory in a lot of cases to set topology
> > with 1 socket/xx cores to reduce the amount of money necessary to
> > be paid for the software.
> 
> In this case it looks like we're talking about the expected
> meaning of "cores=N".  My first interpretation would be that the
> user obviously want the guest to see the multiple cores sharing a
> L3 cache, because that's how real CPUs normally work.  But I see
> why you have different expectations.
> 
> Numbers on dedicated-pCPU scenarios would be helpful to guide the
> decision.  I wouldn't like to cause a performance regression for
> users that fine-tuned vCPU topology and set up CPU pinning.
> 
> --
> Eduardo



Re: [Qemu-devel] [PATCH 3/4] cryptodev-vhost-user: add crypto session handler

2017-11-28 Thread Gonglei (Arei)

> -Original Message-
> From: Paolo Bonzini [mailto:pbonz...@redhat.com]
> Sent: Tuesday, November 28, 2017 6:46 PM
> To: Gonglei (Arei); qemu-devel@nongnu.org
> Cc: m...@redhat.com; Huangweidong (C); stefa...@redhat.com; Zhoujian
> (jay); pa...@linux.vnet.ibm.com; longpeng; xin.z...@intel.com;
> roy.fan.zh...@intel.com
> Subject: Re: [PATCH 3/4] cryptodev-vhost-user: add crypto session handler
> 
> On 28/11/2017 11:43, Gonglei (Arei) wrote:
> >> As far as I understand, VIRTIO_CRYPTO_CIPHER_CREATE_SESSION is called
> >> as a result of sending a message on the control virtqueue.
> >
> > VIRTIO_CRYPTO_CIPHER_CREATE_SESSION is a message type on the control
> > queue; it means creating a session for subsequent crypto requests.
> 
> Ok, so the message does have the same meaning as the control queue
> message.  Thanks for confirming.
> 
> >> Why can't vhost-user also process create/destroy session messages on the
> >> control virtqueue, instead of having device-specific messages in the
> >> protocol?
> >
> > You mean we can share the control virtqueue with DPDK as well, like the data queues?
> 
> I don't know :) but why not?
> 
Currently there are two main reasons for this design:

1) we would have to use another CPU to poll the control virtqueue, which is
expensive.
2) we would have to copy the control-message parsing logic into DPDK, which
breaks the current layered architecture.

I'm not sure if there are any other hidden issues for future scalability, such
as using QEMU to manage some control messages, avoiding DDoS attacks, etc.

Thanks,
-Gonglei


Re: [Qemu-devel] [PATCH 3/4] cryptodev-vhost-user: add crypto session handler

2017-11-28 Thread Gonglei (Arei)

> -Original Message-
> From: Paolo Bonzini [mailto:pbonz...@redhat.com]
> Sent: Tuesday, November 28, 2017 6:02 PM
> To: Gonglei (Arei); qemu-devel@nongnu.org
> Cc: m...@redhat.com; Huangweidong (C); stefa...@redhat.com; Zhoujian
> (jay); pa...@linux.vnet.ibm.com; longpeng; xin.z...@intel.com;
> roy.fan.zh...@intel.com
> Subject: Re: [PATCH 3/4] cryptodev-vhost-user: add crypto session handler
> 
> On 28/11/2017 10:03, Gonglei wrote:
> > Introduce two vhost-user meassges:
> VHOST_USER_CREATE_CRYPTO_SESSION
> > and VHOST_USER_CLOSE_CRYPTO_SESSION. At this point, the QEMU side
> > support crypto operation in cryptodev host-user backend.
> >
> > Signed-off-by: Gonglei <arei.gong...@huawei.com>
> > Signed-off-by: Longpeng(Mike) <longpe...@huawei.com>
> > Signed-off-by: Zhoujian <jianjay.z...@huawei.com>
> > ---
> >  backends/cryptodev-vhost-user.c   | 50 +-
> >  docs/interop/vhost-user.txt   | 19 +
> >  hw/virtio/vhost-user.c| 89
> +++
> >  include/hw/virtio/vhost-backend.h |  8 
> >  4 files changed, 155 insertions(+), 11 deletions(-)
> 
> As far as I understand, VIRTIO_CRYPTO_CIPHER_CREATE_SESSION is called as
> a result of sending a message on the control virtqueue.
> 
VIRTIO_CRYPTO_CIPHER_CREATE_SESSION is a message type on the control queue;
it means creating a session for subsequent crypto requests.

> Why can't vhost-user also process create/destroy session messages on the
> control virtqueue, instead of having device-specific messages in the
> protocol?
> 
You mean we can share the control virtqueue with DPDK as well, like the data queues?

Maybe I don't get your point. :(

Thanks,
-Gonglei


[Qemu-devel] [Questions] about the VHOST_MEMORY_MAX_NREGIONS of vhost-user backend?

2017-11-24 Thread Gonglei (Arei)
Hi,

Currently, the maximum number of supported memory regions for vhost-user
backends is 8, while the maximum for vhost-net backends is determined by
"/sys/module/vhost/parameters/max_mem_regions".

In many scenarios, the vhost-user NIC causes the memory regions to become a
bottleneck, reporting "a used vhost backend has no free memory slots left".
Examples include memory hotplug (which needs multiple memory slots), GPU
pass-through (which needs to register multiple BAR regions), and so on.

So my questions are: why is the vhost-user memory region limit defined as 8?
Does it have any side effects if we increase VHOST_MEMORY_MAX_NREGIONS?
What about cross-version migration?

#define VHOST_MEMORY_MAX_NREGIONS    8

static int vhost_user_memslots_limit(struct vhost_dev *dev)
{
return VHOST_MEMORY_MAX_NREGIONS;
}

This was introduced by
commit 5f6f6664bf24dc53f4bf98ba812d55ca93684cd5
Author: Nikolay Nikolaev 
Date:   Tue May 27 15:06:02 2014 +0300

Add vhost-user as a vhost backend.

The initialization takes a chardev backed by a unix domain socket.
It should implement qemu_fe_set_msgfds in order to be able to pass
file descriptors to the remote process.

Each ioctl request of vhost-kernel has a vhost-user message equivalent,
which is sent over the control socket.

The general approach is to copy the data from the supplied argument
pointer to a designated field in the message. If a file descriptor is
to be passed it will be placed in the fds array for inclusion in
the sendmsg control header.

VHOST_SET_MEM_TABLE ignores the supplied vhost_memory structure and scans
the global ram_list for ram blocks with a valid fd field set. This would
be set when the '-object memory-file' option with share=on property is used.

Signed-off-by: Antonios Motakis 
Signed-off-by: Nikolay Nikolaev 
Reviewed-by: Michael S. Tsirkin 
Signed-off-by: Michael S. Tsirkin 


Thanks,
-Gonglei





Re: [Qemu-devel] [BUG/RFC] INIT IPI lost when VM starts

2017-11-19 Thread Gonglei (Arei)
Hi Paolo,

What's your opinion about this patch? We have been working on the patches for
the past two days and found it just before finishing.


Thanks,
-Gonglei


> -Original Message-
> From: kvm-ow...@vger.kernel.org [mailto:kvm-ow...@vger.kernel.org] On
> Behalf Of Herongguang (Stephen)
> Sent: Thursday, April 06, 2017 9:47 AM
> To: Paolo Bonzini; rkrc...@redhat.com; afaer...@suse.de;
> jan.kis...@siemens.com; qemu-devel@nongnu.org; k...@vger.kernel.org;
> wangxin (U); Huangweidong (C)
> Subject: Re: [BUG/RFC] INIT IPI lost when VM starts
> 
> 
> 
> On 2017/4/6 0:16, Paolo Bonzini wrote:
> >
> > On 20/03/2017 15:21, Herongguang (Stephen) wrote:
> >> We encountered a problem that when a domain starts, seabios failed to
> >> online a vCPU.
> >>
> >> After investigation, we found that the reason is in kvm-kmod,
> >> KVM_APIC_INIT bit in
> >> vcpu->arch.apic->pending_events was overwritten by qemu, and thus an
> >> INIT IPI sent
> >> to AP was lost. Qemu does this since libvirtd sends a ‘query-cpus’ qmp
> >> command to qemu
> >> on VM start.
> >>
> >> In qemu, qmp_query_cpus-> cpu_synchronize_state->
> >> kvm_cpu_synchronize_state->
> >> do_kvm_cpu_synchronize_state, qemu gets registers/vcpu_events from
> >> kvm-kmod and
> >> sets cpu->kvm_vcpu_dirty to true, and vcpu thread in qemu will call
> >> kvm_arch_put_registers if cpu->kvm_vcpu_dirty is true, thus
> >> pending_events is
> >> overwritten by qemu.
> >>
> >> I think there is no need for qemu to set cpu->kvm_vcpu_dirty to true
> >> after ‘query-cpus’,
> >> and  kvm-kmod should not clear KVM_APIC_INIT unconditionally. And I am
> >> not sure whether
> >> it is OK for qemu to set cpu->kvm_vcpu_dirty in
> >> do_kvm_cpu_synchronize_state in each caller.
> >>
> >> What’s your opinion?
> > Hi Rongguang,
> >
> > sorry for the late response.
> >
> > Where exactly is KVM_APIC_INIT dropped?  kvm_get_mp_state does clear
> the
> > bit, but the result of the INIT is stored in mp_state.
> 
> It's dropped in KVM_SET_VCPU_EVENTS, see below.
> 
> >
> > kvm_get_vcpu_events is called after kvm_get_mp_state; it retrieves
> > KVM_APIC_INIT in events.smi.latched_init and kvm_set_vcpu_events passes
> > it back.  Maybe it should ignore events.smi.latched_init if not in SMM,
> > but I would like to understand the exact sequence of events.
> 
> time0:
> vcpu1:
> qmp_query_cpus-> cpu_synchronize_state-> kvm_cpu_synchronize_state->
>  > do_kvm_cpu_synchronize_state(and set vcpu1's cpu->kvm_vcpu_dirty to
> true)-> kvm_arch_get_registers(KVM_APIC_INIT bit in
> vcpu->arch.apic->pending_events was not set)
> 
> time1:
> vcpu0:
> send INIT-SIPI to all AP->(in vcpu 0's
> context)__apic_accept_irq(KVM_APIC_INIT bit in vcpu1's
> arch.apic->pending_events is set)
> 
> time2:
> vcpu1:
> kvm_cpu_exec->(if cpu->kvm_vcpu_dirty is
> true)kvm_arch_put_registers->kvm_put_vcpu_events(overwritten
> KVM_APIC_INIT bit in vcpu->arch.apic->pending_events!)
> 
> So it's a race between vcpu1's get/put registers and kvm/other vcpus
> changing vcpu1's status/structure fields in the meantime. I worry that
> there are other fields that may be overwritten, too; sipi_vector is one.
> 
> also see:
> https://www.mail-archive.com/qemu-devel@nongnu.org/msg438675.html
> 
> > Thanks,
> >
> > paolo
> >
> > .
> >
> 
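
A hedged sketch of the direction suggested above, with symbol names taken
from upstream KVM (this is not the merged fix): make KVM_SET_VCPU_EVENTS
honor the latched-INIT bit only when SMM state is actually being restored,
so an INIT IPI latched between QEMU's get and put is not wiped out.

    /* in kvm_vcpu_ioctl_x86_set_vcpu_events() */
    if (events->flags & KVM_VCPUEVENT_VALID_SMM) {
        if (lapic_in_kernel(vcpu)) {
            if (events->smi.latched_init)
                set_bit(KVM_APIC_INIT, &vcpu->arch.apic->pending_events);
            else
                clear_bit(KVM_APIC_INIT, &vcpu->arch.apic->pending_events);
        }
    }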



Re: [Qemu-devel] [Question] why need to start all queues in vhost_net_start

2017-11-16 Thread Gonglei (Arei)


> -Original Message-
> From: Jason Wang [mailto:jasow...@redhat.com]
> Sent: Thursday, November 16, 2017 4:55 PM
> To: longpeng; m...@redhat.com
> Cc: Longpeng(Mike); qemu-devel@nongnu.org; Gonglei (Arei); Wangjing (King,
> Euler); Huangweidong (C); stefa...@redhat.com
> Subject: Re: [Question] why need to start all queues in vhost_net_start
> 
> 
> 
> On 2017年11月16日 13:53, Longpeng (Mike) wrote:
> > On 2017/11/15 23:54, Longpeng(Mike) wrote:
> >> 2017-11-15 23:05 GMT+08:00 Jason Wang<jasow...@redhat.com>:
> >>> On 2017年11月15日 22:55, Longpeng(Mike) wrote:
> >>>> Hi guys,
> >>>>
> >>>> We got a BUG report from our testers yesterday, the testing scenario was
> >>>> migrating a VM (Windows guest, *4 vcpus*, 4GB, vhost-user net: *7
> >>>> queues*).
> >>>>
> >>>> We found the root cause, and we'll report the BUG or send a fix patch
> >>>> upstream if necessary (we haven't tested upstream yet, sorry...).
> >>> Could you explain this a little bit more?
> >>>
> >>>> We want to know why vhost_net_start() must start *total queues* (in
> >>>> our VM there are 7 queues) but not *the queues currently used* (in our
> >>>> VM, the guest only uses the first 4 queues because it's limited by the
> >>>> number of vcpus)?
> >>>>
> >>>> Looking forward to your help, thx:)
> >>> Since the codes have been there for years and works well for kernel
> >>> datapath. You should really explain what's wrong.
> >>>
> >> OK.:)
> >>
> >> In our scenario, the Windows virtio-net driver only uses the first 4
> >> queues and it *only sets the desc/avail/used tables for the first 4
> >> queues*, so in QEMU the desc/avail/used of the last 3 queues are ZERO,
> >> but unfortunately...
> >> '''
> >> vhost_net_start
> >>for (i = 0; i < total_queues; i++)
> >>  vhost_net_start_one
> >>vhost_dev_start
> >>  vhost_virtqueue_start
> >> '''
> >> In vhost_virtqueue_start(), it calculates the HVA of the desc/avail/used
> >> tables, so for the last 3 queues it will use ZERO as the GPA to calculate
> >> the HVA, and then send the results to the user-mode backend (we use
> >> *vhost-user*) by vhost_virtqueue_set_addr().
> >>
> >> When the EVS gets these addresses, it updates an *idx* which will be
> >> treated as the vq's last_avail_idx when virtio-net stops (pls see
> >> vhost_virtqueue_stop()).
> >>
> >> So we get the following result after virtio-net stops:
> >> the desc/avail/used of the last 3 queues' vqs are all ZERO, but these
> >> vqs' last_avail_idx is NOT ZERO.
> >>
> >> At last, virtio_load() reports an error:
> >> '''
> >>if (!vdev->vq[i].vring.desc && vdev->vq[i].last_avail_idx) { // <--
> >> will be TRUE
> >>error_report("VQ %d address 0x0 "
> >>   "inconsistent with Host index 0x%x",
> >>   i, vdev->vq[i].last_avail_idx);
> >>  return -1;
> >> }
> >> '''
> >>
> >> BTW, the problem won't appear with a Linux guest, because the Linux
> >> virtio-net driver sets all 7 queues' desc/avail/used tables. And the
> >> problem won't appear if the VM uses vhost-net, because vhost-net won't
> >> update *idx* in the SET_ADDR ioctl.
> 
> Just to make sure I understand here, I thought Windows guest + vhost_net
> hit this issue?
> 
No, Windows guest + vhost-user/DPDK.

BTW pls see virtio spec in :

"If VIRTIO_NET_F_MQ is negotiated, each of receiveq1. . .receiveqN that will be 
used SHOULD be populated
with receive buffers."

It is not mandatory that all queues must be initialized.

Thanks,
-Gonglei
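
One possible guard on the QEMU side, sketched under the assumption that a
zero descriptor-table GPA means the guest never initialized the queue (not
necessarily the upstream fix):

    /* in hw/virtio/vhost.c, before starting virtqueue idx */
    if (virtio_queue_get_desc_addr(vdev, idx) == 0) {
        /* The guest never wrote this queue's rings; don't start it, so
         * the backend cannot invent a non-zero last_avail_idx for it. */
        return 0;
    }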



Re: [Qemu-devel] [virtio-dev] Re: [v21 1/2] virtio-crypto: Add virtio crypto device specification

2017-11-13 Thread Gonglei (Arei)
Hello Halil,

Thanks for your feedback. 

> 
> On 11/13/2017 08:17 AM, Gonglei (Arei) wrote:
> >>> +struct virtio_crypto_cipher_session_req {
> >>> +/* Device-readable part */
> >>> +struct virtio_crypto_cipher_session_para para;
> >>> +/* The cipher key */
> >>> +u8 cipher_key[keylen];
> >>> +
> >> Is there a limit to the size of cipher_key? I don't see one in your
> >> kernel code. OTOH given that virtio_crypto_sym_create_session_req
> >> is one flavor of virtio_crypto_op_ctrl_req.additional_para and that
> >> the latter is 56 bytes in case no mux mode is supported, I think
> >> there must be a limit to the size of cipher_key!
> >>
> > Of course the size of cipher_key is limited: first, the max length is
> > defined in the virtio-crypto configuration, see
> >
> > struct virtio_crypto_config {
> >   ... ...
> > /* Maximum length of cipher key */
> > uint32_t max_cipher_key_len;
> >   ... ...
> > };
> >
> 
> So for the current qemu implementation it's 64 bytes.
> 
> > Secondly, the real cipher_key size for a specific request is in struct
> > virtio_crypto_cipher_session_para,
> >
> > struct virtio_crypto_cipher_session_para {
> >... ...
> > /* length of key */
> > le32 keylen;
> >... ...
> > };
> >
> > That means the size of cipher_key is variable; it is assigned in each
> > request.
> 
> Of course I understood that. There are two problems I was trying
> to point out, and you ignored both.
> 
> 1. The more important one I was explicit about. Sadly you ignored
> that part of my mail. I will mark it as *Problem 1* down below.
> 
> 2. If there is a limit to the size, then there should be a driver
> normative statement ("Driver Requirements") that states this limit
> MUST be respected. I didn't find this statement.

We can add it.

> >
> >> Please explain!
> >>
> >> Looking at the kernel code again, it seems to me that cipher_key
> >> starts at offset 72 == sizeof(struct virtio_crypto_op_ctrl_req)
> >> where struct virtio_crypto_op_ctrl_req is defined in
> >> include/uapi/linux/virtio_crypto.h. That would mean that this
> >> guy is *not a part of* virtio_crypto_op_ctrl_req but comes
> >> after it and is of variable size.
> 
> *Problem 1*
> 
> Now consider the part where the whole request is described
> 
> """
> +The controlq request is composed of two parts:
> +\begin{lstlisting}
> +struct virtio_crypto_op_ctrl_req {
> +struct virtio_crypto_ctrl_header header;
> +
> +/* additional paramenter */
> +u8 additional_para[addl_para_len];
> +};
> +\end{lstlisting}
> +
> +The first is a general header (see above). And the second one, additional
> +paramenter, contains an crypto-service-specific structure, which could be one
> +of the following types:
> +\begin{itemize*}
> +\item struct virtio_crypto_sym_create_session_req
> +\item struct virtio_crypto_hash_create_session_req
> +\item struct virtio_crypto_mac_create_session_req
> +\item struct virtio_crypto_aead_create_session_req
> +\item virtio_crypto_destroy_session_req
> +\end{itemize*}
> +
> +The size of the additional paramenter depends on the
> VIRTIO_CRYPTO_F_MUX_MODE
> +feature bit:
> +\item If the VIRTIO_CRYPTO_F_MUX_MODE feature bit is NOT negotiated,
> the
> +size of additional paramenter is fixed to 56 bytes, the data of the 
> unused
> +part (if has) will be ingored.
> +\item If the VIRTIO_CRYPTO_F_MUX_MODE feature bit is negotiated, the
> size of
> +additional paramenter is flexible, which is the same as the
> crypto-service-specific
> +structure used.
> """
> 
> There it's said that the whole request is header + additional_para, and that
> if the VIRTIO_CRYPTO_F_MUX_MODE feature bit is NOT negotiated,
> additional_para is 56 bytes. Let's assume the key is part of the additional
> parameter. But you can't put 64 bytes into 56 bytes. So as I say above,
> *the key is not part of virtio_crypto_op_ctrl_req*, neither as described
> in this spec nor as defined in uapi/linux/virtio_crypto.h. That means
> the communication protocol description (more precisely the message format
> description) in the spec is broken. QED
> 
> In my opinion this is a big issue.
> 
OK, I get your point now. Sorry about that. :(

We should update the description of cipher_key and the like.
The key is indeed not part of virtio_crypto_op_ctrl_req in the implementation;
it is a separate entry in the descriptor table.
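
The arithmetic behind Problem 1 can be written down as compile-time checks;
the byte counts below are taken from the thread and treated as assumptions,
not normative values:

    #include <assert.h>

    #define CTRL_HEADER_LEN   16  /* opcode + algo + flag + reserved, 4 x le32 */
    #define ADDL_PARA_LEN     56  /* fixed additional_para size without MUX mode */
    #define MAX_CIPHER_KEYLEN 64  /* QEMU's max_cipher_key_len */

    static_assert(CTRL_HEADER_LEN + ADDL_PARA_LEN == 72,
                  "matches the cipher_key offset observed in virtio_crypto.h");
    static_assert(MAX_CIPHER_KEYLEN > ADDL_PARA_LEN,
                  "a 64-byte key cannot fit in the 56-byte additional_para");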


Re: [Qemu-devel] [virtio-dev] Re: [v21 1/2] virtio-crypto: Add virtio crypto device specification

2017-11-12 Thread Gonglei (Arei)
Hi,

>
> > +The controlq request is composed of two parts:
> > +\begin{lstlisting}
> > +struct virtio_crypto_op_ctrl_req {
> > +struct virtio_crypto_ctrl_header header;
> > +
> > +/* additional paramenter */
> > +u8 additional_para[addl_para_len];
> 
> What does additional paramenter mean? Even if I s/paramenter/parameter
> it doesn't sit well. To me and in this context additional is kind
> of like optional: because each member of a struct is trivially additional
> in respect to the previous members, and there is no point in pointing
> out additional. I would much rather go with something like:
> u8 op_specific[]
> 
> I also don't find the addl_para_len used anywhere. Then IMHO we don't
> need to introduce a name.
> 

I'd like to say that additional_para[addl_para_len] is just a placeholder,
which is explained by the content below. I'm fine with op_specific[] too.

> > +};
> > +\end{lstlisting}
> > +
> > +The first is a general header (see above). And the second one, additional
> > +paramenter, contains an crypto-service-specific structure, which could be
> one
> 
> s/paramenter/parameter
> 
> It's actually opcode specific, or? Or is there a destroy service?
> 
We can choose the specific request (structure) for *op_specific* according
to the opcode.

> > +of the following types:
> > +\begin{itemize*}
> > +\item struct virtio_crypto_sym_create_session_req
> > +\item struct virtio_crypto_hash_create_session_req
> > +\item struct virtio_crypto_mac_create_session_req
> > +\item struct virtio_crypto_aead_create_session_req
> > +\item virtio_crypto_destroy_session_req
> > +\end{itemize*}
> > +
> > +The size of the additional paramenter depends on the
> VIRTIO_CRYPTO_F_MUX_MODE
> 
> s/paramenter/parameter
> 
> > +feature bit:
> > +\item If the VIRTIO_CRYPTO_F_MUX_MODE feature bit is NOT negotiated,
> the
> > +size of additional paramenter is fixed to 56 bytes, the data of the
> unused
> 
> s/paramenter/parameter
> 
> > +part (if has) will be ingored.
> 
> s/ingored/ignored
> 
> 
> > +\item If the VIRTIO_CRYPTO_F_MUX_MODE feature bit is negotiated, the
> size of
> > +additional paramenter is flexible, which is the same as the
> crypto-service-specific
> 
> s/paramenter/parameter
> 
> > +structure used.
> > +
> > +\paragraph{Session operation}\label{sec:Device Types / Crypto Device /
> Device Operation / Control Virtqueue / Session operation}
> > +
> > +The session is a handle which describes the cryptographic parameters to be
> > +applied to a number of buffers.
> > +
> > +The following structure stores the result of session creation set by the
> device:
> > +
> > +\begin{lstlisting}
> > +struct virtio_crypto_session_input {
> > +/* Device-writable part */
> > +le64 session_id;
> > +le32 status;
> > +le32 padding;
> > +};
> > +\end{lstlisting}
> > +
> > +A request to destroy a session includes the following information:
> > +
> > +\begin{lstlisting}
> > +struct virtio_crypto_destroy_session_req {
> > +/* Device-readable part */
> > +le64  session_id;
> > +/* Device-writable part */
> > +le32  status;
> > +le32  padding;
> > +};
> > +\end{lstlisting}
> > +
> > +\subparagraph{Session operation: HASH session}\label{sec:Device Types /
> Crypto Device / Device
> > +Operation / Control Virtqueue / Session operation / Session operation: HASH
> session}
> > +
> 
> Let me skip to the one actually implemented.
> 
> > +
> > +The request of symmetric session includes two parts, CIPHER algorithms
> > +and chain algorithms (chaining CIPHER and HASH/MAC).
> 
> This sounds like concatenation and not either-or.
> > +
> > +CIPHER session requests are as follows:
> > +
> > +\begin{lstlisting}
> > +struct virtio_crypto_cipher_session_para {
> > +/* See VIRTIO_CRYPTO_CIPHER* above */
> > +le32 algo;
> > +/* length of key */
> > +le32 keylen;
> > +#define VIRTIO_CRYPTO_OP_ENCRYPT  1
> > +#define VIRTIO_CRYPTO_OP_DECRYPT  2
> > +/* encryption or decryption */
> > +le32 op;
> > +le32 padding;
> > +};
> > +
> > +struct virtio_crypto_cipher_session_req {
> > +/* Device-readable part */
> > +struct virtio_crypto_cipher_session_para para;
> > +/* The cipher key */
> > +u8 cipher_key[keylen];
> > +
> 
> Is there a limit to the size of cipher_key? I don't see one in your
> kernel code. OTOH given that virtio_crypto_sym_create_session_req
> is one flavor of virtio_crypto_op_ctrl_req.additional_para and that
> the latter is 56 bytes in case no mux mode is supported, I think
> there must be a limit to the size of cipher_key!
> 
Of course the size of cipher_key is limited: first, the max length is defined
in the virtio-crypto configuration, see

struct virtio_crypto_config {
  ... ...
/* Maximum length of cipher key */
uint32_t max_cipher_key_len;
  ... ...
};

Secondly, the real cipher_key size for a specific request is in struct
virtio_crypto_cipher_session_para,

struct virtio_crypto_cipher_session_para {
   ... ...
/* 

Re: [Qemu-devel] [v21 RESEND 0/2] virtio-crypto: virtio crypto device specification

2017-11-06 Thread Gonglei (Arei)
Hi guys,

What a long iteration it has been.

Hoping this is the final version, if no big arguments exist, as
discussed with Stefan at KVM Forum 2017 this October. People
can submit patches to fix some grammar issues or little problems,
and then Xin can submit the asymmetric crypto services spec
based on this version.

I'll start a vote for the virtio crypto device after one week. :)

Please review it in time, thanks a lot!


Regards,
-Gonglei


> -Original Message-
> From: longpeng
> Sent: Monday, November 06, 2017 2:59 PM
> To: qemu-devel@nongnu.org; virtio-...@lists.oasis-open.org
> Cc: Luonengjun; m...@redhat.com; cornelia.h...@de.ibm.com;
> stefa...@redhat.com; denglin...@chinamobile.com; Jani Kokkonen;
> ola.liljed...@arm.com; varun.se...@freescale.com; xin.z...@intel.com;
> brian.a.keat...@intel.com; liang.j...@intel.com; john.grif...@intel.com;
> Huangweidong (C); ag...@suse.de; jasow...@redhat.com;
> vincent.jar...@6wind.com; Gonglei (Arei); pa...@linux.vnet.ibm.com; wangxin
> (U); Zhoujian (jay); longpeng
> Subject: [v21 RESEND 0/2] virtio-crypto: virtio crypto device specification
> 
> This is the specification about the new virtio crypto device.
> 
> ---
> v21 -> v20
>  - rename 'queue_id' to 'reserved' [Halil]
>  - redescribe the format of the structures which using 'union'
>in the previous version [Halil]
> 
> v20 -> v19
>  - fix some typos and grammar fixes [Halil]
>  - make queue_id reserved [Halil]
>  - remove 'Steps of Operation'
> 
> v19 -> v18:
>  - fix some typos and grammar fixes [Stefan, Halil]
>  - rename VIRTIO_CRYPTO_F_STATELESS_MODE to
> VIRTIO_CRYPTO_F_MUX_MODE
>  - describe the VIRTIO_CRYPTO_STATUS in detial. [Halil]
>  - refactor and redescribe the controlq/dataq request's format
>of mux mode. [Halil]
>  - other small fixes. [Halil]
> 
> v18 -> v17:
>  - fix many English grammar problems suggested by Stefan, Thanks a lot!
> 
> v17 -> v16:
>  - Some grammar fixes [Stefan, Halil, Michael]
>  - add a section named "Supported crypto services" in order to explain bit
>numbers and valuse clearly. [Halil, Cornelia]
>  - avoid word reptition [Halil]
>  - rename non-session mode to stateless mode [Halil]
>  - change descriptions for all elements in struct virtio_crypto_config [Halil]
>  - add Halil as a reviewer in the ackonwledgement part, thanks for his work.
>  - other fixes here and there.
> 
> Changes since v15:
>  - use feature bits for non-session mode in order to keep compatibility with
>pre-existing code. [Halil & Michael]
>  - introduce VIRTIO_CRYPTO_F_ NON_SESSION_MODE feature bit to control all
> other
>non-session mode feature bits.
>  - fix some typos. [Stefan]
>  - introduce struct virtio_crypto_op_data_req_mux to support both session
>and non-session based crypto operations and keep compatibility with
>pre-existing code.
> 
> Changes since v14:
>  - drop VIRTIO_CRYPTO_S_STARTED status [Halil & Cornelia]
>  - correct a sentence about dataqueue and controlq in the first paragraph.
> [Halil]
>  - change a MAY to MUST about max_dataqueues. [Halil]
>  - add non-session mode support
>a) add four features for different crypto services to identify wheather
> support session mode.
>b) rewrite some
> 
> For pervious versions of virtio crypto spec, Pls see:
> 
> [v18]:
> https://www.mail-archive.com/qemu-devel@nongnu.org/msg444897.html
> 
> [v14]:
> https://lists.gnu.org/archive/html/qemu-devel/2016-11/msg02212.html
> 
> [v13]:
> https://lists.gnu.org/archive/html/qemu-devel/2016-10/msg07348.html
> 
> For more information, please see:
>  http://qemu-project.org/Features/VirtioCrypto
> 
> ---
> Gonglei (2):
>   virtio-crypto: Add virtio crypto device specification
>   virtio-crypto: Add conformance clauses
> 
>  acknowledgements.tex |3 +
>  conformance.tex  |   29 +
>  content.tex  |2 +
>  virtio-crypto.tex| 1426
> ++
>  4 files changed, 1460 insertions(+)
>  create mode 100644 virtio-crypto.tex
> 
> --
> 1.8.3.1
> 




Re: [Qemu-devel] [PATCH] crypto: afalg: fix a NULL pointer dereference

2017-11-06 Thread Gonglei (Arei)

> -Original Message-
> From: longpeng
> Sent: Monday, November 06, 2017 2:21 PM
> To: berra...@redhat.com; pbonz...@redhat.com; Gonglei (Arei)
> Cc: longpeng; qemu-devel@nongnu.org
> Subject: [PATCH] crypto: afalg: fix a NULL pointer dereference
> 
> Test-crypto-hash calls qcrypto_hash_bytesv/digest/base64 with
> errp=NULL; this will cause a NULL pointer dereference if afalg_driver
> doesn't support the requested algos:
> ret = qcrypto_hash_afalg_driver.hash_bytesv(alg, iov, niov,
> result, resultlen,
> errp);
> if (ret == 0) {
> return ret;
> }
> 
> error_free(*errp);  // <--- here
> 
> So we must check 'errp && *errp' before dereferencing.
> 
> Signed-off-by: Longpeng(Mike) <longpe...@huawei.com>
> ---

Reported-by: Paolo Bonzini <pbonz...@redhat.com>
Reviewed-by: Gonglei <arei.gong...@huawei.com>

Thanks,
-Gonglei

>  crypto/hash.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
> 
> diff --git a/crypto/hash.c b/crypto/hash.c
> index ac59c63..c464c78 100644
> --- a/crypto/hash.c
> +++ b/crypto/hash.c
> @@ -60,7 +60,9 @@ int qcrypto_hash_bytesv(QCryptoHashAlgorithm alg,
>   * TODO:
>   * Maybe we should treat some afalg errors as fatal
>   */
> -error_free(*errp);
> +if (errp && *errp) {
> +error_free(*errp);
> +}
>  #endif
> 
>  return qcrypto_hash_lib_driver.hash_bytesv(alg, iov, niov,
> --
> 1.8.3.1
> 
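
An alternative shape for the same fix is to collect the afalg error into a
local Error object, relying on error_free(NULL) being a no-op; a sketch:

    Error *local_err = NULL;
    int ret = qcrypto_hash_afalg_driver.hash_bytesv(alg, iov, niov,
                                                    result, resultlen,
                                                    &local_err);
    if (ret == 0) {
        return 0;
    }
    error_free(local_err);  /* safe even when local_err is NULL */

This way callers passing errp == NULL are never dereferenced at all.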




Re: [Qemu-devel] [virtio-dev] Re: [virtio-dev] Re: [RFC 0/8] virtio-crypto: add multiplexing mode support

2017-10-09 Thread Gonglei (Arei)

> -Original Message-
> From: Halil Pasic [mailto:pa...@linux.vnet.ibm.com]
> Sent: Monday, October 09, 2017 7:05 PM
> 
> On 10/09/2017 11:22 AM, Gonglei (Arei) wrote:
> > The next patch's refactorings make sense to me,
> > but why do we need to decouple virtio-crypto.h?
> >
> >
> 
> I wanted to be able to freely change the host side and test with an unchanged
> guest side, that's why I've done that. It's just for testing. I had to do that
> because we don't have a mux capable linux driver. Neither of these patches is
> intended for inclusion. I'm just trying to make a point with them: we can
> make this substantially simpler (compared to this RFC).
> 
I see.

> So how do we proceed here? It would be nice to see a cleaned up version of

Maybe Longpeng can apply your test patches in the following patch set when
he has time. @Longpeng

> this series soon. If I recall correctly there were also other things which
> can be done in a less convoluted manner.
> 
Oh? Which things?

> >> The basic idea behind the whole thing is that thinking about the requests
> >> put on the virtqueues in terms of just complicates things unnecessarily.
> >>
> >> I guess I will post the interesting part as a reply to this and the less
> >> interesting part (decoupling) as an attachment. You are supposed to apply
> >> the attachment first, then the part after the scissors line.
> >>
> >> Of course should you could respin the series preferably with the test
> >> included I can rebase my stuff.
> >>
> >> Please let me know about your opinion.
> >>
> > Thanks for your work, Halil. What's your opinion about virtio crypto spec 
> > v20?
> 
> I'm on it. I'd already started writing on Friday but things turned out a bit
> more interesting than expected, so I've postponed it to today. Of course the
> two things are connected. I will try to give some feedback today.
> 
Sounds good.

Thanks,
-Gonglei


