Re: [PATCH v3 1/1] accel/kvm: Fix segmentation fault

2024-05-06 Thread Zhijian Li (Fujitsu)
on 5/7/2024 10:50 AM, Masato Imai wrote: > When the KVM acceleration parameter is not set, executing calc_dirty_rate > with the -r or -b option results in a segmentation fault due to accessing > a null kvm_state pointer in the kvm_dirty_ring_enabled function. This > commit adds a null check for

Re: [PATCH v2] hw/mem/cxl_type3: reset dvsecs in ct3d_reset()

2024-04-25 Thread Zhijian Li (Fujitsu)
ping On 11/04/2024 18:18, Jonathan Cameron wrote: > On Tue, 9 Apr 2024 15:58:46 +0800 > Li Zhijian wrote: > >> After the kernel commit >> 0cab68720598 ("cxl/pci: Fix disabling memory if DVSEC CXL Range does not >> match a CFMWS window") >> CXL type3 devices cannot be enabled again after the

Re: [PATCH v2 1/1] migration/dirtyrate: Fix segmentation fault

2024-04-24 Thread Zhijian Li (Fujitsu)
On 24/04/2024 12:52, mii wrote: > > On 2024/04/24 10:28, Yong Huang wrote: >> >> >> On Tue, Apr 23, 2024 at 9:35 PM Peter Xu wrote: >> >> On Tue, Apr 23, 2024 at 09:13:08AM +, Masato Imai wrote: >> > When the KVM acceleration parameter is not set, executing >> calc_dirty_rate >>

Re: [PATCH v2] migration/colo: Fix bdrv_graph_rdlock_main_loop: Assertion `!qemu_in_coroutine()' failed.

2024-04-17 Thread Zhijian Li (Fujitsu)
On 17/04/2024 14:13, Philippe Mathieu-Daudé wrote: > On 17/4/24 04:56, Li Zhijian via wrote: >> bdrv_activate_all() should not be called from the coroutine context, move >> it to the QEMU thread colo_process_incoming_thread() with the bql_lock >> protected. >> >> The backtrace is as follows: >>  

Re: [PATCH] migration/colo: Fix bdrv_graph_rdlock_main_loop: Assertion `!qemu_in_coroutine()' failed.

2024-04-16 Thread Zhijian Li (Fujitsu)
On 17/04/2024 10:44, Li Zhijian wrote: > bdrv_activate_all() should not be called from the coroutine context, move > it to the QEMU thread colo_process_incoming_thread() with the bql_lock > protected. > > The backtrace is as follows: > #4 0x561af7948362 in bdrv_graph_rdlock_main_loop ()

Re: [PATCH-for-9.1 v2 2/3] migration: Remove RDMA protocol handling

2024-04-09 Thread Zhijian Li (Fujitsu)
on 4/10/2024 3:46 AM, Peter Xu wrote: >> Is there document/link about the unittest/CI for migration tests, Why >> are those tests missing? >> Is it hard or very special to set up an environment for that? maybe we >> can help in this regards. > See tests/qtest/migration-test.c. We put most of

Re: [PATCH 2/2] CXL/cxl_type3: reset DVSEC CXL Control in ct3d_reset

2024-04-03 Thread Zhijian Li (Fujitsu)
On 03/04/2024 11:42, Li Zhijian wrote: > > > On 02/04/2024 17:17, Jonathan Cameron wrote: >> On Tue,  2 Apr 2024 09:46:47 +0800 >> Li Zhijian wrote: >> >>> After the kernel commit >>> 0cab68720598 ("cxl/pci: Fix disabling memory if DVSEC CXL Range does not >>> match a CFMWS window") >> >>

Re: [PATCH 2/2] CXL/cxl_type3: reset DVSEC CXL Control in ct3d_reset

2024-04-02 Thread Zhijian Li (Fujitsu)
On 02/04/2024 17:17, Jonathan Cameron wrote: > On Tue, 2 Apr 2024 09:46:47 +0800 > Li Zhijian wrote: > >> After the kernel commit >> 0cab68720598 ("cxl/pci: Fix disabling memory if DVSEC CXL Range does not >> match a CFMWS window") > > Fixes tag seems appropriate. > >> CXL type3 devices

Re: [PATCH 1/2] CXL/cxl_type3: add first_dvsec_offset() helper

2024-04-01 Thread Zhijian Li (Fujitsu)
On 02/04/2024 12:09, fan wrote: > On Tue, Apr 02, 2024 at 09:46:46AM +0800, Li Zhijian via wrote: >> It helps to figure out where the first dvsec register is located. In >> addition, replace offset and size hardcore with existing macros. >> >> Signed-off-by: Li Zhijian >> --- >>

Re: [PATCH-for-9.1 v2 2/3] migration: Remove RDMA protocol handling

2024-04-01 Thread Zhijian Li (Fujitsu)
Phil, on 3/29/2024 6:28 PM, Philippe Mathieu-Daudé wrote: >> >> >>> IMHO it's more important to know whether there are still users and >>> whether >>> they would still like to see it around. >> >> Agree. >> I didn't immediately express my opinion in V1 because I'm also >> consulting our >>

Re: [PATCH-for-9.1 v2 2/3] migration: Remove RDMA protocol handling

2024-03-28 Thread Zhijian Li (Fujitsu)
On 28/03/2024 23:01, Peter Xu wrote: > On Thu, Mar 28, 2024 at 11:18:04AM -0300, Fabiano Rosas wrote: >> Philippe Mathieu-Daudé writes: >> >>> The whole RDMA subsystem was deprecated in commit e9a54265f5 >>> ("hw/rdma: Deprecate the pvrdma device and the rdma subsystem") >>> released in v8.2.

Re: Problem with migration/rdma

2024-03-07 Thread Zhijian Li (Fujitsu)
On 08/03/2024 14:55, Peter Xu wrote: > On Fri, Mar 08, 2024 at 07:27:59AM +0100, Yu Zhang wrote: >> Hello Zhijian and Peter, >> >> Thank you so much for testing and confirming it. >> I created a patch in the email format, unfortunately got an issue for >> setting up the >> "Application-specific

Re: Problem with migration/rdma

2024-03-06 Thread Zhijian Li (Fujitsu)
Yu, On 07/03/2024 00:30, Philippe Mathieu-Daudé wrote: > Cc'ing RDMA migration reviewers/maintainers: > > $ ./scripts/get_maintainer.pl -f migration/rdma.c > Li Zhijian (reviewer:RDMA Migration) > Peter Xu (maintainer:Migration) > Fabiano Rosas (maintainer:Migration) > > On 5/3/24 22:32, Yu

Re: [PATCH v2 1/2] hw/cxl: Pass CXLComponentState to cache_mem_ops

2023-10-23 Thread Zhijian Li (Fujitsu)
On 19/10/2023 18:50, Jonathan Cameron wrote: > On Wed, 18 Oct 2023 16:24:07 +0800 > Li Zhijian wrote: > >> cache_mem_ops.{read,write}() interprets opaque as >> CXLComponentState(cxl_cstate) instead of ComponentRegisters(cregs). >> >> Fortunately, cregs is the first member of cxl_cstate, so

Re: [PATCH] hw/cxl: Fix opaque type interpret wrongly

2023-10-18 Thread Zhijian Li (Fujitsu)
On 13/10/2023 16:52, Philippe Mathieu-Daudé wrote: > On 13/10/23 03:55, Li Zhijian wrote: >> void cxl_component_register_block_init(Object *obj, >>     CXLComponentState *cxl_cstate, >>     const char *type) >> { >>

Re: [PATCH 39/52] migration/rdma: Convert qemu_rdma_write_one() to Error

2023-10-16 Thread Zhijian Li (Fujitsu)
On 16/10/2023 20:11, Jason Gunthorpe wrote: > On Sat, Oct 07, 2023 at 03:40:50AM +0000, Zhijian Li (Fujitsu) wrote: >> +rdma-core >> >> >> Is global variable *errno* reliable when the documentation only states >> "returns 0 on success, or the valu

Re: [PATCH v3 13/13] migration/rdma: Remove all "ret" variables that are used only once

2023-10-13 Thread Zhijian Li (Fujitsu)
On 12/10/2023 04:35, Juan Quintela wrote: > Change code that is: > > int ret; > ... > > ret = foo(); > if (ret[ < 0]?) { > > to: > > if (foo()[ < 0]) { > > Signed-off-by: Juan Quintela Reviewed-by: Li Zhijian

Re: [PATCH v3 02/13] migration/rdma: Unfold ram_control_before_iterate()

2023-10-13 Thread Zhijian Li (Fujitsu)
On 12/10/2023 04:35, Juan Quintela wrote: > Once there: > - Remove unused data parameter > - unfold it in its callers. > - change all callers to call qemu_rdma_registration_start() > - We need to call QIO_CHANNEL_RDMA() after we check for migrate_rdma() > > Reviewed-by: Peter Xu >

Re: [PATCH v3 12/13] migration/rdma: Declare for index variables local

2023-10-13 Thread Zhijian Li (Fujitsu)
On 12/10/2023 04:35, Juan Quintela wrote: > Declare all variables that are only used inside a for loop inside the > for statement. > > This makes clear that they are not used outside of the for loop. > > Signed-off-by: Juan Quintela Reviewed-by: Li Zhijian

Re: [PATCH v3 11/13] migration/rdma: Use i as for index instead of idx

2023-10-13 Thread Zhijian Li (Fujitsu)
On 12/10/2023 04:35, Juan Quintela wrote: > Once there, all the uses are local to the for, so declare the variable > inside the for statement. > > Signed-off-by: Juan Quintela Reviewed-by: Li Zhijian > --- > migration/rdma.c | 49 ++-- > 1 file

Re: [PATCH v3 05/13] migration/rdma: Unfold hook_ram_load()

2023-10-13 Thread Zhijian Li (Fujitsu)
On 12/10/2023 04:35, Juan Quintela wrote: > There is only one flag called with: RAM_CONTROL_BLOCK_REG. > > Reviewed-by: Peter Xu > Signed-off-by: Juan Quintela Reviewed-by: Li Zhijian > --- > migration/qemu-file.h | 11 --- > migration/rdma.h | 3 +++ >

Re: [PATCH v3 09/13] migration/rdma: Remove qemu_ prefix from exported functions

2023-10-13 Thread Zhijian Li (Fujitsu)
On 12/10/2023 04:35, Juan Quintela wrote: > Functions are long enough even without this. > > Reviewed-by: Peter Xu > Signed-off-by: Juan Quintela Reviewed-by: Li Zhijian > --- > migration/rdma.h | 12 ++-- > migration/ram.c| 14 +++--- > migration/rdma.c

Re: [PATCH v3 06/13] migration/rdma: Create rdma_control_save_page()

2023-10-13 Thread Zhijian Li (Fujitsu)
On 12/10/2023 04:35, Juan Quintela wrote: > The only user of ram_control_save_page() and save_page() hook was > rdma. Just move the function to rdma.c, rename it to > rdma_control_save_page(). > > Reviewed-by: Peter Xu > Signed-off-by: Juan Quintela [...] > > +int

Re: [PATCH v3 08/13] migration/rdma: Move rdma constants from qemu-file.h to rdma.h

2023-10-13 Thread Zhijian Li (Fujitsu)
On 12/10/2023 04:35, Juan Quintela wrote: > Reviewed-by: Peter Xu > Signed-off-by: Juan Quintela Reviewed-by: Li Zhijian > --- > migration/qemu-file.h | 17 - > migration/rdma.h | 16 > migration/ram.c | 2 +- > 3 files changed, 17

Re: [PATCH v3 03/13] migration/rdma: Unfold ram_control_after_iterate()

2023-10-13 Thread Zhijian Li (Fujitsu)
On 12/10/2023 04:35, Juan Quintela wrote: > Once there: > - Remove unused data parameter > - unfold it in its callers > - change all callers to call qemu_rdma_registration_stop() > - We need to call QIO_CHANNEL_RDMA() after we check for migrate_rdma() > > Reviewed-by: Peter Xu > Signed-off-by:

Re: [PATCH v3 07/13] qemu-file: Remove QEMUFileHooks

2023-10-13 Thread Zhijian Li (Fujitsu)
On 12/10/2023 04:35, Juan Quintela wrote: > The only user was rdma, and its use is gone. > > Reviewed-by: Peter Xu > Signed-off-by: Juan Quintela Reviewed-by: Li Zhijian > --- > migration/qemu-file.h | 4 > migration/qemu-file.c | 6 -- > migration/rdma.c | 9 - >

Re: [PATCH v3 04/13] migration/rdma: Remove all uses of RAM_CONTROL_HOOK

2023-10-13 Thread Zhijian Li (Fujitsu)
On 12/10/2023 04:35, Juan Quintela wrote: > Instead of going trhough ram_control_load_hook(), call > qemu_rdma_registration_handle() directly. > s/trhough/through Reviewed-by: Li Zhijian > Reviewed-by: Peter Xu > Signed-off-by: Juan Quintela > --- > migration/qemu-file.h | 1 - >

Re: [PATCH v3 10/13] migration/rdma: Check sooner if we are in postcopy for save_page()

2023-10-13 Thread Zhijian Li (Fujitsu)
On 12/10/2023 04:35, Juan Quintela wrote: > Reviewed-by: Peter Xu > Signed-off-by: Juan Quintela Reviewed-by: Li Zhijian > --- > migration/rdma.c | 6 +- > 1 file changed, 1 insertion(+), 5 deletions(-) > > diff --git a/migration/rdma.c b/migration/rdma.c > index

Re: [PATCH v3 01/13] migration: Create migrate_rdma()

2023-10-13 Thread Zhijian Li (Fujitsu)
On 12/10/2023 04:35, Juan Quintela wrote: > Helper to say if we are doing a migration over rdma. > > Reviewed-by: Peter Xu > Signed-off-by: Juan Quintela Reviewed-by: Li Zhijian > --- > migration/migration.h | 2 ++ > migration/options.h | 1 + > migration/migration.c | 1 + >

Re: [PATCH 1/1] migration: Non multifd migration don't care about multifd flushes

2023-10-11 Thread Zhijian Li (Fujitsu)
On 12/10/2023 04:55, Juan Quintela wrote: > RDMA was having trouble because > migrate_multifd_flush_after_each_section() can only be true or false, > but we don't want to send any flush when we are not in multifd > migration. > > CC: Fabiano Rosas Fixes:

Re: [PATCH v2 1/2] migration: Fix rdma migration failed

2023-10-07 Thread Zhijian Li (Fujitsu)
On 04/10/2023 02:57, Juan Quintela wrote: > commit c638f66121ce30063fbf68c3eab4d7429cf2b209 > Author: Juan Quintela > Date: Tue Oct 3 20:53:38 2023 +0200 > > migration: Non multifd migration don't care about multifd flushes > > RDMA was having trouble because >

Re: [PATCH v2 22/53] migration/rdma: Drop dead qemu_rdma_data_init() code for !@host_port

2023-10-06 Thread Zhijian Li (Fujitsu)
On 28/09/2023 21:19, Markus Armbruster wrote: > qemu_rdma_data_init() neglects to set an Error when it fails because > @host_port is null. Fortunately, no caller passes null, so this is > merely a latent bug. Drop the flawed code handling null argument. > > Signed-off-by: Markus Armbruster

Re: [PATCH v2 16/53] migration/rdma: Fix or document problematic uses of errno

2023-10-06 Thread Zhijian Li (Fujitsu)
On 28/09/2023 21:19, Markus Armbruster wrote: > We use errno after calling Libibverbs functions that are not > documented to set errno (manual page does not mention errno), or where > the documentation is unclear ("returns [...] the value of errno on > failure"). While this could be read as

Re: [PATCH v2 53/53] migration/rdma: Replace flawed device detail dump by tracing

2023-10-06 Thread Zhijian Li (Fujitsu)
On 28/09/2023 21:20, Markus Armbruster wrote: > qemu_rdma_dump_id() dumps RDMA device details to stdout. > > rdma_start_outgoing_migration() calls it via qemu_rdma_source_init() > and qemu_rdma_resolve_host() to show source device details. > rdma_start_incoming_migration() arranges its call via

Re: [PATCH v2 52/53] migration/rdma: Use error_report() & friends instead of stderr

2023-10-06 Thread Zhijian Li (Fujitsu)
On 28/09/2023 21:20, Markus Armbruster wrote: > error_report() obeys -msg, reports the current error location if any, > and reports to the current monitor if any. Reporting to stderr > directly with fprintf() or perror() is wrong, because it loses all > this. > > Fix the offenders. Bonus:

Re: [PATCH v2 51/53] migration/rdma: Downgrade qemu_rdma_cleanup() errors to warnings

2023-10-06 Thread Zhijian Li (Fujitsu)
On 28/09/2023 21:20, Markus Armbruster wrote: > Functions that use an Error **errp parameter to return errors should > not also report them to the user, because reporting is the caller's > job. When the caller does, the error is reported twice. When it > doesn't (because it recovered from the

Re: [PATCH 39/52] migration/rdma: Convert qemu_rdma_write_one() to Error

2023-10-06 Thread Zhijian Li (Fujitsu)
+rdma-core Is global variable *errno* reliable when the documentation only states "returns 0 on success, or the value of errno on failure (which indicates the failure reason)." Someone read it as "assign error code to errno and return it.", I used to think the same way. but ibv_post_send()

Re: [PATCH v2 07/53] migration/rdma: Clean up two more harmless signed vs. unsigned issues

2023-10-06 Thread Zhijian Li (Fujitsu)
On 28/09/2023 21:19, Markus Armbruster wrote: > qemu_rdma_exchange_get_response() compares int parameter @expecting > with uint32_t head->type. Actual arguments are non-negative > enumeration constants, RDMAControlHeader uint32_t member type, or > qemu_rdma_exchange_recv() int parameter

Re: [PATCH v2 06/53] migration/rdma: Fix unwanted integer truncation

2023-10-06 Thread Zhijian Li (Fujitsu)
On 28/09/2023 21:19, Markus Armbruster wrote: > qio_channel_rdma_readv() assigns the size_t value of qemu_rdma_fill() > to an int variable before it adds it to @done / subtracts it from > @want, both size_t. Truncation when qemu_rdma_fill() copies more than > INT_MAX bytes. Seems vanishingly

Re: [PATCH 50/52] migration/rdma: Silence qemu_rdma_cleanup()

2023-09-26 Thread Zhijian Li (Fujitsu)
On 26/09/2023 19:52, Markus Armbruster wrote: > "Zhijian Li (Fujitsu)" writes: > >> On 18/09/2023 22:42, Markus Armbruster wrote: >>> Functions that use an Error **errp parameter to return errors should >>> not also report them to the user, because

Re: [PATCH 26/52] migration/rdma: Replace int error_state by bool errored

2023-09-26 Thread Zhijian Li (Fujitsu)
On 25/09/2023 15:09, Markus Armbruster wrote: > "Zhijian Li (Fujitsu)" writes: > >> On 18/09/2023 22:41, Markus Armbruster wrote: >>> All we do with the value of RDMAContext member @error_state is test >>> whether it's zero. Change to bool and

Re: [PATCH 25/52] migration/rdma: Dumb down remaining int error values to -1

2023-09-26 Thread Zhijian Li (Fujitsu)
On 18/09/2023 22:41, Markus Armbruster wrote: > This is just to make the error value more obvious. Callers don't > mind. > > Signed-off-by: Markus Armbruster Reviewed-by: Li Zhijian

Re: [PATCH 24/52] migration/rdma: Return -1 instead of negative errno code

2023-09-26 Thread Zhijian Li (Fujitsu)
On 18/09/2023 22:41, Markus Armbruster wrote: > Several functions return negative errno codes on failure. Callers > check for specific codes exactly never. For some of the functions, > callers couldn't check even if they wanted to, because the functions > also return negative values that

Re: [PATCH 50/52] migration/rdma: Silence qemu_rdma_cleanup()

2023-09-26 Thread Zhijian Li (Fujitsu)
On 18/09/2023 22:42, Markus Armbruster wrote: > Functions that use an Error **errp parameter to return errors should > not also report them to the user, because reporting is the caller's > job. When the caller does, the error is reported twice. When it > doesn't (because it recovered from the

Re: [PATCH] MAINTAINERS: Add entry for rdma migration

2023-09-26 Thread Zhijian Li (Fujitsu)
jian can see the patches and review when he still has > the bandwidth. Feel free to add my Acked tag. thanks. Acked-by: Li Zhijian > > Cc: Daniel P. Berrangé > Cc: Juan Quintela > Cc: Markus Armbruster > Cc: Zhijian Li (Fujitsu) > Cc: Fabiano Rosas > Signed-off-by:

Re: [PATCH 43/52] migration/rdma: Convert qemu_rdma_alloc_pd_cq() to Error

2023-09-26 Thread Zhijian Li (Fujitsu)
On 26/09/2023 14:41, Markus Armbruster wrote: > "Zhijian Li (Fujitsu)" writes: > >> On 18/09/2023 22:41, Markus Armbruster wrote: >>> Functions that use an Error **errp parameter to return errors should >>> not also report them to the user, because

Re: [PATCH 52/52] migration/rdma: Fix how we show device details on open

2023-09-26 Thread Zhijian Li (Fujitsu)
On 18/09/2023 22:42, Markus Armbruster wrote: > qemu_rdma_dump_id() dumps RDMA device details to stdout. > > rdma_start_outgoing_migration() calls it via qemu_rdma_source_init() > and qemu_rdma_resolve_host() to show source device details. > rdma_start_incoming_migration() arranges its call via

Re: [PATCH 51/52] migration/rdma: Use error_report() & friends instead of stderr

2023-09-26 Thread Zhijian Li (Fujitsu)
On 18/09/2023 22:42, Markus Armbruster wrote: > if (local->nb_blocks != nb_dest_blocks) { > -fprintf(stderr, "ram blocks mismatch (Number of blocks %d vs %d) > " > -"Your QEMU command line parameters are probably " > -"not identical

Re: [PATCH 49/52] migration/rdma: Silence qemu_rdma_register_and_get_keys()

2023-09-26 Thread Zhijian Li (Fujitsu)
On 18/09/2023 22:42, Markus Armbruster wrote: > Functions that use an Error **errp parameter to return errors should > not also report them to the user, because reporting is the caller's > job. When the caller does, the error is reported twice. When it > doesn't (because it recovered from the

Re: [PATCH 48/52] migration/rdma: Silence qemu_rdma_block_for_wrid()

2023-09-26 Thread Zhijian Li (Fujitsu)
On 18/09/2023 22:42, Markus Armbruster wrote: > Functions that use an Error **errp parameter to return errors should > not also report them to the user, because reporting is the caller's > job. When the caller does, the error is reported twice. When it > doesn't (because it recovered from the

Re: [PATCH 47/52] migration/rdma: Don't report received completion events as error

2023-09-26 Thread Zhijian Li (Fujitsu)
On 18/09/2023 22:42, Markus Armbruster wrote: > When qemu_rdma_wait_comp_channel() receives an event from the > completion channel, it reports an error "receive cm event while wait > comp channel,cm event is T", where T is the numeric event type. > However, the function fails only when T is a

Re: [PATCH 46/52] migration/rdma: Silence qemu_rdma_reg_control()

2023-09-26 Thread Zhijian Li (Fujitsu)
On 18/09/2023 22:42, Markus Armbruster wrote: > Functions that use an Error **errp parameter to return errors should > not also report them to the user, because reporting is the caller's > job. When the caller does, the error is reported twice. When it > doesn't (because it recovered from the

Re: [PATCH 45/52] migration/rdma: Silence qemu_rdma_connect()

2023-09-26 Thread Zhijian Li (Fujitsu)
On 18/09/2023 22:41, Markus Armbruster wrote: > Functions that use an Error **errp parameter to return errors should > not also report them to the user, because reporting is the caller's > job. When the caller does, the error is reported twice. When it > doesn't (because it recovered from the

Re: [PATCH 39/52] migration/rdma: Convert qemu_rdma_write_one() to Error

2023-09-25 Thread Zhijian Li (Fujitsu)
On 26/09/2023 13:50, Li Zhijian wrote: > > > On 18/09/2023 22:41, Markus Armbruster wrote: >> Functions that use an Error **errp parameter to return errors should >> not also report them to the user, because reporting is the caller's >> job.  When the caller does, the error is reported twice. 

Re: [PATCH 39/52] migration/rdma: Convert qemu_rdma_write_one() to Error

2023-09-25 Thread Zhijian Li (Fujitsu)
On 18/09/2023 22:41, Markus Armbruster wrote: > Functions that use an Error **errp parameter to return errors should > not also report them to the user, because reporting is the caller's > job. When the caller does, the error is reported twice. When it > doesn't (because it recovered from the

Re: [PATCH 44/52] migration/rdma: Silence qemu_rdma_resolve_host()

2023-09-25 Thread Zhijian Li (Fujitsu)
On 18/09/2023 22:41, Markus Armbruster wrote: > Functions that use an Error **errp parameter to return errors should > not also report them to the user, because reporting is the caller's > job. When the caller does, the error is reported twice. When it > doesn't (because it recovered from the

Re: [PATCH 43/52] migration/rdma: Convert qemu_rdma_alloc_pd_cq() to Error

2023-09-25 Thread Zhijian Li (Fujitsu)
On 18/09/2023 22:41, Markus Armbruster wrote: > Functions that use an Error **errp parameter to return errors should > not also report them to the user, because reporting is the caller's > job. When the caller does, the error is reported twice. When it > doesn't (because it recovered from the

Re: [PATCH 42/52] migration/rdma: Convert qemu_rdma_post_recv_control() to Error

2023-09-25 Thread Zhijian Li (Fujitsu)
On 18/09/2023 22:41, Markus Armbruster wrote: > Just for symmetry with qemu_rdma_post_send_control(). Error messages > lose detail I consider of no use to users. > > Signed-off-by: Markus Armbruster Reviewed-by: Li Zhijian

Re: [PATCH 41/52] migration/rdma: Convert qemu_rdma_post_send_control() to Error

2023-09-25 Thread Zhijian Li (Fujitsu)
On 18/09/2023 22:41, Markus Armbruster wrote: > Functions that use an Error **errp parameter to return errors should > not also report them to the user, because reporting is the caller's > job. When the caller does, the error is reported twice. When it > doesn't (because it recovered from the

Re: [PATCH 40/52] migration/rdma: Convert qemu_rdma_write() to Error

2023-09-25 Thread Zhijian Li (Fujitsu)
On 18/09/2023 22:41, Markus Armbruster wrote: > Just for consistency with qemu_rdma_write_one() and > qemu_rdma_write_flush(), and for slightly simpler code. > > Signed-off-by: Markus Armbruster Reviewed-by: Li Zhijian

Re: [PATCH 39/52] migration/rdma: Convert qemu_rdma_write_one() to Error

2023-09-25 Thread Zhijian Li (Fujitsu)
On 18/09/2023 22:41, Markus Armbruster wrote: > Functions that use an Error **errp parameter to return errors should > not also report them to the user, because reporting is the caller's > job. When the caller does, the error is reported twice. When it > doesn't (because it recovered from the

Re: [PATCH 38/52] migration/rdma: Convert qemu_rdma_write_flush() to Error

2023-09-25 Thread Zhijian Li (Fujitsu)
On 18/09/2023 22:41, Markus Armbruster wrote: > Functions that use an Error **errp parameter to return errors should > not also report them to the user, because reporting is the caller's > job. When the caller does, the error is reported twice. When it > doesn't (because it recovered from the

Re: [PATCH 37/52] migration/rdma: Convert qemu_rdma_reg_whole_ram_blocks() to Error

2023-09-25 Thread Zhijian Li (Fujitsu)
On 18/09/2023 22:41, Markus Armbruster wrote: > Functions that use an Error **errp parameter to return errors should > not also report them to the user, because reporting is the caller's > job. When the caller does, the error is reported twice. When it > doesn't (because it recovered from the

Re: [PATCH 36/52] migration/rdma: Convert qemu_rdma_exchange_get_response() to Error

2023-09-25 Thread Zhijian Li (Fujitsu)
On 18/09/2023 22:41, Markus Armbruster wrote: > Functions that use an Error **errp parameter to return errors should > not also report them to the user, because reporting is the caller's > job. When the caller does, the error is reported twice. When it > doesn't (because it recovered from the

Re: [PATCH 35/52] migration/rdma: Convert qemu_rdma_exchange_send() to Error

2023-09-25 Thread Zhijian Li (Fujitsu)
On 18/09/2023 22:41, Markus Armbruster wrote: > Functions that use an Error **errp parameter to return errors should > not also report them to the user, because reporting is the caller's > job. When the caller does, the error is reported twice. When it > doesn't (because it recovered from the

Re: [PATCH 34/52] migration/rdma: Convert qemu_rdma_exchange_recv() to Error

2023-09-25 Thread Zhijian Li (Fujitsu)
On 18/09/2023 22:41, Markus Armbruster wrote: > Functions that use an Error **errp parameter to return errors should > not also report them to the user, because reporting is the caller's > job. When the caller does, the error is reported twice. When it > doesn't (because it recovered from the

Re: [PATCH 1/2] migration: Fix rdma migration failed

2023-09-25 Thread Zhijian Li (Fujitsu)
On 22/09/2023 23:42, Peter Xu wrote: > On Wed, Sep 20, 2023 at 05:04:11PM +0800, Li Zhijian wrote: >> From: Li Zhijian >> >> Destination will fail with: >> qemu-system-x86_64: rdma: Too many requests in this message >> (3638950032).Bailing. >> >> migrate with RDMA is different from tcp. RDMA

Re: [PATCH 00/52] migration/rdma: Error handling fixes

2023-09-25 Thread Zhijian Li (Fujitsu)
On 22/09/2023 23:21, Peter Xu wrote: > On Thu, Sep 21, 2023 at 08:27:24AM +0000, Zhijian Li (Fujitsu) wrote: >> I'm worried that I may not have enough time, ability, or environment to >> review/test >> the RDMA patches. but for this patch set, i will take a look later. &g

Re: [PATCH 33/52] migration/rdma: Drop "@errp is clear" guards around error_setg()

2023-09-25 Thread Zhijian Li (Fujitsu)
On 18/09/2023 22:41, Markus Armbruster wrote: > These guards are all redundant now. > > Signed-off-by: Markus Armbruster Reviewed-by: Li Zhijian > --- > migration/rdma.c | 164 +++ > 1 file changed, 51 insertions(+), 113 deletions(-) > >

Re: [PATCH 31/52] migration/rdma: Retire macro ERROR()

2023-09-25 Thread Zhijian Li (Fujitsu)
On 18/09/2023 22:41, Markus Armbruster wrote: > ERROR() has become "error_setg() unless an error has been set > already". Hiding the conditional in the macro is in the way of > further work. Replace the macro uses by their expansion, and delete > the macro. > > Signed-off-by: Markus

Re: [PATCH 32/52] migration/rdma: Fix error handling around rdma_getaddrinfo()

2023-09-25 Thread Zhijian Li (Fujitsu)
On 18/09/2023 22:41, Markus Armbruster wrote: > qemu_rdma_resolve_host() and qemu_rdma_dest_init() iterate over > addresses to find one that works, holding onto the first Error from > qemu_rdma_broken_ipv6_kernel() for use when no address works. Issues: > > 1. If @errp was _abort or _fatal,

Re: [PATCH 21/52] migration/rdma: Fix QEMUFileHooks method return values

2023-09-25 Thread Zhijian Li (Fujitsu)
On 25/09/2023 14:36, Markus Armbruster wrote: > "Zhijian Li (Fujitsu)" writes: > >> On 18/09/2023 22:41, Markus Armbruster wrote: >>> The QEMUFileHooks methods don't come with a written contract. Digging >>> through the code calling them, we fin

Re: [PATCH 30/52] migration/rdma: Delete inappropriate error_report() in macro ERROR()

2023-09-25 Thread Zhijian Li (Fujitsu)
On 18/09/2023 22:41, Markus Armbruster wrote: > Functions that use an Error **errp parameter to return errors should > not also report them to the user, because reporting is the caller's > job. When the caller does, the error is reported twice. When it > doesn't (because it recovered from the

Re: [PATCH 29/52] migration/rdma: Plug a memory leak and improve a message

2023-09-25 Thread Zhijian Li (Fujitsu)
On 18/09/2023 22:41, Markus Armbruster wrote: > When migration capability @rdma-pin-all is true, but the server cannot > honor it, qemu_rdma_connect() calls macro ERROR(), then returns > success. > > ERROR() sets an error. Since qemu_rdma_connect() returns success, its > caller

Re: [PATCH 28/52] migration/rdma: Check negative error values the same way everywhere

2023-09-25 Thread Zhijian Li (Fujitsu)
On 18/09/2023 22:41, Markus Armbruster wrote: > When a function returns 0 on success, negative value on error, > checking for non-zero suffices, but checking for negative is clearer. > So do that. > This patch is no my favor convention. @Peter, Juan I'd like to hear your opinions. Thanks

Re: [PATCH 27/52] migration/rdma: Drop superfluous assignments to @ret

2023-09-25 Thread Zhijian Li (Fujitsu)
On 18/09/2023 22:41, Markus Armbruster wrote: > Signed-off-by: Markus Armbruster Reviewed-by: Li Zhijian > --- > migration/rdma.c | 35 ++- > 1 file changed, 10 insertions(+), 25 deletions(-) > > diff --git a/migration/rdma.c b/migration/rdma.c > index

Re: [PATCH 26/52] migration/rdma: Replace int error_state by bool errored

2023-09-25 Thread Zhijian Li (Fujitsu)
On 18/09/2023 22:41, Markus Armbruster wrote: > All we do with the value of RDMAContext member @error_state is test > whether it's zero. Change to bool and rename to @errored. > make sense! Reviewed-by: Li Zhijian Can we move this patch ahead "[PATCH 23/52] migration/rdma: Clean up

Re: [PATCH 23/52] migration/rdma: Clean up qemu_rdma_wait_comp_channel()'s error value

2023-09-24 Thread Zhijian Li (Fujitsu)
On 18/09/2023 22:41, Markus Armbruster wrote: > qemu_rdma_wait_comp_channel() returns 0 on success, and either -1 or > rdma->error_state on failure. Callers actually expect a negative > error value. I don't see the only one callers expect a negative error code. migration/rdma.c:1654:

Re: [PATCH 22/52] migration/rdma: Fix rdma_getaddrinfo() error checking

2023-09-24 Thread Zhijian Li (Fujitsu)
On 18/09/2023 22:41, Markus Armbruster wrote: > rdma_getaddrinfo() returns 0 on success. On error, it returns one of > the EAI_ error codes like getaddrinfo() does, Good catch, It used to be -1 on error, rdma_getaddrinfo(3) updated it 2021. or -1 with errno set. > This is broken by

Re: [PATCH 21/52] migration/rdma: Fix QEMUFileHooks method return values

2023-09-24 Thread Zhijian Li (Fujitsu)
On 18/09/2023 22:41, Markus Armbruster wrote: > The QEMUFileHooks methods don't come with a written contract. Digging > through the code calling them, we find: > > * save_page(): I'm fine with this > >Negative values RAM_SAVE_CONTROL_DELAYED and >RAM_SAVE_CONTROL_NOT_SUPP are

Re: [PATCH 19/52] migration/rdma: Fix qemu_get_cm_event_timeout() to always set error

2023-09-22 Thread Zhijian Li (Fujitsu)
On 18/09/2023 22:41, Markus Armbruster wrote: > qemu_get_cm_event_timeout() neglects to set an error when it fails > because rdma_get_cm_event() fails. Harmless, as its caller > qemu_rdma_connect() substitutes a generic error then. Fix it anyway. > > qemu_rdma_connect() also sets the generic

Re: [PATCH 18/52] migration/rdma: Fix qemu_rdma_broken_ipv6_kernel() to set error

2023-09-22 Thread Zhijian Li (Fujitsu)
On 18/09/2023 22:41, Markus Armbruster wrote: > qemu_rdma_resolve_host() and qemu_rdma_dest_init() try addresses until > they find on that works. If none works, they return the first Error > set by qemu_rdma_broken_ipv6_kernel(), or else return a generic one. > > qemu_rdma_broken_ipv6_kernel()

Re: [PATCH 17/52] migration/rdma: Replace dangerous macro CHECK_ERROR_STATE()

2023-09-22 Thread Zhijian Li (Fujitsu)
On 18/09/2023 22:41, Markus Armbruster wrote: > Hiding return statements in macros is a bad idea. Use a function > instead, and open code the return part. > > Signed-off-by: Markus Armbruster Reviewed-by: Li Zhijian > --- > migration/rdma.c | 43

Re: [PATCH 16/52] migration/rdma: Fix io_writev(), io_readv() methods to obey contract

2023-09-22 Thread Zhijian Li (Fujitsu)
On 18/09/2023 22:41, Markus Armbruster wrote: > QIOChannelClass methods qio_channel_rdma_readv() and > qio_channel_rdma_writev() violate their method contract when > rdma->error_state is non-zero: > > 1. They return whatever is in rdma->error_state then. Only -1 will be > fine. -2 will be

Re: [PATCH 15/52] migration/rdma: Ditch useless numeric error codes in error messages

2023-09-22 Thread Zhijian Li (Fujitsu)
On 18/09/2023 22:41, Markus Armbruster wrote: > Several error messages include numeric error codes returned by failed > functions: > > * ibv_poll_cq() returns an unspecified negative value. Useless. > > * rdma_accept and rmda_get_cm_event() return -1. Useless.

Re: [PATCH 14/52] migration/rdma: Use bool for two RDMAContext flags

2023-09-22 Thread Zhijian Li (Fujitsu)
On 18/09/2023 22:41, Markus Armbruster wrote: > @error_reported and @received_error are flags. The latter is even > assigned bool true. Change them from int to bool. > > Signed-off-by: Markus Armbruster Reviewed-by: Li Zhijian > --- > migration/rdma.c | 6 +++--- > 1 file changed, 3

Re: [PATCH 13/52] migration/rdma: Make qemu_rdma_buffer_mergable() return bool

2023-09-22 Thread Zhijian Li (Fujitsu)
On 18/09/2023 22:41, Markus Armbruster wrote: > qemu_rdma_buffer_mergable() is semantically a predicate. It returns > int 0 or 1. Return bool instead. > > Signed-off-by: Markus Armbruster Reviewed-by: Li Zhijian

Re: [PATCH 12/52] migration/rdma: Drop qemu_rdma_search_ram_block() error handling

2023-09-22 Thread Zhijian Li (Fujitsu)
On 18/09/2023 22:41, Markus Armbruster wrote: > qemu_rdma_search_ram_block() can't fail. Return void, and drop the > unreachable error handling. > > Signed-off-by: Markus Armbruster Reviewed-by: Li Zhijian

Re: [PATCH 11/52] migration/rdma: Drop rdma_add_block() error handling

2023-09-22 Thread Zhijian Li (Fujitsu)
On 21/09/2023 19:15, Markus Armbruster wrote: > "Zhijian Li (Fujitsu)" writes: > >> On 18/09/2023 22:41, Markus Armbruster wrote: >>> rdma_add_block() can't fail. Return void, and drop the unreachable >>> error handling. >>> >>> Signed

Re: [PATCH 1/2] migration: Fix rdma migration failed

2023-09-22 Thread Zhijian Li (Fujitsu)
On 20/09/2023 20:46, Fabiano Rosas wrote: > Li Zhijian writes: > >> From: Li Zhijian >> >> Destination will fail with: >> qemu-system-x86_64: rdma: Too many requests in this message >> (3638950032).Bailing. >> >> migrate with RDMA is different from tcp. RDMA has its own control >> message,

Re: [PATCH 11/52] migration/rdma: Drop rdma_add_block() error handling

2023-09-21 Thread Zhijian Li (Fujitsu)
On 18/09/2023 22:41, Markus Armbruster wrote: > rdma_add_block() can't fail. Return void, and drop the unreachable > error handling. > > Signed-off-by: Markus Armbruster > --- > migration/rdma.c | 30 +- > 1 file changed, 9 insertions(+), 21 deletions(-) >

Re: [PATCH 10/52] migration/rdma: Eliminate error_propagate()

2023-09-21 Thread Zhijian Li (Fujitsu)
On 18/09/2023 22:41, Markus Armbruster wrote: > When all we do with an Error we receive into a local variable is > propagating to somewhere else, we can just as well receive it there > right away. > > Signed-off-by: Markus Armbruster Reviewed-by: Li Zhijian > --- > migration/rdma.c | 19

Re: [PATCH 09/52] migration/rdma: Put @errp parameter last

2023-09-21 Thread Zhijian Li (Fujitsu)
On 18/09/2023 22:41, Markus Armbruster wrote: > include/qapi/error.h demands: > > * - Functions that use Error to report errors have an Error **errp > * parameter. It should be the last parameter, except for functions > * taking variable arguments. > > qemu_rdma_connect() does not

Re: [PATCH 08/52] migration/rdma: Fix qemu_rdma_accept() to return failure on errors

2023-09-21 Thread Zhijian Li (Fujitsu)
On 18/09/2023 22:41, Markus Armbruster wrote: > qemu_rdma_accept() returns 0 in some cases even when it didn't > complete its job due to errors. Impact is not obvious. I figure the > caller will soon fail again with a misleading error message. > > Fix it to return -1 on any failure. > >

Re: [PATCH 07/52] migration/rdma: Give qio_channel_rdma_source_funcs internal linkage

2023-09-21 Thread Zhijian Li (Fujitsu)
On 18/09/2023 22:41, Markus Armbruster wrote: > Signed-off-by: Markus Armbruster Reviewed-by: Li Zhijian

Re: [PATCH 04/52] migration/rdma: Drop fragile wr_id formatting

2023-09-21 Thread Zhijian Li (Fujitsu)
On 18/09/2023 22:41, Markus Armbruster wrote: > wrid_desc[] uses 4001 pointers to map four integer values to strings. > > print_wrid() accesses wrid_desc[] out of bounds when passed a negative > argument. It returns null for values 2..1999 and 2001..3999. > > qemu_rdma_poll() and

Re: [PATCH 03/52] migration/rdma: Clean up rdma_delete_block()'s return type

2023-09-21 Thread Zhijian Li (Fujitsu)
On 18/09/2023 22:41, Markus Armbruster wrote: > rdma_delete_block() always returns 0, which its only caller ignores. > Return void instead. > > Signed-off-by: Markus Armbruster Reviewed-by: Li Zhijian

Re: [PATCH 02/52] migration/rdma: Clean up qemu_rdma_data_init()'s return type

2023-09-21 Thread Zhijian Li (Fujitsu)
On 18/09/2023 22:41, Markus Armbruster wrote: > qemu_rdma_data_init() return type is void *. It actually returns > RDMAContext *, and all its callers assign the value to an > RDMAContext *. Unclean. > > Return RDMAContext * instead. > > Signed-off-by: Markus Armbruster Reviewed-by: Li

Re: [PATCH 01/52] migration/rdma: Clean up qemu_rdma_poll()'s return type

2023-09-21 Thread Zhijian Li (Fujitsu)
On 18/09/2023 22:41, Markus Armbruster wrote: > qemu_rdma_poll()'s return type is uint64_t, even though it returns 0, > -1, or @ret, which is int. Its callers assign the return value to int > variables, then check whether it's negative. Unclean. > > Return int instead. > > Signed-off-by:

Re: [PATCH 00/52] migration/rdma: Error handling fixes

2023-09-21 Thread Zhijian Li (Fujitsu)
Perter, On 20/09/2023 00:49, Peter Xu wrote: > On Mon, Sep 18, 2023 at 04:41:14PM +0200, Markus Armbruster wrote: >> Oh dear, where to start. There's so much wrong, and in pretty obvious >> ways. This code should never have passed review. I'm refraining from >> saying more; see the commit

  1   2   >