Nathan Lynch <nath...@linux.ibm.com> writes:
> Haren Myneni <ha...@linux.ibm.com> writes:
>>> Haren Myneni <ha...@linux.ibm.com> writes:
>>>> The hypervisor returns migration failure if all VAS windows are not
>>>> closed. During the pre-migration stage, vas_migration_handler() sets
>>>> the migration_in_progress flag and closes all windows from the list.
>>>> The VAS window allocation routine checks the migration flag, sets up
>>>> the window, and then adds it to the list. So there is a possibility
>>>> of the migration handler missing a window that is still in the
>>>> process of being set up.
>>>>
>>>> t1: Allocate and open VAS      t2: Migration event
>>>>     window
>>>>
>>>> lock vas_pseries_mutex
>>>> If migration_in_progress set
>>>>   unlock vas_pseries_mutex
>>>>   return
>>>> open window HCALL
>>>> unlock vas_pseries_mutex
>>>> Modify window HCALL            lock vas_pseries_mutex
>>>> setup window                   migration_in_progress=true
>>>>                                Closes all windows from
>>>>                                the list
>>>>                                unlock vas_pseries_mutex
>>>> lock vas_pseries_mutex         return
>>>> if nr_closed_windows == 0
>>>>   // No DLPAR CPU or migration
>>>>   add to the list
>>>>   unlock vas_pseries_mutex
>>>>   return
>>>> unlock vas_pseries_mutex
>>>> Close VAS window
>>>> // due to DLPAR CPU or migration
>>>> return -EBUSY
>>>
>>> Could the path t1 takes simply hold the mutex for the duration of
>>> its execution instead of dropping and reacquiring it in the middle?
>>>
>>> Here's the relevant code from vas_allocate_window():
>>>
>>>         mutex_lock(&vas_pseries_mutex);
>>>         if (migration_in_progress)
>>>                 rc = -EBUSY;
>>>         else
>>>                 rc = allocate_setup_window(txwin, (u64 *)&domain[0],
>>>                                            cop_feat_caps->win_type);
>>>         mutex_unlock(&vas_pseries_mutex);
>>>         if (rc)
>>>                 goto out;
>>>
>>>         rc = h_modify_vas_window(txwin);
>>>         if (!rc)
>>>                 rc = get_vas_user_win_ref(&txwin->vas_win.task_ref);
>>>         if (rc)
>>>                 goto out_free;
>>>
>>>         txwin->win_type = cop_feat_caps->win_type;
>>>         mutex_lock(&vas_pseries_mutex);
>>>         if (!caps->nr_close_wins) {
>>>                 list_add(&txwin->win_list, &caps->list);
>>>                 caps->nr_open_windows++;
>>>                 mutex_unlock(&vas_pseries_mutex);
>>>                 vas_user_win_add_mm_context(&txwin->vas_win.task_ref);
>>>                 return &txwin->vas_win;
>>>         }
>>>         mutex_unlock(&vas_pseries_mutex);
>>>
>>> Is there something about h_modify_vas_window() or get_vas_user_win_ref()
>>> that requires temporarily dropping the lock?
>>>
>>
>> Thanks Nathan for your comments.
>>
>> vas_pseries_mutex protects window ID and IRQ allocation between the
>> alloc and free window HCALLs, and the window list. We generally try
>> not to hold a mutex across HCALLs, but we need this mutex with these
>> HCALLs.
>>
>> We could move h_modify_vas_window() and get_vas_user_win_ref() inside
>> the mutex-held section, but it is not needed.
>
> Hmm. I contend that it would fix your bug in a simpler way that
> eliminates the race instead of coping with it by adding more state and
> complicating the locking model. With your change, readers of the
> migration_in_progress flag check it under the mutex, but the writer
> updates it outside of the mutex, which seems strange and unlikely to be
> correct.
Expanding on this, with your change, migration_in_progress becomes a
boolean atomic_t flag accessed only with atomic_set() and atomic_read().
These are non-RMW operations. Documentation/atomic_t.txt says:

    Non-RMW ops:

    The non-RMW ops are (typically) regular LOADs and STOREs and are
    canonically implemented using READ_ONCE(), WRITE_ONCE(),
    smp_load_acquire() and smp_store_release() respectively. Therefore,
    if you find yourself only using the Non-RMW operations of atomic_t,
    you do not in fact need atomic_t at all and are doing it wrong.

So making migration_in_progress an atomic_t does not confer any
advantageous properties to it that it lacks as a plain boolean.

Considering also (from the same document):

 - non-RMW operations are unordered;
 - RMW operations that have no return value are unordered;

I am concerned that threads executing these segments of code will not
always observe each others' effects in the intended order:

// vas_allocate_window()

        mutex_lock(&vas_pseries_mutex);
        if (!caps->nr_close_wins && !atomic_read(&migration_in_progress)) {
                list_add(&txwin->win_list, &caps->list);
                caps->nr_open_windows++;
                atomic_dec(&caps->nr_open_wins_progress);
                mutex_unlock(&vas_pseries_mutex);
                vas_user_win_add_mm_context(&txwin->vas_win.task_ref);
                return &txwin->vas_win;
        }
        mutex_unlock(&vas_pseries_mutex);
        ...
        atomic_dec(&caps->nr_open_wins_progress);
        wake_up(&open_win_progress_wq);

// vas_migration_handler()

        atomic_set(&migration_in_progress, 1);
        ...
        mutex_lock(&vas_pseries_mutex);
        rc = reconfig_close_windows(vcaps, vcaps->nr_open_windows, true);
        mutex_unlock(&vas_pseries_mutex);
        /*
         * Windows are included in the list after successful
         * open. So wait for closing these in-progress open
         * windows in vas_allocate_window() which will be
         * done if the migration_in_progress is set.
         */
        rc = wait_event_interruptible(open_win_progress_wq,
                        !atomic_read(&vcaps->nr_open_wins_progress));

Maybe it's OK in practice for some reason? I'm finding it difficult to
reason about :-)