On Mon, Jul 13, 2009 at 10:02:13AM -0400, Gregory Haskins wrote:
> Gleb Natapov wrote:
> > On Mon, Jul 13, 2009 at 09:40:01AM -0400, Gregory Haskins wrote:
> >
> >> Gleb Natapov wrote:
> >>
> >>> On Mon, Jul 13, 2009 at 09:26:21AM -0400, Gregory Haskins wrote:
> >>>
> >>>
> >>>> Gleb Natapov wrote:
> >>>>
> >>>>
> >>>>> On Mon, Jul 13, 2009 at 04:02:56PM +0300, Michael S. Tsirkin wrote:
> >>>>>
> >>>>>
> >>>>>
> >>>>>> On Sun, Jul 12, 2009 at 03:03:53PM +0300, Gleb Natapov wrote:
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>>> Use RCU locking for mask/ack notifiers lists.
> >>>>>>>
> >>>>>>> Signed-off-by: Gleb Natapov <[email protected]>
> >>>>>>> ---
> >>>>>>> virt/kvm/irq_comm.c | 20 +++++++++++---------
> >>>>>>> 1 files changed, 11 insertions(+), 9 deletions(-)
> >>>>>>>
> >>>>>>> diff --git a/virt/kvm/irq_comm.c b/virt/kvm/irq_comm.c
> >>>>>>> index 5dde1ef..ba3a115 100644
> >>>>>>> --- a/virt/kvm/irq_comm.c
> >>>>>>> +++ b/virt/kvm/irq_comm.c
> >>>>>>> @@ -179,18 +179,18 @@ void kvm_notify_acked_irq(struct kvm *kvm,
> >>>>>>> unsigned irqchip, unsigned pin)
> >>>>>>> break;
> >>>>>>> }
> >>>>>>> }
> >>>>>>> - rcu_read_unlock();
> >>>>>>>
> >>>>>>> - hlist_for_each_entry(kian, n, &kvm->irq_ack_notifier_list, link)
> >>>>>>> + hlist_for_each_entry_rcu(kian, n, &kvm->irq_ack_notifier_list,
> >>>>>>> link)
> >>>>>>> if (kian->gsi == gsi)
> >>>>>>> kian->irq_acked(kian);
> >>>>>>> + rcu_read_unlock();
> >>>>>>> }
> >>>>>>>
> >>>>>>> void kvm_register_irq_ack_notifier(struct kvm *kvm,
> >>>>>>> struct kvm_irq_ack_notifier *kian)
> >>>>>>> {
> >>>>>>> mutex_lock(&kvm->irq_lock);
> >>>>>>> - hlist_add_head(&kian->link, &kvm->irq_ack_notifier_list);
> >>>>>>> + hlist_add_head_rcu(&kian->link, &kvm->irq_ack_notifier_list);
> >>>>>>> mutex_unlock(&kvm->irq_lock);
> >>>>>>> }
> >>>>>>>
> >>>>>>> @@ -198,8 +198,9 @@ void kvm_unregister_irq_ack_notifier(struct kvm
> >>>>>>> *kvm,
> >>>>>>> struct kvm_irq_ack_notifier *kian)
> >>>>>>> {
> >>>>>>> mutex_lock(&kvm->irq_lock);
> >>>>>>> - hlist_del_init(&kian->link);
> >>>>>>> + hlist_del_init_rcu(&kian->link);
> >>>>>>> mutex_unlock(&kvm->irq_lock);
> >>>>>>> + synchronize_rcu();
> >>>>>>> }
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>> This is done under kvm->lock still, which means the lock might be held
> >>>>>> potentially for a very long time. Can synchronize_rcu be moved out of
> >>>>>> this lock?
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>> Only if kvm_free_assigned_device() will be moved out of this lock.
> >>>>> Device de-assignment is not very frequent event though. How long do you
> >>>>> think it may be held? KVM RCU read sections are very brief.
> >>>>>
> >>>>>
> >>>>>
> >>>> Note that the delay imposed by the barrier is not only related to the
> >>>> length of the critical section. The barrier blocks until the next grace
> >>>> period, and depending on the type of RCU you are using and your config
> >>>> options, this could be multiple milliseconds.
> >>>>
> >>>> I am not saying that this is definitely a problem for your design. I
> >>>> am just pointing out that the length of the KVM-RCU read section is only
> >>>>
> >>>>
> >>> Yeah I understand that other RCU read section may introduce delays too.
> >>> The question is how big the delay may be.
> >>>
> >> I think you are misunderstanding me. The read-side CS is not a
> >> significant factor here so I am not worried about concurrent read-side
> >> CS causing a longer delay. What I am saying is that the grace period of
> >> your RCU subsystem is the dominant factor in the equation here, and this
> >> may be several milliseconds.
> >>
> >>
> > How is the "grace period" is determined? Isn't it just means "no cpus is
> > in RCU read section anymore"?
> >
>
> Nope ;)
>
Now I recall something about each CPU passing scheduler. Thanks.
> RCU is pretty complex, so I won't even try to explain it here as there
> are numerous articles floating around out there that do a much better job.
>
> But here is a summary: RCU buys you two things: 1) concurrent readers
> *and* writers, and 2) a much lower overhead reader path because it
> generally doesn't use atomic. Its point (2) that is relevant here.
>
> If taking an atomic were ok, you could approximate the RCU model using
> reference counting. Reference counting buys you "precise" resource
> acquistion/release at the expense of the overhead of the atomic
> operation (and any associated cache-line bouncing). RCU uses a
> "imprecise" model where we don't really know the *exact* moment the
> resource is released. Instead, there are specific boundaries in time
> when we can guarantee that it had to have been released prior to the
> expiry of the event. This event is what is called the "grace period".
>
> So that is what synchronize_rcu() is doing. Its a barrier to the next
> imprecise moment in time when we can be assured (if you used the rest of
> the RCU API properly) that there can not be any outstanding references
> to your object left in flight. Each grace period can be milliseconds,
> depending on what version of the kernel you have and how it is configured.
>
> HTH
>
> Kind Regards,
> -Greg
>
> >
> >>> I don't think multiple
> >>> milliseconds delay in device de-assignment is a big issue though.
> >>>
> >>>
> >> I would tend to agree with you. It's not fast path.
> >>
> >> I only brought this up because I saw your design being justified
> >> incorrectly: you said "KVM RCU read sections are very brief", but that
> >> is not really relevant to Michael's point. I just want to make sure
> >> that the true impact is understood.
> >>
> >> Kind Regards,
> >> -Greg
> >>
> >>
> >>
> >
> >
> >
> > --
> > Gleb.
> >
>
>
--
Gleb.
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to [email protected]
More majordomo info at http://vger.kernel.org/majordomo-info.html