> On Aug 2, 2020, at 3:44 PM, Samuel Thibault <[email protected]> wrote:
>
> Junling Ma, le dim. 02 août 2020 15:32:01 -0700, a écrit:
>>>> 1. Change devicer_user_intr to be a linux interrupt handler.
>>>
>>> Right, this makes sense, but did you actually try it? Check the existing
>>> code twice to realize why it's done that way. Here linux_intr is looking
>>> through the list of actions for an irq. You are making action->handler()
>>> call free_irq itself when the handler should be removed. But then the
>>> loop of linux_intr will have its action pointer undefined, since it was
>>> freed by free_irq.
>>
>> Right, I realized that, and moved the free_irq to intr_thread. Will clean
>> up the patch.
>>
>>> I agree that we should probably remove some of the "user_intr" pieces of
>>> linux_intr(). But we cannot afford removing the part that checks the
>>> value returned by the handler, otherwise it'll break.
>>
>> If we move the cleaning-up-dead-notification code to intr_thread, then
>> the code should not break.
>
> Ok.
>
>>>> 2. Make user_intr_t represent a specific interrupt line, so that
>>>> notifications queue on each line.
>>>> 3. Make struct irqdev store a vector of user_intr_t, indexed by
>>>> interrupt line.
>>>
>>> ? But then we can't have several userland translators using the same
>>> irq? That'll break.
>>
>> Each user_intr_t has a queue of notifications. Originally, each
>> notification was queued up on the linux side. In this patch we just move
>> them back to the mach side and sort them by interrupt line (i.e., attach
>> them to a user_intr_t). I am not sure why it would break.
>
> To be honest, I don't know, because your patch changes so many things
> that I can't even take the time to track what is being changed and how.
> What I can see easily, however, is:
>
> +typedef struct {
> +    queue_head_t notification_queue;
> +    unsigned interrupts;  /* Number of interrupts occurred since last run of intr_thread */
> +    unsigned n_unacked;   /* Number of times irqs were disabled for this */
> +    decl_simple_lock_data(, lock);  /* a lock to protect the queues */
> } user_intr_t;
>
> So the port is gone indeed, so it tends to say that now it's per-intr
> instead of per-user. But then why call it user_intr_t? Changing the
> role of an existing structure is really not the way for changes to be
> reviewable.
user_intr_t to me means a user interrupt. A notification port that is
registered represents a notification. Multiple notifications may queue on a
single irq.

> Now, in that struct there is n_unacked. But how can this be per-irq? As
> I mentioned elsewhere, and it is kept here, it's the number of times the
> irq was disabled for this. But that's for a given *user*, not for a given
> irq. If it's not per-user, when a user dies you don't know how many
> times you need to call __enable_irq.

Ok, I can see that in my patch n_unacked duplicates the counter used by
__enable_irq. I was misled by the use of a counter at all in the original
code, because an irq cannot be fired more than once before it is acked.
What we really need is a boolean_t flag per notification.

>> In fact, I was pondering the idea that each irq* device should be opened
>> exclusively,
>
> Then things won't work. Shared IRQ is a thing you can't circumvent with
> the hardware that people have atm.

You may have missed the part where I said to let the user-side /dev/irq*
translator do the multiplexing. And when MSI/MSI-X is introduced, irq
sharing will not be needed.

>> I am afraid that an interrupt going through two indirect calls would be
>> costly enough to hurt driver performance.
>
> Interrupts *are* inherently costly. A couple of software layers are really
> not much compared to that. Drivers are already used to having to gather
> notifications so as to spare interrupts.

Ok.

>>>> 4. The n_unacked counter only increases when an interrupt is
>>>> successfully delivered.
>>>
>>> ? It is meant to know how many times __disable_irq was called, and thus
>>> how many times we should call __enable_irq if the port dies.
>>
>> The __disable_irq uses its own counter.
>
> Yes, and the way you do it just duplicates it, that's nonsense. Either
> it's the same, or it's not. As mentioned above, it's not.

Yes. See the above discussion.
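To make the per-user bookkeeping concrete, here is a toy sketch in plain C.
All names in it (struct user_notification, disable_irq_line,
enable_irq_line, deliver, ack, user_died) are invented stand-ins for the
real GNU Mach code and its __disable_irq/__enable_irq, but they show why
the unacked count has to travel with the user: when a user's port dies,
the kernel must re-enable the line exactly as many times as it disabled it
on that user's behalf.

```c
static int irq_depth[16];  /* toy model of the per-line disable depth */

static void disable_irq_line(int irq) { irq_depth[irq]++; }
static void enable_irq_line(int irq)  { irq_depth[irq]--; }

/* Hypothetical per-user registration state. */
struct user_notification {
    int irq;        /* line this registration is attached to */
    int n_unacked;  /* disables done on behalf of this user   */
    int dead;       /* set once the receive port is gone      */
};

/* Interrupt path: keep the line masked until this user acks. */
static void deliver(struct user_notification *u)
{
    disable_irq_line(u->irq);
    u->n_unacked++;
}

/* Ack path (what device_intr_ack triggers): one ack undoes one disable. */
static void ack(struct user_notification *u)
{
    if (u->n_unacked > 0) {
        u->n_unacked--;
        enable_irq_line(u->irq);
    }
}

/* Port-death path: undo every outstanding disable, otherwise the
   line stays masked forever for every other handler sharing it. */
static void user_died(struct user_notification *u)
{
    while (u->n_unacked > 0)
        ack(u);
    u->dead = 1;
}
```

If n_unacked lived per-irq instead, two users sharing the line would mix
their counts and the death path could not know how far to unwind.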
>> The original way of counting means that the interrupt will unfortunately
>> be blocked forever if a message somehow fails to deliver.
>
> There is no such thing. Either the message gets delivered, or it does
> not. It can take some time to get delivered. Then other translators
> using the same IRQ will suffer from the situation. But that's what we
> *have* to do. Until the userland translator either acknowledges the irq
> or dies, we *have* to keep the irq disabled (and know that we have done
> so, for the death case). We can't go around it anyway, so there is no
> point in trying to fix it.

The delivery could have returned an error, for example if ikm_alloc returns
NULL, as checked in deliver_intr.

>> This patch increases n_unacked only if a notification is delivered,
>
> That would mean that the responsibility of ack-ing interrupts belongs to
> userland?

We already require that userland ack the interrupt by calling
device_intr_ack, right? The kernel side keeps track of who has acked and
who hasn't. As I understand it, if a handler has not acked, then n_unacked
remains positive, __enable_irq will not be called, and thus the irq remains
disabled for all other handlers attached to the same irq.

>> I am of course happy to produce a new version of the patches. But I
>> also feel that whether we want to introduce this new scheme still
>> needs more discussion.
>
> So far (except for the first part of your first patch) what you
> have proposed really does not convince me at all as being an actual
> improvement over what we currently have, on the contrary.

That is what I felt from the IRC discussion yesterday, too. And that is
fine if there is no interest in this patch. But still, newcomers may ask
the same question I did: why would any device driver need to implement
device_intr_register and device_intr_ack when they are excluded from the
picture entirely (only the irq device has a chance of receiving these
calls)?
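For what it is worth, the per-line queue idea can also be sketched in a few
lines of plain C. This is only an illustration under assumed names (struct
notification, struct intr_line, line_register, line_deliver; none of these
exist in gnumach): one list of registered notifications per interrupt line,
a delivery walk that counts only successful sends, and pruning of dead
entries during that walk rather than from inside the handler, so the walk
never follows a freed pointer.

```c
#include <stddef.h>

/* One registered notification; several may share the same line. */
struct notification {
    int port;   /* stand-in for a Mach receive port */
    int dead;   /* set when the port is deallocated */
    struct notification *next;
};

/* Per-line state, roughly the role this patch gives to user_intr_t. */
struct intr_line {
    struct notification *head;
    unsigned interrupts;  /* pending since the last intr_thread run */
};

static void line_register(struct intr_line *l, struct notification *n)
{
    n->next = l->head;
    l->head = n;
}

/* Walk the queue on delivery. Dead entries are unlinked here, outside
   the handler itself, so the loop never dereferences a freed action --
   the hazard of calling free_irq from inside the handler loop.
   Returns how many messages were actually sent, which is what an
   "increment n_unacked only on successful delivery" policy counts. */
static int line_deliver(struct intr_line *l)
{
    int delivered = 0;
    struct notification **p = &l->head;

    while (*p) {
        struct notification *n = *p;
        if (n->dead) {
            *p = n->next;   /* unlink before the entry goes away */
        } else {
            delivered++;    /* here: actually send the Mach message */
            p = &n->next;
        }
    }
    return delivered;
}
```

The pointer-to-pointer walk is the standard safe-unlink pattern; it keeps
deletion in one place instead of letting each handler free itself.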
Maybe there are other, better solutions than this patch. Maybe it is not a
problem at all for others.

Junling
