I guess you misunderstood what I mean. I mean that there seems to be no interest in the discussion on IRC yesterday that we introduce a new scheme. If so, I will not work on that part. For part I, ie., moving the irq management to mach side, it people think it is useful, I will prepare a series of revised patch for that.
Junling > On Aug 2, 2020, at 4:33 PM, Samuel Thibault <[email protected]> wrote: > > Junling Ma, le dim. 02 août 2020 16:19:39 -0700, a ecrit: >> I was distracted by using a counter at all in the original code, >> because an irq cannot be fired more than once before it is acked. > > Right, normally that doesn't happen. > >> What we really need is a boolean_t type per notification, > > And that could be an option. > >>>> IN fact, I was pondering on the idea that each irq* device should be >>>> opened exclusively, >>> >>> Then things won't work. Shared IRQ is a thing you can't circumvent with >>> the hardware that people have atm. >> >> You may have missed the part where I say let user side [/dev/irq*] translator >> do the multiplexing. > > I didn't but that looks like extra complexity, and makes the whole thing > less robust: if the multiplexer gets to die somehow, everything dies. > >> And when MSI/MSI-X is introduced, irq sharing will not be needed. > > That would still ignore all the hardware that people still have. > >>>> The original way of counting means that the interrupt will be >>>> unfortunately blocked forever if a message somehow failed to deliver. >>> >>> There is no such thing. Either the message gets delivered, or it does >>> not. It can take some time to get delivered. Then other translators >>> using the same IRQ will suffer from the situation. But that's what we >>> *have* to do. Until the userland translator either acknowledges the irq >>> or dies, we *have* to keep the irq disabled (and know that we have done >>> so for the death case). We can't go around it anyway, so there is no >>> point in trying to fix it. >> >> The delivery could have returned an error, for example, if ikm_alloc return >> NULL, as checked in deliver_intr. > > In such a case, I don't think we can do anything better than closing > the user port, so the user can get notified of the missed IRQ, and > possibly try to restart the driver (which is fine for network cards for > instance). So we're back to the "port death" case. > >>>> This patch increases n_unacked only if a notification is delivered, >>> >>> That would mean that the responsibility of ack-ing interrupts belongs to >>> userland? >> >> We already require that user land ack the interrupt by calling >> device_intr_ack, >> right? > > That it acks, yes. That it tracks how many times __enable_irq() should > be called on port death, no. > > Again, just checking for port death is a simple way and gets everything > working as expected. What I can grasp from what you proposed, it is much > less clear. > >> As I understand, if a handler has not acked, then the n_unacked remain >> positive, and also __enable_irq will not be called, and thus the irq >> remains disabled, for all other handlers attached to the same irq. > > Yes. Until the user notices the translator is stuck, and thus kills it, > and thus the kernel notices the port death, and can unlock the IRQ. > >> That is what I felt from the IRC discussion yesterday, too. And that is fine >> if >> there is no interest in this patch. > > The thing is: I do not see what improvement there is here. It took us > some time to get to this working state of ack semantic. If you want to > improve it, then fine, but you have not explained where there is actual > improvement (and just rewriting the whole thing for the sake of it will > most probably only bring bugs). > >> But still, new comes may ask the same question as I did, why would >> any device driver need to implement device_intr_register and >> device_intr_ack, > > That question is complementely independent of everything we have been > discussing above. Again a reason why to separate changes into separate > patches, so we can discuss separately and avoid squashing together > things that don't actually need to be, thus blocking improvements on > some side just because some other side poses question. > > My only comment on patch-2 was that it was introducing other unrelated > changes, making it unreviewable. As discussed on IRC the other day, your > device IPC change makes sense. But I don't see any relation with the IRQ > *management* that you have introduced. Make that patch independent from > the whole rest, and we'll be able to discuss it. > > > Really, I'm wondering how it cannot be obvious for newcomers that yes, > you need to separate out changes so we can discuss things separately, > otherwise it's just a big mess and we can't work things out. > > Samuel
