Hi Laszlo, First of all, thanks a lot for very nice explanation!
On Thu, Oct 01, 2015 at 01:50:31PM +0200, Laszlo Ersek wrote: > CC'ing Mark Salter, and edk2-devel, also updating the subject slightly > for better context. > > On 10/01/15 11:26, HATAYAMA Daisuke wrote: > > Currently, as of the commit f348aee7b33dd85e7da62b497a96a7319a0bf9dd, > > SNP is exclusively reopened to avoid slowdown or complete failure to > > load files due to races between MNP background polling and grub's > > receiving and transmitting packets. > > > > This exclusive SNP reopen affects some EFI applications/drivers that > > use SNP and causes PXE boot on such systems to fail. Hardware filter > > issue fixed by the commit 49426e9fd2e562c73a4f1206f32eff9e424a1a73 is > > one example. > > I think the above two commit references are in inverse order. That is, > commit 49426e9f is the one responsible for the (needed) exclusive open, > and commit f348aee7 fixes up the former by reconfiguring the receive > filters. > > In other words, the first two paragraphs here seem accurate, just please > reorder the commit hashes. > > > The difficulty here is that effects of the exclusive SNP reopen > > differs from system to system depending their UEFI firmware > > implementations. One system works well with the commit > > f348aee7b33dd85e7da62b497a96a7319a0bf9dd only, another system still > > needs the commit 49426e9fd2e562c73a4f1206f32eff9e424a1a73, and there > > could be other systems where PXE boot still fails. > > Here too I think the commit hashes should be switched around. > > > > > Actually, PXE boot fails on Fujitsu PRIMEQUEST 2000 series now. > > > > Thus, it seems to me that the exclusive SNP reopen makes grub > > maintenance difficult by affecting a variety of systems differently. > > (Admittedly, this is going to be a bit of speculation:) I think the > difference in behavior is not due to SNP implementations, but due to > differences in higher level protocol implementations -- i.e., the > behavior depends on what those drivers do to the underlying SNP that are > *closed*. > > According to the spec of OpenProtocol(), in case of a successful > exclusive open, the other BY_DRIVER reference is kicked off with > DisconnectController(), which in turn calls the other driver's > EFI_DRIVER_BINDING_PROTOCOL.Stop() function. > > So, the question is what that *other* (higher level) driver does in its > Stop() function, when it unbinds the handle with the underlying SNP > interface. Does it mess with SNP's receive filters? And so on. > > > > > Instead, the idea of this patch is to disable MNP background polling > > temporarily while grub uses SNP. Then, the race issue must be removed. > > This is not the right approach in my opinion. > > The original problem is that GRUB's direct access to SNP races with > *several* MNP instances to the same SNP. The MNP instances are correctly > arbitrated between each other (see more on this later), but the SNP > accesses from GRUB are not. > > SNP is meant as an exclusive-access interface to the NIC, so those > parallel clients won't work. > > One way to solve that is to kick out those other SNP accessors, by way > of exclusive open. This is correct from a UEFI driver model perspective, > but -- unless GRUB does full reconfiguration on the SNP level -- brittle > in practice, as you've experienced; apparently different implementations > of higher level protocols leave the SNP in different states when they go > away. (It's perfectly possible that some of those driver binding Stop() > functions have bugs.) > > However, the other way (because there is another way, yes) is different > from interfering with existing MNP instances (note: plural). MNP (and a > bunch of other networking related protocols) don't work like your usual > UEFI device drivers; they are child protocols (= protocol interfaces on > child handles) that are created *as needed* with the help of Service > Binding protocol instances. > > Please read the following sections of the UEFI spec (v2.5) carefully: > > - 2.5.8 EFI Services Binding > - 10.6 EFI Service Binding Protocol > - 24.1 EFI Managed Network Protocol > EFI_MANAGED_NETWORK_SERVICE_BINDING_PROTOCOL Taking into account above and sentences in UEFI spec (v2.5) like "Once the remote image is successfully loaded, it may utilize the EFI_PXE_BASE_CODE_PROTOCOL interfaces, or even the EFI_SIMPLE_NETWORK_PROTOCOL interfaces, to continue the remote process." I have a feeling that UEFI spec is very imprecise how to use SNP. I have not seen any single word saying that there are any constraints on SNP usage (am I missing something?). I heard about that in our internal discussions but I was not convinced that it is true until I have seen your email with all details. So, now I think that UEFI spec should have special paragraph saying how to play with SNP. What do you think about that? By the way, I saw that other boot loaders like PXELINUX or iPXE use SNP. Do you suggest that all of them should be rewritten to avoid issues with this protocol? Daniel _______________________________________________ Grub-devel mailing list Grub-devel@gnu.org https://lists.gnu.org/mailman/listinfo/grub-devel