On Fri, Mar 11, 2016 at 04:46:49PM +0100, Laszlo Ersek wrote:
> On 03/11/16 05:26, Gary Lin wrote:
> > On Thu, Mar 10, 2016 at 12:31:18PM +0100, Laszlo Ersek wrote:
> >> On 03/10/16 11:02, Gary Lin wrote:
> >>> On Thu, Mar 10, 2016 at 10:20:12AM +0100, Laszlo Ersek wrote:
> >>>> On 03/10/16 08:49, Gary Lin wrote:
> >>
> >>>>> I found that it's related to iPXE. If I disable iPXE with
> >>>>>
> >>>>> "-netdev user,id=hostnet0 -device 
> >>>>> virtio-net-pci,romfile=,netdev=hostnet0"
> >>>>>
> >>>>> then everything works as expected. I'll try to dig deeper to find more
> >>>>> information.
> >>>>
> >>>> How fresh is your ipxe? And did you build it with CONFIG=qemu?
> >>>>
> >>> It comes from the qemu 2.5.0 tarball and I just installed it from
> >>> openSUSE Virtualization repo.
> >>> https://build.opensuse.org/package/show/Virtualization/qemu
> >>
> >> QEMU 2.5 bundles iPXE binaries built with CONFIG=qemu, but that
> >> configuration doesn't enable IPv6 for the moment.
> >>
> >> Apparently, combining iPXE (CONFIG=qemu) with NETWORK_IP6_ENABLE (set in
> >> OVMF) causes problems. I don't understand how this causes side effects,
> >> because CONFIG=qemu instructs iPXE to provide SNP drivers only. SNP is
> >> independent of IP version.
> >>
> >> With this combination, IPv6 PXE should work (not be absent), and IPv6
> >> HTTP Boot should work too (not blow up).
> >>
> >> Can you perhaps remove grub2 from the equation? What if you play with
> >> disconnect / reconnect / connect in the UEFI shell?
> >>
> > hmmm I found another way to crash the system with a different assert.
> > 
> > Just disconnect/reconnect the handle providing HttpServiceBinding, and
> > OVMF crashed with/without iPXE:
> > 
> > ASSERT /home/gary/git/edk2/NetworkPkg/HttpBootDxe/HttpBootDxe.c(796): CR 
> > has Bad Signature
> > 
> > Here is the function that issues the assert.
> > 
> > EFIAPI HttpBootIp6DxeDriverBindingStart ()
> >   ...
> >   if (!EFI_ERROR (Status)) {
> >         Private = HTTP_BOOT_PRIVATE_DATA_FROM_ID(Id); <---- CRASH
> >   } else {
> >   ...
> > 
> > This also only happens after fa848a4048943251fc057fe8d6c5a82e01d2ffb6.
> 
> I found the bug. There are two key UEFI facts that are necessary for
> understanding it.
> 
> (1) The first fact is that for any given handle, at most one instance of
> a given protocol interface (= GUID) can be installed on it. In other
> words, if you pick a handle, and pick a GUID, the handle either has the
> protocol interface installed, or not. The handle cannot have two or more
> instances of the same protocol.
> 
> (2) The second fact is how the DisconnectController() boot service
> works, in case the DriverImageHandle parameter is specified as non-NULL,
> and the ChildHandle parameter is NULL. (This case means "disconnect the
> given driver from the given controller".) Let me quote the spec:
> 
>     A driver is disconnected from a controller by calling the Stop()
>     service of the EFI_DRIVER_BINDING_PROTOCOL. The
>     EFI_DRIVER_BINDING_PROTOCOL is on the driver image handle, and the
>     handle of the controller is passed into the Stop() service.
> 
> Now put the two facts together. You have a driver. For the driver, you
> have one ImageHandle. On that handle, you can install one instance of
> the driver binding protocol (due to fact (1)). So, when someone wants to
> disconnect a given controller from your driver, and calls
> gBS->DisconnectController() accordingly, exactly *one* Stop() function
> in your driver will be called. After that, the DisconnectController()
> boot service will return, and a complete successful disconnection
> between your driver and the controller will be assumed by the system.
> 
> The HttpBootDxe driver violates this.
> 
> If you look at its entry point function (called
> HttpBootDxeDriverEntryPoint()), you see that it installs *two* instances
> of the driver binding protocol -- but only the first (the IPv4 one) goes
> on the driver's genuine ImageHandle. The second instance (the IPv6 one)
> goes on to a NULL handle.
> 
> The EfiLibInstallDriverBindingComponentName2() utility function seems to
> support this. A new handle is created by the system, and the
> "gHttpBootIp6DxeDriverBinding" protocol instance is installed on that
> separate, new handle.
> 
> Now let us consider the HttpBootConfigFormInit() and
> HttpBootConfigFormUnload() functions that were introduced by the commit
> you found. The HttpBootConfigFormInit() function adds a permanent
> reference to the parent handle's HTTP SB protocol, using the attribute
> EFI_OPEN_PROTOCOL_BY_CHILD_CONTROLLER. The HttpBootConfigFormUnload()
> function removes this reference.
> 
> The IPv4 and IPv6 variants of the driver binding start & stop functions
> implement a kind of reference counting between each other. Namely,
> regardless of which one of the IPv4 or IPv6 binding start functions is
> called first, *that* first one will call HttpBootConfigFormInit(), and
> the second one will not call it. The resources set up by the
> HttpBootConfigFormInit() function (including the child reference to HTTP
> SB) are shared between IPv4 and IPv6.
> 
> Similarly, when the stop functions are called, their ordering is
> irrelevant; it is always the second one that is supposed to call
> HttpBootConfigFormUnload().
> 
> Now, assume that the IPv4 start function is called first. The
> HttpBootConfigFormInit() function is invoked, and it creates the
> by-child reference to the parent's HTTB SB protocol interface. Also,
> this reference is tagged with the image handle on which the IPv4 driver
> binding protocol is installed -- the real image handle.
> 
> Assume the IPv6 start function is called second. The shared data is
> already in place.
> 
> In my testing, when I call "disconnect B2" at this point in the UEFI
> shell (0xB2 being the handle of the virtio-net NIC), the HTTPv4 Stop
> function is called first. It does not call HttpBootConfigFormUnload(),
> because the IPv6 side of the driver still uses the shared data.
> 
> The problem is, the by-child reference left behind (for the IPv6 side to
> release later) is *still* associated with the IPv4 image handle. This
> violates the DisconnectController() contract, and things go downhill
> from there. The CoreDisconnectControllersUsingProtocolInterface()
> function in "MdeModulePkg/Core/Dxe/Hand/Handle.c" finds the lost
> reference (OpenCount > 0), and it tries to reconnect the drivers that
> have been already (partially) disconnected -- including the HTTPv4 start
> function. That's when things completely blow up, and an ASSERT() is
> triggered.
> 
> I don't think it is possible to share such a by-child reference. I think
> each half should have its own reference, and that reference should be
> dropped when the matching Stop function runs.
> 
> I think there were problems with HttpBootConfigFormInit() during
> development -- a field called "Initilized" is checked at the beginning
> of the function, but it isn't used anywhere else in the driver. I think
> it is residue from an earlier attempt to get this sharing right.
> 
> It is also clear why the bug only hits with IPv6 enabled in OVMF. If
> IPv6 is disabled, then the v6 Start() function never runs / succeeds (so
> no sharing takes place actually), and then the IPv4 Stop function
> releases everything, including the by-child reference.
> 
Wow, thanks for the thorough analysis!

I thought it caused by accidental double-free in the commit but it
turned out the problem is more than that.

Glad to see the root cause is identified.

Thanks,

Gary Lin
_______________________________________________
edk2-devel mailing list
edk2-devel@lists.01.org
https://lists.01.org/mailman/listinfo/edk2-devel

Reply via email to