On 01/18/16 19:44, John Snow wrote: > > > On 01/18/2016 05:57 AM, Laszlo Ersek wrote: >> Hello Feng, John, >> >> On 11/03/15 02:48, Tian Feng wrote: >>> When executing ATAPI cmd at IDE mode, EFI_SUCCESS may be returned wrongly >>> with old logic but in fact DRQ is not ready and the transaction doesn't >>> get executed correctly at this time. >>> >>> Contributed-under: TianoCore Contribution Agreement 1.0 >>> Signed-off-by: Feng Tian <feng.t...@intel.com> >>> Cc: Star Zeng <star.z...@intel.com> >>> --- >>> MdeModulePkg/Bus/Ata/AtaAtapiPassThru/IdeMode.c | 2 +- >>> 1 file changed, 1 insertion(+), 1 deletion(-) >>> >>> diff --git a/MdeModulePkg/Bus/Ata/AtaAtapiPassThru/IdeMode.c >>> b/MdeModulePkg/Bus/Ata/AtaAtapiPassThru/IdeMode.c >>> index 4928ed5..f74e5ca 100644 >>> --- a/MdeModulePkg/Bus/Ata/AtaAtapiPassThru/IdeMode.c >>> +++ b/MdeModulePkg/Bus/Ata/AtaAtapiPassThru/IdeMode.c >>> @@ -1949,7 +1949,7 @@ AtaPacketReadWrite ( >>> // >>> Status = DRQReady2 (PciIo, IdeRegisters, Timeout); >>> if (EFI_ERROR (Status)) { >>> - return CheckStatusRegister (PciIo, IdeRegisters); >>> + return Status; >>> } >>> >>> // >>> >> >> Unfortunately, this patch breaks OVMF on QEMU's ide-cd device. The >> symptoms were reported in <https://github.com/tianocore/edk2/issues/43>; > > To confirm, does this affect both the AHCI ATAPI device and the BMDMA > ATAPI device? (Q35 and PC default HBA, respectively) Or is it just AHCI?
I only used i440fx for testing. This is my command line: qemu-system-x86_64 \ -debugcon file:ovmf.log \ -global isa-debugcon.iobase=0x402 \ -net none \ -enable-kvm \ -pflash OVMF.fd \ -drive id=cd0,if=none,readonly,format=raw,file=f22.iso \ -device ide-cd,drive=cd0,bootindex=0 If I add "-M q35" (--> AHCI), then things seem to work fine; I can reach grub on the F22 workstation installer CD. >> I reproduced the issue and bisected it to this patch (git commit >> 7cac2401 / SVN 19611). I also confirmed it by reverting the patch on top >> of current master, and the CD-ROM started to work. >> >> I'm adding John Snow, who maintains QEMU's IDE emulation. >> >> Can you guys please investigate this patch, and discuss why it breaks on >> QEMU's ide-cd device? >> >> John, if you'd like to browse the code, the following link highlights >> the line that the above patch changes: >> >> https://github.com/tianocore/edk2/blob/7cac2401635317360226a235d1f95f8347081cbb/MdeModulePkg/Bus/Ata/AtaAtapiPassThru/IdeMode.c#L1952 >> >> As far as I can see, this problem could be timing-sensitive. The >> DRQReady2() function polls for the DRQ bit to become set, in a certain >> amount of time. If that doesn't happen (or a definitive error emerges), >> the function fails. >> >> Pre-patch, such failure would not be decisive; the status register would >> be read and evaluated. Post-patch, the failure is decisive, even if it >> is just a timeout, and the CheckStatusRegister() call would otherwise >> return success. >> >> ... Hm, I added a debug log right after DRQReady2(), on the error branch >> (i.e., when it fails). The status code is *not* EFI_TIMEOUT, it is >> EFI_NOT_READY. >> >> Looking at DRQReady2(), this means: >> - BSY is clear >> - ERR is clear >> - DRQ is clear too >> >> Apparently, the DRQReady2() function expects that as soon as BSY is >> clear (and there is no error), DRQ must be immediately set (more or >> less: not busy, no error, so ready to transfer data). >> > > Thank you for your research! > >> Is that a valid assumption to make? >> > > Whenever BSY, DRQ and ERR are all clear that should be confirmation of a > completed command. That's interesting, because the edk2 code seems to treat this condition as "failure to transition to data transfer", or some such. > Under AHCI, this should *not* be true before the AHCI > device has signalled completion (Either via the status shadow register, > an interrupt, or an MSI interrupt, depending on device configuration.) > > (In fact, hw/ide/core.c ide_exec_cmd uses the presence of DRQ|BSY to > know whether or not it is in a state fit to accept a new command. If the > guest is not blocked when BSY|DRQ are unset, this means a new command > can be issued.) > > == Thought #1: == > > However, when a command completes, it updates the shadow registers in > the AHCI HBA right before signalling completion. It is, I think, > possible that if you are polling the AHCI shadow registers instead of > polling the AHCI status registers that you could find DRQ/BSY/ERR unset > before receiving a command completed notice (by a time window of two > passing electrons.) > > Take a look at ahci_cmd_done, which calls ahci_write_fis_d2h. We > populate a guest memory FIS buffer with the values of the registers, > then update the status shadow register, then signal completion (via IRQ, > MSI, or neither.) > > You are probably not hitting that reliably, so can I ask what command is > erroring out, here? 0xA0 PACKET for IDE, and 0x28 CMD_READ for SCSI with > the DMA bit off? There might be a problem with the device not managing > its state well. Again, I can only see it with i440fx. This is the section of the IDE debug log from QEMU that fails: ide: CMD=a0 ide: read addr=0x177 val=58 ATAPI limit=0xfffe packet: 12 00 00 00 60 00 00 00 00 00 00 00 reply: tx_size=36 elem_tx_size=0 index=0 byte_count_limit=65534 status=0x58 ide: read status addr=0x376 val=58 ide: read addr=0x175 val=00 ide: read addr=0x174 val=24 reply: tx_size=0 elem_tx_size=0 index=36 end of transfer, status=0x50 ide: read addr=0x177 val=50 ide: read status addr=0x376 val=50 <---- triggers the failure in edk2 IDE: write addr=0x176 val=0xe0 ide: write control addr=0x376 val=0a ide: read addr=0x177 val=50 IDE: write addr=0x176 val=0xe0 ide: read status addr=0x376 val=50 IDE: write addr=0x171 val=0x00 IDE: write addr=0x171 val=0x00 IDE: write addr=0x172 val=0x00 IDE: write addr=0x172 val=0x00 IDE: write addr=0x173 val=0x00 IDE: write addr=0x173 val=0x00 IDE: write addr=0x174 val=0x00 IDE: write addr=0x174 val=0xfe IDE: write addr=0x175 val=0x00 IDE: write addr=0x175 val=0xff IDE: write addr=0x177 val=0xa0 > > > == Thought #2: == > > We recently updated the ATAPI pio buffering loop to be asynchronous in > cases where it can be, which included some changes to when we fiddle > with DRQ and BSY. These changes went live for 2.5.0. Does version 2.4.1 > work? That would point to the new buffering loop. Great tip, let me check. (The above is experienced on v2.5.0-618-g19b6d84.) With v2.4.1, i440fx fails the same, and q35 works the same. > Useful information here: > > - In the 0xA0 PACKET command sent, what is the value of the > byte_count_limit? (this is a 16bit count comprised of the hcyl and lcyl > fields in the packet command structure.) Depending on if it matches the > block size of 2048 or not, we take a different buffering path. Multiples > of 2048 are buffered asynchronously, others are buffered synchronously. Please see above: 0xfffe; seems to imply synchronous. > > - In the 0xA0 PACKET command sent, what is the value of bit 0 in the > "feature" field? (the ATAPI DMA bit.) Affects the buffer loop depending > on which transport it's preparing to use. > > - What SCSI read command is being used? (Presumed 0x28 CMD_READ) Apparently 0x12 INQUIRY? Thanks Laszlo > > - How many sectors are being requested? > > >> Thanks! >> Laszlo >> > _______________________________________________ edk2-devel mailing list edk2-devel@lists.01.org https://lists.01.org/mailman/listinfo/edk2-devel