Hi Amit, On Mon, 9 Jun 2008 11:56:56 +0100, Amit Walambe wrote: > Over the weekend, I have investigated a bit more and the problem doesn't > appear on 2.6.23 from Fedora 8. So, this discussion holds value only if > anyone thinks the problem is a function of latencies (due to > responsiveness of the controller and/or other factors) and can surface > some other time. Otherwise, put a return 0 right here :). > > I still think the patch is applicable even to the latest kernels. Please > bear with me for a bit longish reply.
I don't understand. You just said yourself that the problem didn't show on a 2.6.23 kernel, meaning that we somehow fixed the bug already, but still you want your patch to be applied? > On Sun, 8 Jun 2008 20:18:28 +0200 > "Jean Delvare" <[EMAIL PROTECTED]> wrote: > > Out of curiosity, why are you using this proprietary utility instead > > of the standard i2cget and i2cset programs? > > > > > This is the relevant line from the lspci output : > > > 00:1f.3 SMBus: Intel Corp. 82801DB/DBL/DBM (ICH4/ICH4-L/ICH4-M) > > > SMBus Controller (rev 02) > > > > Which kernel is your customer using? There have been a number of > > improvements done to the i2c-i801 driver over the last year. And what > > motherboard is this? Please provide the output of i2cdetect for this > > bus. > The board is a SBC named Apollo from Eurotech. Kernel is 2.6.9 from > Fedora 3. (At first, I looked through the latest kernel code to see if > I can backport any fixes, but there wasn't anything relevant to the > problem at hand.) 2.6.9 is very old. There have been so many changes made to the i2c-i801 driver since then that it's quite difficult to make sure there's no relevant change, unless you're intimate with this driver and device. For example, this patch: http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=ca8b9e32a11a7cbfecbef00c8451a79fe1af392e seems pretty relevant to me, as it attempts to cleanly stop the controller on transaction timeout. It went into kernel 2.6.23. > i2regs is still around because 'if it works, don't change it' thing. > Also, it has 'mask' functionality which some people use. We can put > masks into standard utilities if you wish to do so. i2cset supports this functionality since October 2004. > Here is the i2cdetect output : > #./i2cdetect -l > i2c-0 smbus SMBus I801 adapter at 1880 SMBus adapter I asked for the output of i2cdetect (i2cdetect 0 in your case), not i2cdetect -l. I wanted to have an idea what chips were present on the bus. > > > Enabling debug in kernel produced following messages : > > > i801_smbus 0000:00:1f.3: Failed reset at end of transaction(01) > > > i801_smbus 0000:00:1f.3: SMBus busy (01). Resetting... > > > i801_smbus 0000:00:1f.3: Failed! (01) > > > i801_smbus 0000:00:1f.3: SMBus busy (01). Resetting... > > > i801_smbus 0000:00:1f.3: Failed! (01) > > > i801_smbus 0000:00:1f.3: SMBus busy (02). Resetting... > > > i801_smbus 0000:00:1f.3: Successfull! > > > The fix involved was to write a loop to wait for the final reset to > > > go through. After that, we still see following messages, but > > > atleast it is not failing over : > > > i801_smbus 0000:00:1f.3: SMBus busy (02).Resetting... > > > i801_smbus 0000:00:1f.3: Successfull! > > > i801_smbus 0000:00:1f.3: SMBus busy (02). Resetting... > > > i801_smbus 0000:00:1f.3: Successfull! > > > > Comparing this to the log above, I don't see any change. Resetting > > when register value was 02 was already working. It's resetting when > > register value was 01 that was failing, so that's what you need to > > check with the patch applied. > > I think the problem was that resetting after a transaction was taking a > bit longer and as we are doing heavy read/writes, the next request > would arrive, only to find the host controller busy. Please note that "resetting" is not really the correct term for what the driver is doing. We should fix this debug message to avoid the confusion. The driver is merely clearing status bits, it's not actually resetting anything. At least it wasn't in 2.6.9. Since 2.6.23 it kills the stuck transactions, which is kind of a reset, but that's a different piece of code than the part you wanted to modify. One important thing to notice is that the driver is still doing something wrong in this area. If the host is found busy (bit 0 of the status register) the driver attempts to write 1 to that bit. Given that the bit in question is read-only, I'd say the chances that it works are thin... If the host is busy then this strongly suggests that somebody else is using it and we should abort immediately. > The patch only waits till the operation finishes, so that no subsequent > transaction gets 01 (busy) error. As written above, 01 can't be cleared by the driver with the code in 2.6.9. So your patch is waiting for the bit to clear (which makes some sense) but the "reset" operation is not responsible for this. The bit would have cleared itself after some time anyway. The "reset" operation as it exists in 2.6.9 is merely a bit clearing operation, as I wrote above, and this doesn't take time to complete. If the next check shows that any cleared status bit is up (other than 01), it's rather "up again" than "still up", I think. > I guess the above log shows that the scenario is taken care by the > patch. The second condition (02 : INTR bit set : transaction complete) > is seen by the next transaction rarely and is a status message, not > failure. Ah, OK, I get the idea now. > > I am very worried by the code 01 which triggers the reset (which is > > probably not the right thing to do, BTW.) The host should never be > > busy when we start a transaction. If the problem happens frequently, > > I think we have a case where it is busy because the final operation > (resetting the status bits) in the last transaction didn't finish. Then I believe that Oleg Ryjkov's patch (URL above) is the proper fix for your problem. > I first tried to reduce reads/writes for the reset to make it "less > busy" and added a small delay, but it didn't work : > diff --git a/drivers/i2c/busses/i2c-i801.c > b/drivers/i2c/busses/i2c-i801.c index b0f771f..0dde673 100644 > --- a/drivers/i2c/busses/i2c-i801.c > +++ b/drivers/i2c/busses/i2c-i801.c > @@ -195,9 +195,10 @@ static int i801_transaction(int xact) > dev_dbg(&I801_dev->dev, "Error: no response!\n"); > } > > - if ((inb_p(SMBHSTSTS) & 0x1f) != 0x00) > - outb_p(inb(SMBHSTSTS), SMBHSTSTS); > + if ((temp = (inb_p(SMBHSTSTS) & 0x1f)) != 0x00) > + outb_p(temp, SMBHSTSTS); > > + msleep(1); > if ((temp = (0x1f & inb_p(SMBHSTSTS))) != 0x00) { > dev_dbg(&I801_dev->dev, "Failed reset at end of transaction " > "(%02x)\n", temp); > > > this strongly suggests that some other code is making use of the host > > controller. This could for example be ACPI code. Please provide the > > DSDT for this machine (in private.) > > I can assure you that there are no other transactions on the bus. We > verified using an analyser. OK, good thing. Then indeed the reason would be improperly terminated transactions affecting the next transaction, as you suggested. > Please let me know what you think now. This is the patch which I ran > over the weekend without any failures : > diff --git a/drivers/i2c/busses/i2c-i801.c b/drivers/i2c/busses/i2c-i801.c > index b0f771f..d165829 100644 > --- a/drivers/i2c/busses/i2c-i801.c > +++ b/drivers/i2c/busses/i2c-i801.c > @@ -198,9 +198,17 @@ static int i801_transaction(int xact) > if ((inb_p(SMBHSTSTS) & 0x1f) != 0x00) > outb_p(inb(SMBHSTSTS), SMBHSTSTS); > > - if ((temp = (0x1f & inb_p(SMBHSTSTS))) != 0x00) { > - dev_dbg(&I801_dev->dev, "Failed reset at end of transaction " > - "(%02x)\n", temp); > + /* We wait for a fraction of a second! */ > + do { > + msleep(1); > + temp = inb_p(SMBHSTSTS); > + } while ((temp & 0x01) && (timeout++ < MAX_TIMEOUT)); > + > + /* If the SMBus is still busy, we give up */ > + if (timeout >= MAX_TIMEOUT) { > + result = -EBUSY; > + dev_dbg(&I801_dev->dev, "Failed reset at end of transaction" > + "(%02x)\n", temp); > } > dev_dbg(&I801_dev->dev, "Transaction (post): CNT=%02x, CMD=%02x, " > "ADD=%02x, DAT0=%02x, DAT1=%02x\n", inb_p(SMBHSTCNT), As written above, I think your problem is already fixed in 2.6.23 so no patch is needed. I wouldn't want your patch as is anyway, as it adds a sleep unconditionally, making the driver twice as slow as the original code. One thing that might still be worth investigating is why some transactions do not complete in the first place. This might be related to the fact that the timeout loop was written with HZ=100 in mind and is broken by design. It timeouts in count (100, where all but one drivers have 500), not time. Depending on various parameters (amongst which the value of HZ), the actual time it waits for is very different. At HZ=100 it was typically 2 seconds, but at HZ=1000 it would be 200 ms, which isn't that much. Although at 16 kHz, a 32-byte I2C block read transaction would take 20 ms if I count correctly, so it still fits. Could it be that some slave on the customer's I2C bus is misbehaving and holding the clock line low for longer than it should? -- Jean Delvare _______________________________________________ i2c mailing list [email protected] http://lists.lm-sensors.org/mailman/listinfo/i2c
