On 2015-10-12 18:13, Cyrille Pitchen wrote: > Le 12/10/2015 17:13, Peter Rosin a écrit : >> On 2015-10-05 17:09, Peter Rosin wrote: >>> But what trouble does the i2c bus driver see? Admittedly I only >>> have a simple logic level bus viewer, and not a full-blown >>> oscilloscope, so there might be something analogue going on? >>> I don't think so though, those signals looked fine last time we >>> looked (but we obviously didn't have these issues then, and >>> didn't really look that closely). I'll see if I can recheck >>> with a real scope too. >> >> We redid the tests with a real scope, and the signal looks nice >> and square, so it is not that. >> >> Speculating further on the cause of the long ACKs, I think that >> the i2c driver gets confused by an interrupt that marks the >> transfer complete, and thinks it's a NACK interrupt instead. Is that >> plausible? >> >> If the peripheral unit is such that it generates a stop automatically >> on NACKs, then this makes perfect sense. I.e. the TWI sees that the >> transfer is complete, generates an interrupt, and waits for further >> data or a stop command. Meanwhile the driver thinks it's a NACK and >> that a stop condition has already been sent to the bus, and just >> notifies the i2c consumer (the eeprom driver in this case) of the >> failure and frees up the bus for any future user. >> >> This also matches what I see when I turn on some more traffic on the >> bus, that is interleaved with the eeprom traffic. AFAICT, it can be >> any command that gets chewed up by the eeprom if it is sent to the >> i2c driver during the long ACK. >> >> Are you Atmel people making any progress on this data corrupting >> regression? Is there anything else I can do to help? >> >> Cheers, >> Peter >> > > Hi Peter, > > I have sent a patch to Ludovic for a first internal review before publishing > to > mainline. The patch should fix your issue since it fixes it on my sama5d36ek > board with an at24 eeprom. > > More details on the reason of this bug would be provided in both the commit > message and comments in the code provided by the reviewed patch but I you want > an early fix just read the Status Register (AT91_TWI_SR) at the beginning of > at91_do_twi_transfer(). This read clears the NACK bit in the Status Register. > Then the following source code can safely enable the NACK interrupt, otherwise > in some cases a pending NACK interrupt would rise immediately after the line: > at91_twi_write(dev, AT91_TWI_IER, AT91_TWI_NACK); > hence breaking the sequence of operations to be done because the interrupt > handler would call complete() too early so wait_for_completion_timeout() > also exits too early. > > So reading the Status Register at the beginning of at91_do_twi_transfer() > should be enough to fix the issue.
Yes, I see no more long ACKs after that reading the Status Register there. Great! > Another mistake is in the interrupt handler itself, ie atmel_twi_interrupt(): > we should check the TWI_TXRDY status bit before calling > at91_twi_write_next_byte() only if both the TWI_TXCOMP and TWI_NACK status > bits > are clear. Otherwise, writing a new byte into the THR tells the I2C controller > to start a new transfer. Then the I2C slave, the at24 eeprom, is likely to > also reply by a second NACK. Hence the NACK bit is already set into the Status > Register on the next call of at91_do_twi_transfer(). > This is what I saw on my scope for PIO transfers. I interpret this as a proposed solution for the strange double NACKs? Anyway, I find it unnecessarily hard to grasp exactly what you mean (wasteful policy you are apparently suffering from where it is OK to publish a patch written in English, but apparently a big no-no to send a diff until it passes some internal review???). I interpreted your "patch" in English as: at91_twi_read_next_byte(dev); - else if (irqstatus & AT_TWI_TXRDY) + else if ((irqstatus & (AT91_TWI_TXCOMP | AT91_TWI_TXRDY | AT91_TWI_NACK)) == AT91_TWI_TXRDY) at91_twi_write_next_byte(dev); But still see some double NACKs. Not always though, and it doesn't wreak any havoc. But it still looks strange and I can't explain them when looking at what the eeprom driver requests. Does this mean that there are more races present? Or, did I just parse your English "patch" badly? > By the way, in my case, the first NACK occurs because the at24 driver tries to > perform a second write transfer too quickly and the eeprom is not ready yet, > then replies with a NACK. Yes, I believe this is by design. Noone wants to encode the exact delays needed since they are hard to predict. So, the eeprom driver polls. > However while debugging, placing debug traces in the I2C controller driver > changed the timing: this small delay was enough to make the first NACK > disappear because the eeprom was now ready to process the second write > operation. Yes, w/o NACKs everything is going smooth regardless of kernel version. > For DMA transfer, you have to know that when NACK error is detected, the I2C > controller sets the TWI_TXCOMP, TWI_NACK and TWI_TXRDY bits all together in > the Status Register: > - The TWI_NACK is cleared by reading the Status Register. > - The TWI_TXRDY triggers the DMA controller to write the next byte in the THR, > which should also tell the I2C controller to start a new transfer. > - The TWI_TXCOMP bit is cleared by writing into the THR: this is why the > interrupt handler may fail to detect NACK if it relies on TWI_TXCOMP. > > Best Regards, > > Cyrille > -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/