Re: [i2c] [patch 2.6.25-git] i2c_adapters: return -Errno not -1

Jean Delvare Sun, 11 May 2008 03:28:47 -0700

Hi David,

Before I forget: I just noticed that your patch adds the following
warning:

  CC [M]  drivers/i2c/busses/i2c-amd8111.o
drivers/i2c/busses/i2c-amd8111.c: In function ‘amd8111_access’:
drivers/i2c/busses/i2c-amd8111.c:198: warning: unused variable ‘status’

Please fix.

On Sun, 11 May 2008 01:17:22 -0700, David Brownell wrote:
> On Saturday 10 May 2008, Jean Delvare wrote:
> > >  drivers/i2c/.............................
> > >  15 files changed, 176 insertions(+), 142 deletions(-)
> 
> So I'll probably respond to your comments in smaller chunks.  ;)
> 
> Arguably this should become a bunch of smaller patches, but that 
> would of course make more confusion during initial review for
> this type change ... harder to notice the patterns.

I am totally fine with all the drivers being fixed in the same patch, no
problem with this.

> > > --- g26.orig/drivers/i2c/algos/i2c-algo-bit.c     2008-05-01 
> > > 16:01:38.000000000 -0700
> > > +++ g26/drivers/i2c/algos/i2c-algo-bit.c  2008-05-01 16:02:58.000000000 
> > > -0700
> > >           ...
> > > @@ -465,9 +465,12 @@ static int bit_doAddress(struct i2c_adap
> > >   struct i2c_algo_bit_data *adap = i2c_adap->algo_data;
> > >  
> > >   unsigned char addr;
> > > - int ret, retries;
> > > + int ret;
> > > + unsigned retries;
> > >  
> > > - retries = nak_ok ? 0 : i2c_adap->retries;
> > > + retries = (nak_ok || i2c_adap->retries < 0)
> > > +                 ? 0
> > > +                 : i2c_adap->retries;
> > >  
> > >   if (flags & I2C_M_TEN) {
> > >           /* a ten bit address */
> > 
> > I'd rather not bother with the adap->retries stuff which we agreed
> > should be removed anyway.
> 
> Until it's gone, I just wanted to be sane with it.

To make it sane, just change "ret = -1" to "ret = 0" in try_address().
It does the "right thing" if the i2c adapter provided a negative number
of retries (which nobody does anyway.) That's the minimal fix and the
only one I am willing to take.

> Or should we just rip it out of this driver? Negative retries is a
> very nonsensical notion, and that's the only way I recall algo-bit
> seemed to have obvious (albeit potential) fault code goofiness.

Not in the middle of your -1 -> -errno patches, no. But once we're done
with them and the dust has settled down, yes, chip drivers will
hopefully have a reliable way to know when then want to retry a failed
read, and we can get rid of this automatic retry mechanism at the
adapter level.

> > > --- g26.orig/drivers/i2c/busses/i2c-ali1535.c     2008-05-01 
> > > 16:01:39.000000000 -0700
> > > +++ g26/drivers/i2c/busses/i2c-ali1535.c  2008-05-01 16:02:58.000000000 
> > > -0700
> > >                   ...
> > > @@ -295,7 +295,7 @@ static int ali1535_transaction(struct i2
> > >    * do a printk.  This means that bus collisions go unreported.
> > >    */
> > >   if (temp & ALI1535_STS_BUSERR) {
> > > -         result = -1;
> > > +         result = -EIO;
> > 
> > Given the comment above I'd make this -ENODEV. 99% of the time, no ack
> > means no device.
> 
> Either ENODEV or ENXIO for "it didn't ACK the address".  Any
> other code will make the driver core emit diagnostics when the
> probe() fails ... diagnostics we want to avoid, when probe()
> code does the Right Thing and just returns such fault codes.
> 
> That's probably an issue with most of these updates.

My point exactly. I am fine with both -ENXIO and -ENODEV, pick the one
you like, then make sure all drivers are using it consistently.

> > > --- g26.orig/drivers/i2c/busses/i2c-amd756-s4882.c        2008-05-01 
> > > 16:01:38.000000000 -0700
> > > +++ g26/drivers/i2c/busses/i2c-amd756-s4882.c     2008-05-01 
> > > 16:02:58.000000000 -0700
> > > @@ -58,7 +58,7 @@ static s32 amd756_access_virt0(struct i2
> > >   /* We exclude the multiplexed addresses */
> > >   if (addr == 0x4c || (addr & 0xfc) == 0x50 || (addr & 0xfc) == 0x30
> > >    || addr == 0x18)
> > > -         return -1;
> > > +         return -EINVAL;
> > >  
> > >   mutex_lock(&amd756_lock);
> > >  
> > > @@ -86,7 +86,7 @@ static inline s32 amd756_access_channel(
> > >  
> > >   /* We exclude the non-multiplexed addresses */
> > >   if (addr != 0x4c && (addr & 0xfc) != 0x50 && (addr & 0xfc) != 0x30)
> > > -         return -1;
> > > +         return -EINVAL;
> > >  
> > >   mutex_lock(&amd756_lock);
> > 
> > For these two I'd go for -ENODEV as well. It's a little unfair to
> > return -EINVAL, given that we are hiding the devices from the caller on
> > purpose and they couldn't guess.
> 
> That might be a case where -ENXIO could provide a useful distinction.
> "No such device *OR ADDRESS*" vs "No such device".  ;)

What matters here is to return the same code that other bus drivers
return when they don't get a ack on a given transaction. The caller
shouldn't be able to differentiate between both cases, multiplexing
should be completely transparent to the chip drivers.

> > > --- g26.orig/drivers/i2c/busses/i2c-amd756.c      2008-05-01 
> > > 16:01:39.000000000 -0700
> > > +++ g26/drivers/i2c/busses/i2c-amd756.c   2008-05-01 16:02:58.000000000 
> > > -0700
> > > @@ -151,17 +151,17 @@ static int amd756_transaction(struct i2c
> > >   }
> > >  
> > >   if (temp & GS_PRERR_STS) {
> > > -         result = -1;
> > > +         result = -EIO;
> > >           dev_dbg(&adap->dev, "SMBus Protocol error (no response)!\n");
> > 
> > This one would be -ENODEV.
> 
> How are you knowing all these things?  I wouldn't interpret
> "protocol error" as "no device".  And while I do happen to
> have amd756 docs around (board broken though, sigh) I didn't
> dig that info out ... I'd be less concerned with reporting
> needlessly generic (EIO) errors than with wrongly reporting
> ones that are too specific (ENODEV).
> 
> That's kind of a general issue with this patch.  I think my
> first idea was to stop reporting "-EPERM" for everything and
> provide plausible codes, but allow updates later.  I can't
> imagine every fault report, the first time around, will be
> the best one we could be reporting.

I agree that it's not easy for you (or anyone else, including me) to
figure out the right error code all the time, and I am fine with using
-EIO as the default error value when there's no other obvious value to
return.

As for "how do I know": check whether the code is using dev_err() or
dev_dbg(). Each time a driver uses dev_dbg(), chances are good that the
error in question isn't really an error, and we've turned the error
message into a debug message to avoid log fills and user complaints. So
basically, when I see a dev_dbg() in these SMBus "access" functions, I
can be almost sure that the error in question is what that controller
returns when a transaction isn't acked. But that's only a heuristic - I
could be wrong, too.

A variant of this is when a driver uses dev_err() but hides the error
message on SMBus quick command and sometimes on SMBus receive byte as
well. These are the transactions used by i2cdetect and sensors-detect
to probe for I2C devices. When you see this you can be certain that
it's the error that is returned on no-ack.

> > > --- g26.orig/drivers/i2c/busses/i2c-i801.c        2008-05-01 
> > > 16:01:38.000000000 -0700
> > > +++ g26/drivers/i2c/busses/i2c-i801.c     2008-05-01 16:02:58.000000000 
> > > -0700
> 
> Though didn't I see a patch removing this driver?

Hell no. You saw i2c-i810 go away, not i2c-i801. i2c-i801 drives the
Intel mainboards' SMBus, i2c-i810 was a bit-banging driver for the
integrated Intel video chips' DDC and I2C buses. i2c-i801 is there to
stay.

> > >   if (temp & SMBHSTSTS_DEV_ERR) {
> > > -         result = -1;
> > > +         result = -EIO;
> > >           dev_dbg(&I801_dev->dev, "Error: no response!\n");
> > 
> > -ENODEV
> 
> Is that *really* a "no response to address" error?

It uses dev_dbg() ;)

> But there's another issue with that type of code.  This
> particular one looks maybe sane ... in the sense that it
> seems to replace generic fault codes with more specific
> ones (maybe), but I don't think all the I2C adapters were
> doing that.  Some of them looked like they were testing
> the specific faults first, then clobbering them with vague
> generic idicators...

You're right, I have noticed this too... most of these drivers check
for many error conditions and overwrite the error value each time, so
the last error checked wins. We might have to reorder the checks so
that the best error code is kept, but honestly I'm not sure I would be
able to tell right away what the best order would be each time. So I am
fine leaving things as they are for now and fixing them later if they
really need to be.

> > > + if (status)
> > > +         return status;
> > >  
> > >   if (read_write == I2C_SMBUS_READ) {
> > >           len = inb_p(SMBHSTDAT0);
> > >           if (len < 1 || len > I2C_SMBUS_BLOCK_MAX)
> > > -                 return -1;
> > > +                 return -EILSEQ;
> > 
> > Huh, definitely not. This error code has to so with multi-byte
> > character encoding, that's inappropriate here.
> 
> No; most of the errno values get recycled.  "Illegal Byte Sequence"
> is exactly what we got there, in fact!  I know that USB has used
> EILSEQ for some years (indicating a lowlevel bit protocol failure),
> and MMC is using it too.
> 
> So if you want to say "not", it should be for some better reason
> than that...

When calling strerror on EILSEQ, you get: "Invalid or incomplete
multibyte or wide character". This has nothing to do with the error in
question, and that's what I am worried about. I don't know whether it
makes sense or not in the USB and MMC code, but in the I2C/SMBus
context, I'm sure it does not.

> > I'd return -EINVAL, or 
> > -EREMOTEIO if you think EINVAL is unfair for the caller.
> 
> It's not an I/O error; the I/O actually worked, it's the bytes

True...

> that were wrong.  It's not an "argument" provided by the caller.
> So neither of those codes seems appropriate.

The returned value depends on an argument provided by the caller,
though.

> > But most 
> > likely, if the returned length is not within the SMBus specifications,
> > it means that the caller ran an SMBus block transaction at the wrong
> > offset or on a device that doesn't support it, so -EINVAL seems
> > reasonable to me.
> 
> That's only a guess though.  All we *know* is that we saw an
> illegal byte sequence...

That's not an illegal "sequence", sorry. The length byte by itself is
either valid or not valid. What we saw is an illegal byte value.

Yes, -EINVAL is a guess, but according to my experience, this guess
will always be correct. Leaving actual I/O errors aside, there are only
two ways that the SMBus block length byte we receive from an SMBus chip
can be incorrect: either the chip in question is returning invalid data
to a proper request, or the request was incorrect. The former would
mean that the chip is basically unusable, so it's very unlikely to
happen. As a matter of fact, I've never seen it (not that I have seen
that many chips using SMBus block reads, though.) That's why I say that
the latter will always be true in practice, and -EINVAL makes sense
then even if it's not perfect.

If you are still worried that -EINVAL has a small chance of not being
correct, I'd rather go for -EMSGSIZE (Message too long), although the
error message would be confusing in the 0 case. Even -EOVERFLOW (Value
too large for defined data type) would be more appropriate than
-EILSEQ, I think.

-- 
Jean Delvare

_______________________________________________
i2c mailing list
[email protected]
http://lists.lm-sensors.org/mailman/listinfo/i2c

Re: [i2c] [patch 2.6.25-git] i2c_adapters: return -Errno not -1

Reply via email to