On Wed, Sep 06, 2017 at 10:23:37AM +1000, Andrew Jeffery wrote:
> On Tue, 2017-09-05 at 10:00 -0700, Guenter Roeck wrote:
> > On Tue, Sep 05, 2017 at 05:01:32PM +1000, Andrew Jeffery wrote:
> > > Some functions exposed by pmbus core conflated errors that occurred when
> > > setting the page to access with errors that occurred when accessing
> > > registers in a page. In some cases, this caused legitimate errors to be
> > > hidden under the guise of the register not being supported.
> > > 
> > 
> > I'll have to look into this. If I recall correctly, the reason for not 
> > returning
> > errors from the "clear faults" command was that some early chips did not 
> > support
> > it, or did not support it on all pages. Those chips would now always fail.
> 
> Yes, that is a concern.
> 
> However, shouldn't the lack of support for CLEAR_FAULTS be a described
> property of the chip or page?
> 
> Regardless, the issue here is the PAGE command is sometimes failing
> (more below). This means we won't have issued a CLEAR_FAULTS against
> the page even if the page supports CLEAR_FAULTS. That to me is
> something that should be propagated back up the call chain, as faults
> that can be cleared will not have been.
> 
We could also consider
- retry
- Add a marker indicating that faults (specifically command errors)
  are unreliable after clearing faults failed

If we can't reliably clear faults, all bets are off.

> > 
> > On a higher level, I am not sure if it is a good idea to return an error
> > from a function intended to clear faults (and nothing else). That seems
> > counterproductive.
> > 
> > Is this a problem you have actually observed ? 
> 
> Unfortunately yes. I was trying to reproduce some issues that we are
> seeing in userspace but wasn't having much luck (getting -EIO on
> reading a hwmon attribute). I ended up instrumenting our I2C bus
> driver, and found that the problem was more prevalent than what the
> read failures in userspace were indicating. The paths that were
> triggering these unreported conditions all traced through the check
> register and clear fault functions, which on analysis were swallowing
> the error.
> 
> > If so, what is the chip ?
> 
> Well, the chip that I'm *communicating* with is the Maxim MAX31785
> Intelligent Fan Controller. As to whether that's what is *causing* the
> PAGE command to fail is still up in the air. I would not be surprised
> if we have other issues in the hardware design.
> 
> I haven't sent a follow-up to the series introducing the MAX31785
> driver because I've been chasing down communication issues. I felt it
> was important to understand whether the device has quirks that need to
> be worked around in the driver, or if our hardware design has bigger
> problems that should be handled in other ways. I've been in touch with
> Maxim who have asserted that some of the problems we're seeing cannot
> be caused by their chip.
> 

Guess I need to dig up my eval board and see if I can reproduce the problem.
Seems you are saying that the problem is always seen when issuing a sequence
of "clear faults" commands on multiple pages ?

The MAX31785 is microcode based, so I would not be entirely surprised if
it sometimes fails to reply if a sequence of <set page, clear faults>
commands is executed.

If possible, you might try reducing the i2c clock. I have not seen that with
Maxim chips, but some other PMBus chips don't operate reliably at 400 KHz.

Guenter

> > 
> > > > > Signed-off-by: Andrew Jeffery <and...@aj.id.au>
> > > ---
> > >  Documentation/hwmon/pmbus-core   |  12 ++--
> > >  drivers/hwmon/pmbus/pmbus.h      |   6 +-
> > >  drivers/hwmon/pmbus/pmbus_core.c | 115 
> > > +++++++++++++++++++++++++++++----------
> > >  3 files changed, 95 insertions(+), 38 deletions(-)
> > > 
> > > diff --git a/Documentation/hwmon/pmbus-core 
> > > b/Documentation/hwmon/pmbus-core
> > > index 8ed10e9ddfb5..3e9f41bb756f 100644
> > > --- a/Documentation/hwmon/pmbus-core
> > > +++ b/Documentation/hwmon/pmbus-core
> > > @@ -218,17 +218,17 @@ Specifically, it provides the following information.
> > >    This function calls the device specific write_byte function if defined.
> > >    Therefore, it must _not_ be called from that function.
> > >  
> > > -  bool pmbus_check_byte_register(struct i2c_client *client, int page, 
> > > int reg);
> > > +  int pmbus_check_byte_register(struct i2c_client *client, int page, int 
> > > reg);
> > >  
> > > -  Check if byte register exists. Return true if the register exists, 
> > > false
> > > -  otherwise.
> > > +  Check if byte register exists. Returns 1 if the register exists, 0 if 
> > > it does
> > > +  not, and less than zero on an unexpected error.
> > >    This function calls the device specific write_byte function if defined 
> > > to
> > >    obtain the chip status. Therefore, it must _not_ be called from that 
> > > function.
> > >  
> > > -  bool pmbus_check_word_register(struct i2c_client *client, int page, 
> > > int reg);
> > > +  int pmbus_check_word_register(struct i2c_client *client, int page, int 
> > > reg);
> > >  
> > > -  Check if word register exists. Return true if the register exists, 
> > > false
> > > -  otherwise.
> > > +  Check if word register exists. Returns 1 if the register exists, 0 if 
> > > it does
> > > +  not, and less than zero on an unexpected error.
> > >    This function calls the device specific write_byte function if defined 
> > > to
> > >    obtain the chip status. Therefore, it must _not_ be called from that 
> > > function.
> > >  
> > > diff --git a/drivers/hwmon/pmbus/pmbus.h b/drivers/hwmon/pmbus/pmbus.h
> > > index bfcb13bae34b..c53032a04a6f 100644
> > > --- a/drivers/hwmon/pmbus/pmbus.h
> > > +++ b/drivers/hwmon/pmbus/pmbus.h
> > > @@ -413,9 +413,9 @@ int pmbus_write_byte_data(struct i2c_client *client, 
> > > int page, u8 reg,
> > > > >                         u8 value);
> > >  int pmbus_update_byte_data(struct i2c_client *client, int page, u8 reg,
> > > > >                          u8 mask, u8 value);
> > > -void pmbus_clear_faults(struct i2c_client *client);
> > > -bool pmbus_check_byte_register(struct i2c_client *client, int page, int 
> > > reg);
> > > -bool pmbus_check_word_register(struct i2c_client *client, int page, int 
> > > reg);
> > > +int pmbus_clear_faults(struct i2c_client *client);
> > > +int pmbus_check_byte_register(struct i2c_client *client, int page, int 
> > > reg);
> > > +int pmbus_check_word_register(struct i2c_client *client, int page, int 
> > > reg);
> > >  int pmbus_do_probe(struct i2c_client *client, const struct i2c_device_id 
> > > *id,
> > > > >                  struct pmbus_driver_info *info);
> > >  int pmbus_do_remove(struct i2c_client *client);
> > > diff --git a/drivers/hwmon/pmbus/pmbus_core.c 
> > > b/drivers/hwmon/pmbus/pmbus_core.c
> > > index f1eff6b6c798..153700e35431 100644
> > > --- a/drivers/hwmon/pmbus/pmbus_core.c
> > > +++ b/drivers/hwmon/pmbus/pmbus_core.c
> > > @@ -304,18 +304,24 @@ static int _pmbus_read_byte_data(struct i2c_client 
> > > *client, int page, int reg)
> > > > >       return pmbus_read_byte_data(client, page, reg);
> > >  }
> > >  
> > > -static void pmbus_clear_fault_page(struct i2c_client *client, int page)
> > > +static int pmbus_clear_fault_page(struct i2c_client *client, int page)
> > >  {
> > > > > -     _pmbus_write_byte(client, page, PMBUS_CLEAR_FAULTS);
> > > > > +     return _pmbus_write_byte(client, page, PMBUS_CLEAR_FAULTS);
> > >  }
> > >  
> > > -void pmbus_clear_faults(struct i2c_client *client)
> > > +int pmbus_clear_faults(struct i2c_client *client)
> > >  {
> > > > >       struct pmbus_data *data = i2c_get_clientdata(client);
> > > > > +     int rv;
> > > > >       int i;
> > >  
> > > > > -     for (i = 0; i < data->info->pages; i++)
> > > > > -             pmbus_clear_fault_page(client, i);
> > > > > +     for (i = 0; i < data->info->pages; i++) {
> > > > > +             rv = pmbus_clear_fault_page(client, i);
> > > > > +             if (rv)
> > > > > +                     return rv;
> > > > > +     }
> > > +
> > > > > +     return 0;
> > >  }
> > >  EXPORT_SYMBOL_GPL(pmbus_clear_faults);
> > >  
> > > @@ -333,28 +339,45 @@ static int pmbus_check_status_cml(struct i2c_client 
> > > *client)
> > > > >       return 0;
> > >  }
> > >  
> > > -static bool pmbus_check_register(struct i2c_client *client,
> > > +static int pmbus_check_register(struct i2c_client *client,
> > > > >                                int (*func)(struct i2c_client *client,
> > > > >                                            int page, int reg),
> > > > >                                int page, int reg)
> > >  {
> > > > > +     struct pmbus_data *data;
> > > > > +     int check;
> > > > >       int rv;
> > > > > -     struct pmbus_data *data = i2c_get_clientdata(client);
> > >  
> > > > > -     rv = func(client, page, reg);
> > > > > -     if (rv >= 0 && !(data->flags & PMBUS_SKIP_STATUS_CHECK))
> > > > > -             rv = pmbus_check_status_cml(client);
> > > > > -     pmbus_clear_fault_page(client, -1);
> > > > > -     return rv >= 0;
> > > > > +     data = i2c_get_clientdata(client);
> > > +
> > > > > +     /*
> > > > > +      * pmbus_set_page() guards transactions on the requested page 
> > > > > matching
> > > > > +      * the current page. This may be done in the execution of 
> > > > > func(), but
> > > > > +      * at that point a set-page error is conflated with accessing a
> > > > > +      * non-existent register.
> > > > > +      */
> > > > > +     rv = pmbus_set_page(client, page);
> > > > > +     if (rv < 0)
> > > > > +             return rv;
> > > +
> > > > > +     check = func(client, page, reg);
> > > > > +     if (check >= 0 && !(data->flags & PMBUS_SKIP_STATUS_CHECK))
> > > > > +             check = pmbus_check_status_cml(client);
> > > +
> > > > > +     rv = pmbus_clear_fault_page(client, -1);
> > > > > +     if (rv < 0)
> > > > > +             return rv;
> > > +
> > > > > +     return check >= 0;
> > >  }
> > >  
> > > -bool pmbus_check_byte_register(struct i2c_client *client, int page, int 
> > > reg)
> > > +int pmbus_check_byte_register(struct i2c_client *client, int page, int 
> > > reg)
> > >  {
> > > > >       return pmbus_check_register(client, _pmbus_read_byte_data, 
> > > > > page, reg);
> > >  }
> > >  EXPORT_SYMBOL_GPL(pmbus_check_byte_register);
> > >  
> > > -bool pmbus_check_word_register(struct i2c_client *client, int page, int 
> > > reg)
> > > +int pmbus_check_word_register(struct i2c_client *client, int page, int 
> > > reg)
> > >  {
> > > > >       return pmbus_check_register(client, _pmbus_read_word_data, 
> > > > > page, reg);
> > >  }
> > > @@ -390,7 +413,7 @@ static struct pmbus_data *pmbus_update_device(struct 
> > > device *dev)
> > >  
> > > > >       mutex_lock(&data->update_lock);
> > > > >       if (time_after(jiffies, data->last_updated + HZ) || 
> > > > > !data->valid) {
> > > > > -             int i, j;
> > > > > +             int i, j, ret;
> > >  
> > > > >               for (i = 0; i < info->pages; i++) {
> > > > >                       data->status[PB_STATUS_BASE + i]
> > > @@ -424,7 +447,13 @@ static struct pmbus_data *pmbus_update_device(struct 
> > > device *dev)
> > > > >                                                           
> > > > > sensor->page,
> > > > >                                                           
> > > > > sensor->reg);
> > > > >               }
> > > > > -             pmbus_clear_faults(client);
> > > +
> > > > > +             ret = pmbus_clear_faults(client);
> > > > > +             if (ret < 0) {
> > > > > +                     mutex_unlock(&data->update_lock);
> > > > > +                     return ERR_PTR(ret);
> > > > > +             }
> > > +
> > > > >               data->last_updated = jiffies;
> > > > >               data->valid = 1;
> > > > >       }
> > > @@ -754,6 +783,9 @@ static ssize_t pmbus_show_boolean(struct device *dev,
> > > > >       struct pmbus_data *data = pmbus_update_device(dev);
> > > > >       int val;
> > >  
> > > > > +     if (IS_ERR(data))
> > > > > +             return PTR_ERR(data);
> > > +
> > > > >       val = pmbus_get_boolean(data, boolean, attr->index);
> > > > >       if (val < 0)
> > > > >               return val;
> > > @@ -766,6 +798,9 @@ static ssize_t pmbus_show_sensor(struct device *dev,
> > > > >       struct pmbus_data *data = pmbus_update_device(dev);
> > > > >       struct pmbus_sensor *sensor = to_pmbus_sensor(devattr);
> > >  
> > > > > +     if (IS_ERR(data))
> > > > > +             return PTR_ERR(data);
> > > +
> > > > >       if (sensor->data < 0)
> > > > >               return sensor->data;
> > >  
> > > @@ -995,7 +1030,11 @@ static int pmbus_add_limit_attrs(struct i2c_client 
> > > *client,
> > > > >       struct pmbus_sensor *curr;
> > >  
> > > > >       for (i = 0; i < nlimit; i++) {
> > > > > -             if (pmbus_check_word_register(client, page, l->reg)) {
> > > > > +             ret = pmbus_check_word_register(client, page, l->reg);
> > > > > +             if (ret < 0)
> > > > > +                     return ret;
> > > +
> > > > > +             if (ret) {
> > > > >                       curr = pmbus_add_sensor(data, name, l->attr, 
> > > > > index,
> > > > >                                               page, l->reg, 
> > > > > attr->class,
> > > > >                                               attr->update || 
> > > > > l->update,
> > > @@ -1041,6 +1080,8 @@ static int pmbus_add_sensor_attrs_one(struct 
> > > i2c_client *client,
> > > > >       if (!base)
> > > > >               return -ENOMEM;
> > > > >       if (attr->sfunc) {
> > > > > +             int check;
> > > +
> > > > >               ret = pmbus_add_limit_attrs(client, data, info, name,
> > > > >                                           index, page, base, attr);
> > > > >               if (ret < 0)
> > > @@ -1050,9 +1091,13 @@ static int pmbus_add_sensor_attrs_one(struct 
> > > i2c_client *client,
> > > > >                * alarm attributes, if there is a global alarm bit, 
> > > > > and if
> > > > >                * the generic status register for this page is 
> > > > > accessible.
> > > > >                */
> > > > > -             if (!ret && attr->gbit &&
> > > > > -                 pmbus_check_byte_register(client, page,
> > > > > -                                           data->status_register)) {
> > > +
> > > > > +             check = pmbus_check_byte_register(client, page,
> > > > > +                                               
> > > > > data->status_register);
> > > > > +             if (check < 0)
> > > > > +                     return check;
> > > +
> > > > > +             if (!ret && attr->gbit && check) {
> > > > >                       ret = pmbus_add_boolean(data, name, "alarm", 
> > > > > index,
> > > > >                                               NULL, NULL,
> > > > >                                               PB_STATUS_BASE + page,
> > > @@ -1604,8 +1649,12 @@ static int pmbus_add_fan_attributes(struct 
> > > i2c_client *client,
> > > > >                       if (!(info->func[page] & pmbus_fan_flags[f]))
> > > > >                               break;
> > >  
> > > > > -                     if (!pmbus_check_word_register(client, page,
> > > > > -                                                    
> > > > > pmbus_fan_registers[f]))
> > > > > +                     ret = pmbus_check_word_register(client, page,
> > > > > +                                                    
> > > > > pmbus_fan_registers[f]);
> > > > > +                     if (ret < 0)
> > > > > +                             return ret;
> > > +
> > > > > +                     if (!ret)
> > > > >                               break;
> > >  
> > > > >                       /*
> > > @@ -1628,9 +1677,13 @@ static int pmbus_add_fan_attributes(struct 
> > > i2c_client *client,
> > > > >                        * Each fan status register covers multiple 
> > > > > fans,
> > > > >                        * so we have to do some magic.
> > > > >                        */
> > > > > -                     if ((info->func[page] & 
> > > > > pmbus_fan_status_flags[f]) &&
> > > > > -                         pmbus_check_byte_register(client,
> > > > > -                                     page, 
> > > > > pmbus_fan_status_registers[f])) {
> > > > > +                     ret =  pmbus_check_byte_register(client, page,
> > > > > +                                             
> > > > > pmbus_fan_status_registers[f]);
> > > > > +                     if (ret < 0)
> > > > > +                             return ret;
> > > +
> > > > > +                     if ((info->func[page] & 
> > > > > pmbus_fan_status_flags[f])
> > > > > +                                     && ret) {
> > > > >                               int base;
> > >  
> > > > > > >                           if (f > 1)      /* fan 3, 4 */
> > > @@ -1696,10 +1749,13 @@ static int pmbus_identify_common(struct 
> > > i2c_client *client,
> > > > >                                struct pmbus_data *data, int page)
> > >  {
> > > > >       int vout_mode = -1;
> > > > > +     int rv;
> > >  
> > > > > -     if (pmbus_check_byte_register(client, page, PMBUS_VOUT_MODE))
> > > > > +     rv = pmbus_check_byte_register(client, page, PMBUS_VOUT_MODE);
> > > > > +     if (rv == 1)
> > > > >               vout_mode = _pmbus_read_byte_data(client, page,
> > > > >                                                 PMBUS_VOUT_MODE);
> > > +
> > > > >       if (vout_mode >= 0 && vout_mode != 0xff) {
> > > > >               /*
> > > > >                * Not all chips support the VOUT_MODE command,
> > > @@ -1725,8 +1781,7 @@ static int pmbus_identify_common(struct i2c_client 
> > > *client,
> > > > >               }
> > > > >       }
> > >  
> > > > > -     pmbus_clear_fault_page(client, page);
> > > > > -     return 0;
> > > > > +     return pmbus_clear_fault_page(client, page);
> > >  }
> > >  
> > >  static int pmbus_init_common(struct i2c_client *client, struct 
> > > pmbus_data *data,
> > > @@ -1756,7 +1811,9 @@ static int pmbus_init_common(struct i2c_client 
> > > *client, struct pmbus_data *data,
> > > > >       if (ret >= 0 && (ret & PB_CAPABILITY_ERROR_CHECK))
> > > > >               client->flags |= I2C_CLIENT_PEC;
> > >  
> > > > > -     pmbus_clear_faults(client);
> > > > > +     ret = pmbus_clear_faults(client);
> > > > > +     if (ret < 0)
> > > > > +             return ret;
> > >  
> > > > >       if (info->identify) {
> > > > >               ret = (*info->identify)(client, info);
> > > -- 
> > > 2.11.0
> > > 


--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Reply via email to