The grain in EDAC is defined as the "minimum granularity for an error
report, in bytes". The following calculation of grain_bits in
edac_mc is wrong:

        grain_bits = fls_long(e->grain) + 1;

Where grain_bits is defined as:

        grain = 1 << grain_bits

Example:

        grain = 8       # 64 bit (8 bytes)
        grain_bits = fls_long(8) + 1
        grain_bits = 4 + 1 = 5

        grain = 1 << grain_bits
        grain = 1 << 5 = 32

Replace it with the correct calculation:

        grain_bits = fls_long(e->grain - 1);

The example now gives:

        grain_bits = fls_long(8 - 1)
        grain_bits = 3

        grain = 1 << 3 = 8

Note: We need to check whether the hardware reports a reasonable grain
(!= 0) and otherwise fall back to 1 byte granularity with a
WARN_ON_ONCE().

Signed-off-by: Robert Richter <[email protected]>
---
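For reference, a minimal userspace sketch of the two calculations. It is
not part of the patch. fls_long_sketch() is a hypothetical stand-in for
the kernel's fls_long() (1-based index of the most significant set bit,
0 for an input of 0); grain_bits_old() and grain_bits_new() are
illustrative names only.

```c
/*
 * Assumption: userspace stand-in for the kernel's fls_long().
 * Returns the 1-based index of the most significant set bit,
 * or 0 if no bit is set.
 */
static unsigned int fls_long_sketch(unsigned long x)
{
	unsigned int bits = 0;

	while (x) {
		bits++;
		x >>= 1;
	}
	return bits;
}

/* Old calculation: overshoots, e.g. grain = 8 gives 5 (1 << 5 = 32). */
static unsigned int grain_bits_old(unsigned long grain)
{
	return fls_long_sketch(grain) + 1;
}

/*
 * New calculation: smallest grain_bits with (1 << grain_bits) >= grain.
 * A grain of 0 falls back to 1 byte, mirroring the check in the patch
 * (without the warning).
 */
static unsigned int grain_bits_new(unsigned long grain)
{
	if (!grain)
		grain = 1;
	return fls_long_sketch(grain - 1);
}
```

With grain = 8 the old formula yields 5 (1 << 5 = 32) and the new one
yields 3 (1 << 3 = 8). A grain that is not a power of two is rounded up,
e.g. 12 gives 4 (1 << 4 = 16).
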
 drivers/edac/edac_mc.c | 10 ++++++++--
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/drivers/edac/edac_mc.c b/drivers/edac/edac_mc.c
index 64922c8fa7e3..45cac74ab833 100644
--- a/drivers/edac/edac_mc.c
+++ b/drivers/edac/edac_mc.c
@@ -1235,9 +1235,15 @@ void edac_mc_handle_error(const enum hw_event_mc_err_type type,
        if (p > e->location)
                *(p - 1) = '\0';
 
-       /* Report the error via the trace interface */
-       grain_bits = fls_long(e->grain) + 1;
+       /*
+        * We expect the hw to report a reasonable grain, fallback to
+        * 1 byte granularity otherwise.
+        */
+       if (WARN_ON_ONCE(!e->grain))
+               e->grain = 1;
+       grain_bits = fls_long(e->grain - 1);
 
+       /* Report the error via the trace interface */
        if (IS_ENABLED(CONFIG_RAS))
                trace_mc_event(type, e->msg, e->label, e->error_count,
                               mci->mc_idx, e->top_layer, e->mid_layer,
-- 
2.20.1
