Hidetoshi Seto wrote:
> Russ Anderson wrote:
> > @@ -688,11 +690,11 @@ recover_from_processor_error(int platfor
> > * The cache check and bus check bits have four possible states
> > * cc bc
> > * 0 0 Weird record, not recovered
> > - * 1 0 Cache error, not recovered
> > + * 1 0 Cache error, attempt recovered
> > * 0 1 I/O error, attempt recovery
> > * 1 1 Memory error, attempt recovery
> > */
>
> Which is right, attempt-"recovered" or "recovery"?
>
> > - if (psp->bc == 0 || pbci == NULL)
> > + if (psp->cc == 0 && (psp->bc == 0 || pbci == NULL))
> > return fatal_mca("No bus check");
>
> The message should be replaced by more appropriate one...
> "No recoverable check" or just "Weird record"?
>
> And there are also some comments that need to be fixed, since this
> patch makes them incorrect, e.g.
>
> > /*
> > * Well, here is only one bus error.
> > */
>
> Thanks,
> H.Seto
Here is an updated patch with comment clean up.
[patch] Cache error recovery
Similar to memory error recovery, when a cache error is consumed
by a user process, terminate the user process instead of crashing the system.
Signed-off-by: Russ Anderson ([EMAIL PROTECTED])
---
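For reviewers, the cc/bc decision table that this patch documents can be
sketched as a small standalone C program. This is only an illustration of
the table and of the patched early-exit test, not code from mca_drv.c: the
enum, classify_check_bits(), is_fatal_no_check() and the have_pbci flag
(standing in for "pbci != NULL") are hypothetical names made up for this
note.

```c
#include <stdbool.h>

/* Hypothetical labels for the four rows of the cc/bc table in the patch. */
enum mca_class {
	MCA_MEMORY_ERROR,	/* cc=1 bc=1: attempt recovery */
	MCA_CACHE_ERROR,	/* cc=1 bc=0: attempt recovery */
	MCA_IO_ERROR,		/* cc=0 bc=1: attempt recovery */
	MCA_OTHER_ERROR		/* cc=0 bc=0: not recovered */
};

/* Map the cache check (cc) and bus check (bc) bits to a table row. */
static enum mca_class classify_check_bits(bool cc, bool bc)
{
	if (cc && bc)
		return MCA_MEMORY_ERROR;
	if (cc)
		return MCA_CACHE_ERROR;
	if (bc)
		return MCA_IO_ERROR;
	return MCA_OTHER_ERROR;
}

/*
 * Mirror of the patched test
 *	psp->cc == 0 && (psp->bc == 0 || pbci == NULL)
 * with have_pbci standing in for "pbci != NULL" (an assumption of
 * this sketch). Returns true when the "No cache or bus check" fatal
 * path would be taken.
 */
static bool is_fatal_no_check(bool cc, bool bc, bool have_pbci)
{
	return !cc && (!bc || !have_pbci);
}
```

With the old test (bc == 0 || pbci == NULL) the cc=1 bc=0 row was always
fatal; in the sketch above it no longer takes the fatal path, which is the
behavior change the patch is after.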
arch/ia64/kernel/mca_drv.c | 32 +++++++++++---------------------
1 file changed, 11 insertions(+), 21 deletions(-)
Index: test/arch/ia64/kernel/mca_drv.c
===================================================================
--- test.orig/arch/ia64/kernel/mca_drv.c 2006-12-19 10:28:36.000000000 -0600
+++ test/arch/ia64/kernel/mca_drv.c 2006-12-20 11:16:19.091608933 -0600
@@ -602,6 +602,8 @@ recover_from_platform_error(slidx_table_
default:
break;
}
+ } else if (psp->cc && !psp->bc) { /* Cache error */
+ status = recover_from_read_error(slidx, peidx, pbci, sos);
}
return status;
@@ -645,13 +647,6 @@ recover_from_tlb_check(peidx_table_t *pe
* Return value:
* 1 on Success / 0 on Failure
*/
-/*
- * Later we try to recover when below all conditions are satisfied.
- * 1. Only one processor error section is exist.
- * 2. BUS_CHECK is exist and the others are not exist.(Except TLB_CHECK)
- * 3. The entry of BUS_CHECK_INFO is 1.
- * 4. "External bus error" flag is set and the others are not set.
- */
static int
recover_from_processor_error(int platform, slidx_table_t *slidx,
@@ -687,36 +682,31 @@ recover_from_processor_error(int platfor
/*
* The cache check and bus check bits have four possible states
* cc bc
- * 0 0 Weird record, not recovered
- * 1 0 Cache error, not recovered
- * 0 1 I/O error, attempt recovery
* 1 1 Memory error, attempt recovery
+ * 1 0 Cache error, attempt recovery
+ * 0 1 I/O error, attempt recovery
+ * 0 0 Other error type, not recovered
*/
- if (psp->bc == 0 || pbci == NULL)
- return fatal_mca("No bus check");
+ if (psp->cc == 0 && (psp->bc == 0 || pbci == NULL))
+ return fatal_mca("No cache or bus check");
/*
- * Sorry, we cannot handle so many.
+ * Cannot handle more than one bus check.
*/
if (peidx_bus_check_num(peidx) > 1)
return fatal_mca("Too many bus checks");
- /*
- * Well, here is only one bus error.
- */
+
if (pbci->ib)
return fatal_mca("Internal Bus error");
- if (pbci->cc)
- return fatal_mca("Cache-cache error");
if (pbci->eb && pbci->bsi > 0)
return fatal_mca("External bus check fatal status");
/*
- * This is a local MCA and estimated as recoverble external bus error.
- * (e.g. a load from poisoned memory)
- * This means "there are some platform errors".
+ * This is a local MCA and estimated as a recoverable error.
*/
if (platform)
return recover_from_platform_error(slidx, peidx, pbci, sos);
+
/*
* On account of strange SAL error record, we cannot recover.
*/
--
Russ Anderson, OS RAS/Partitioning Project Lead
SGI - Silicon Graphics Inc [EMAIL PROTECTED]