On 8/31/2021 17:50, Chris Johns wrote:
On 31/8/21 11:35 pm, Kinsey Moore wrote:
On 8/31/2021 04:31, Sebastian Huber wrote:
On 30/08/2021 17:13, Kinsey Moore wrote:
On 8/30/2021 07:50, Sebastian Huber wrote:
On 30/08/2021 14:27, Kinsey Moore wrote:
On 8/30/2021 00:42, Sebastian Huber wrote:
Hello Kinsey,

why can't you use the existing fatal error extension for this? You just
have to test for an RTEMS_FATAL_SOURCE_EXTENSION source.  The fatal code
is a pointer to the exception frame.
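For illustration, a minimal sketch of such an extension, assuming the standard
fatal extension signature and the fatal source named above, and assuming
rtems_exception_frame_print() is available to dump the frame:

  #include <rtems.h>

  static void exception_fatal_extension(
    rtems_fatal_source source,
    bool               always_set_to_false,
    rtems_fatal_code   code
  )
  {
    (void) always_set_to_false;

    if ( source == RTEMS_FATAL_SOURCE_EXTENSION ) {
      /* For this source, the fatal code is a pointer to the exception frame */
      const rtems_exception_frame *frame =
        (const rtems_exception_frame *) code;

      rtems_exception_frame_print( frame );
    }
  }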
Unfortunately, the fatal error extensions framework necessarily assumes
that the exception is fatal and so does not include the machinery to
perform a thread dispatch or restore the exception frame for additional
execution. It could theoretically be done in the fatal error extensions
context, but it would end up being reimplemented for every architecture and
you'd have to unwind the stack manually. I'm sure there are other ragged
edges that would have to be smoothed over as well.
Non-interrupt exceptions are not uniformly handled across architectures in
RTEMS currently. Adding the RTEMS_FATAL_SOURCE_EXTENSION fatal source was an
attempt to do this. I am not that fond of adding a second approach unless
there are strong technical reasons to do this.
This was in an effort to formalize how recoverable exceptions are handled.
Currently, it's done on SPARC by handling exception traps as you would an
interrupt trap since they share a common architecture on that platform. This
representation varies quite a bit among platforms, so we needed a different
mechanism.
I recently changed the non-interrupt exception handling on sparc, since it was
not robust against a corrupt stack pointer:

http://devel.rtems.org/ticket/4459

The initial fatal extensions are quite robust; you only need a stack, valid
read-only data, and valid code. So, using a user extension is the right
thing to do, but I don't think we need a new one.
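For completeness, registering such a fatal extension as an initial extension
would look roughly like this (a sketch, reusing the hypothetical
exception_fatal_extension from above in a minimal confdefs.h configuration):

  #define CONFIGURE_INITIAL_EXTENSIONS \
    { .fatal = exception_fatal_extension }

  #define CONFIGURE_APPLICATION_NEEDS_CLOCK_DRIVER
  #define CONFIGURE_RTEMS_INIT_TASKS_TABLE
  #define CONFIGURE_MAXIMUM_TASKS 2

  #define CONFIGURE_INIT
  #include <rtems/confdefs.h>

Initial extensions are in place before multitasking starts, so the fatal hook
is available even for very early failures.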

Doing the non-interrupt exception processing on the stack which caused the
exception is a bit problematic, since the stack pointer might be corrupt as
well. It is more robust to switch to, for example, the interrupt stack. If the
exception was caused by an interrupt, then this exception is not recoverable.
The non-interrupt exception processing occurs on the interrupt stack, not the
thread/user stack. In the AArch64 support code provided, the stack is
switched back to the thread/user stack before thread dispatch and exception
frame restoration occur.
You can only switch back to the thread stack if it is valid. A thread
dispatch should only be done if you are sure that the system state is still
intact. This is probably not the case for most exceptions.
If the handler has declared that it handled the exception and corrected the
cause underlying the exception then the system state should be valid. If it
can't make that claim then it should not have handled the exception.
There are valid use cases for this such as an unhandled fault in a task
suspending it. Till did this for EPICS years ago.
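As a sketch of that use case, assuming the exception-to-signal mapping from
this patch set is enabled and delivers SIGSEGV in the context of the faulting
task, a handler along these lines could park the offending task instead of
taking the whole system down:

  #include <rtems.h>
  #include <signal.h>
  #include <string.h>

  /* Hypothetical application-side handler: suspend the faulting task. */
  static void fault_signal_handler( int signo )
  {
    (void) signo;

    /*
     * Runs in the context of the task that caused the exception, so
     * suspending "self" parks only the offending task.
     */
    (void) rtems_task_suspend( RTEMS_SELF );
  }

  static void install_fault_handler( void )
  {
    struct sigaction act;

    memset( &act, 0, sizeof( act ) );
    act.sa_handler = fault_signal_handler;
    sigaction( SIGSEGV, &act, NULL );
  }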

Our job is to provide the support to implement system-level requirements, and we
should leave the analysis and discussion of what is valid to our users.

Libdebugger is a good example of a system that handles faults of all types and
RTEMS keeps running without an issue.

If the non-interrupt exception was caused by a thread, then you could do
some high level actions for some exceptions, such as floating-point
exceptions and arithmetic exceptions. If you get a data abort or instruction
error, then it is probably better to terminate the system.
I leave that decision to the handlers registered with this framework. In the case
of the exception-to-signal mapping, I'm carrying over the existing exception
set from the SPARC implementation.
It is probably this code:

+    case EXCEPTION_DATA_ABORT_READ:
+    case EXCEPTION_DATA_ABORT_WRITE:
+    case EXCEPTION_DATA_ABORT_UNSPECIFIED:
+    case EXCEPTION_INSTRUCTION_ABORT:
+    case EXCEPTION_MMU_UNSPECIFIED:
+    case EXCEPTION_ACCESS_ALIGNMENT:
+      signal = SIGSEGV;
+      break;
+
+    default:
+      /*
+       * Covers unknown, PC/SP alignment, illegal execution state, and any new
+       * exception classes that get added.
+       */
+      signal = SIGILL;
+      break;
+  }

Using signals to handle these exceptions is like playing Russian roulette.
You're right. Specifically, SP alignment faults should be moved to the
not-handled section because they're not actually handled here and would have to
be handled to proceed with further execution. I'll make that change, thanks.
Non-interrupt exception handling is always architecture-dependent. It is
just a matter of how you organize it. In general, the most sensible way to deal
with non-interrupt exceptions is to log the error somehow and terminate the
system. The mapping to signals is a bit of a special case if you ask me. My
preferred way to handle non-interrupt exceptions would be to

1. switch to a dedicated stack

2. save the complete register set to the CPU exception frame

3. call the fatal error extensions with RTEMS_FATAL_SOURCE_EXTENSION and the
CPU exception frame (with interrupts disabled)

Add a new API to query/alter the CPU exception frame, switch to the stack
indicated by the CPU exception frame, and restore the context stored in the
CPU exception frame. With this architecture-dependent CPU exception frame
support, it should be possible to implement a high-level mapping to signals.
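The shape of such an API might look roughly like this; all names below are
hypothetical and only illustrate the query/alter/restore split
(CPU_Exception_frame is the existing per-port type):

  #include <stdint.h>
  #include <rtems/score/cpu.h>

  /*
   * Hypothetical accessors; each CPU port would implement them against its
   * own CPU_Exception_frame layout.
   */
  uintptr_t _CPU_Exception_frame_get_pc( const CPU_Exception_frame *frame );

  void _CPU_Exception_frame_set_pc(
    CPU_Exception_frame *frame,
    uintptr_t            pc
  );

  /*
   * Switch to the stack indicated by the frame and restore the saved
   * register set; this does not return to the caller.
   */
  RTEMS_NO_RETURN void _CPU_Exception_frame_restore(
    const CPU_Exception_frame *frame
  );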

What you've described is basically what is happening here (the dedicated
stack is currently the interrupt/exception stack on AArch64), but the
low-level details are necessarily contained within the CPU port in patch 3/5.
Support for this framework is not required for any CPU port, but CPU ports
that do support it repurpose the existing code underlying the fatal error
extensions with the additional support you described above.
I don't think that looking at existing code is the right thing to do. The
exception handling is too diverse in RTEMS. We should think about what a clean
design should look like.
I repurposed the existing code in the AArch64 CPU port because it happened to do
part of what was needed as you listed just above. This may not be a perfectly
clean design, but it's cleaner than what currently exists for recoverably
handling machine exceptions. What currently exists is: hooking the exception
vector(s) with one-off assembly for each platform and exception type.
Yes, it is a low-level interface. My concern is creating a piecemeal outcome, a
sort of "we have done what we need and others will need to do the same".
I agree, and the avoidance of piecemeal solutions is a primary goal of this patch set. This should be usable by any consumer of exceptions going forward. Libdebugger, signal mapping, and, to an extent, fatal error extensions are all primary consumers of this functionality. Fatal error extensions will, of course, not use the majority of the functionality provided.

This does not exist in parallel to the fatal error extensions, but rather the
fatal error extensions are moved on top of the Exception Manager for CPU
ports that support it. The Exception Manager returns whether the exception
was handled and the CPU port then calls the fatal error extensions if the
exception wasn't handled. With this patch set, only an accessor was added to
get the exception class, but my initial thoughts included manipulation of the
execution address and several other more generic manipulators.
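In rough pseudo-C, the dispatch path described above looks something like the
following; the function names are hypothetical stand-ins for the per-port glue,
not the actual identifiers in the patch set:

  #include <stdbool.h>
  #include <rtems/score/cpu.h>
  #include <rtems/score/interr.h>

  /* Called by the low-level exception entry code, on the exception stack. */
  void cpu_port_exception_dispatch( CPU_Exception_frame *frame )
  {
    /* Give the registered exception handlers a chance first (hypothetical). */
    bool handled = exception_manager_dispatch( frame );

    if ( !handled ) {
      /*
       * Fall back to the existing behaviour: invoke the fatal error
       * extensions and terminate the system.
       */
      _Terminate( RTEMS_FATAL_SOURCE_EXTENSION, (Internal_errors_t) frame );
    }

    /*
     * Otherwise switch back to the thread stack, perform a thread dispatch
     * if necessary, and restore the (possibly altered) frame.
     */
  }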
If a non-interrupt exception occurs, the default behaviour should be to
terminate the system as robustly and safely as possible. Raising signals should be
optional and should not make the exception handling less robust.
Agreed.

Also enabling signals anywhere is an issue if libbsd is used as it does not
support them. Doing so would create hard to find holes in a complex system.

The support for
signals should also not lead to dead code in the default case. This is why I
proposed a two-step approach. The first step is a normal fatal error handler.
The second step is a resume of normal multitasking in a special signal fatal
error extension using an architecture-specific "jump" which is defined by the
CPU exception frame.
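A very rough sketch of that second step, with the queueing of the signal and
the architecture-specific resume as the hypothetical parts:

  #include <rtems.h>

  /* Hypothetical helpers provided by the CPU port / signal mapping layer. */
  void exception_map_to_signal( rtems_exception_frame *frame );
  RTEMS_NO_RETURN void _CPU_Exception_frame_restore(
    const rtems_exception_frame *frame
  );

  /*
   * Hypothetical "signal" fatal error extension implementing the second
   * step: map the exception to a signal and resume normal multitasking.
   */
  static void signal_fatal_extension(
    rtems_fatal_source source,
    bool               always_set_to_false,
    rtems_fatal_code   code
  )
  {
    rtems_exception_frame *frame = (rtems_exception_frame *) code;

    (void) always_set_to_false;

    if ( source != RTEMS_FATAL_SOURCE_EXTENSION ) {
      return; /* let the normal fatal error path continue */
    }

    /* Queue a signal for the interrupted thread. */
    exception_map_to_signal( frame );

    /*
     * Architecture-specific "jump" back to the context described by the
     * frame; this does not return.
     */
    _CPU_Exception_frame_restore( frame );
  }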
I support this approach. I see this as basically bringing parts of the
libdebugger back end into score/cpu and I welcome that.
This is definitely the long-term goal.
I suggest the ARM architecture be considered first because it is the hardest. If
you can solve this one I suspect we will have a suitable interface and
structure. The ARM is complex because of the mix of exceptions, the variety
of control registers, the exception space versus the task space, and finally
the ARM and Thumb modes.
Once I've finished with the current AArch64 SMP work, my next focus is libdebugger for AArch64, which shares many properties with ARM. The important one that it lacks is a Thumb equivalent.
I don't think I understand how a signal could be sent to the runtime while
simultaneously shutting down the system since system shutdown would necessarily
occur before the signal could be sent in thread dispatch.
My understanding is the shutdown would be the default outcome for any exception,
i.e. what we have now. If you catch those exceptions, transition back to the
thread arena, and raise a signal, there would be no shutdown.
This assumption is not changed by this patch set. Unhandled exceptions still result in system shutdown.
As things are currently setup, the signal mapping hook is only installed if the
application specifically requests it and is off by default. The average
application will see no change to exception handling since it does not request
the mapping and there are no default recoverable exception handlers.
Where does this leave libdebugger? Have you tested the integration of this
signal functionality with libdebugger?
Libdebugger support is my next target for AArch64. As yet it is untested, but it should be proven out in the very near term.
In which context does a signal handler run? Is it like an ASR or is it running
in the context of the exception?

Currently, the exception handler runs in the exception context and the signal handler runs in the thread context during thread dispatch as part of the post-switch extensions, iirc.


Kinsey
