Hi all,

As we have discussed over the past weeks, I've been looking at ways to store
state-related metadata so that it can be shipped with instrumented
applications instead of with trace viewers.

Here is an overview of what was discussed and what I had in mind so far.
It's a rough draft, and still at a very "brainstorming" stage.

Feedback/comments very welcome! =)


Thanks,

-- 
Alexandre Montplaisir
DORSAL lab,
École Polytechnique de Montréal

Request For Comments / Proposal on how to store state-related metadata in 
tracepoints

Alexandre Montplaisir <[email protected]>


Trace viewers normally carry their own state machine to represent the state of
traced systems at any given point in a trace. Typically, the definition of this
state machine lives in the viewer itself and has to be updated whenever the
tracing instrumentation changes.

It would be interesting if we could provide a basic state machine definition
included with the instrumentation. This would allow viewers to show basic state
information without having to "know" the type of trace in advance.

This proposal tries to give an example of how such a state system could be
defined in tracepoints (or referred to by the tracepoints), and what
information would be needed.



       Definitions
--------------------------

* Attributes
An attribute is a "single element of state", the basic unit, the atom if you
will. Each bit of information we want to store about the state is represented
with an attribute. The idea so far is to organize them in a tree, similar to
the /proc filesystem.
For example:

host1/CPUs/0/Current_process
host1/Processes/2500/Exec_name

could be attributes. They would represent, respectively, the current scheduled
process on CPU 0 and the current executable name of process with PID 2500, both
on host "host1".

A main point about the design of this "attribute tree" is that it does not need
to be defined in advance: it should be built on the go, as we read information
from the trace (e.g. we won't know how many CPUs there will be).
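
As a rough illustration of building the tree "on the go", here is a minimal
Java sketch. The class AttributeNode and its methods are hypothetical names
made up for this example, not part of the State History library: a node is
simply created the first time its path is seen.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical node of the attribute tree; children are created on first use. */
class AttributeNode {
    final String name;
    final Map<String, AttributeNode> children = new LinkedHashMap<>();

    AttributeNode(String name) {
        this.name = name;
    }

    /** Returns the node at the given slash-separated path, creating missing nodes. */
    AttributeNode getOrCreate(String path) {
        AttributeNode node = this;
        for (String part : path.split("/")) {
            node = node.children.computeIfAbsent(part, AttributeNode::new);
        }
        return node;
    }

    /** Counts this node's descendants (the node itself is not counted). */
    int countDescendants() {
        int count = 0;
        for (AttributeNode child : children.values()) {
            count += 1 + child.countDescendants();
        }
        return count;
    }
}
```

Asking twice for "host1/CPUs/0/Current_process" returns the same node, so the
reader never needs to know the number of CPUs or processes beforehand.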


* State values
The goal of the attributes is to store values. Each "state value" is only valid
for a certain period of time, or "interval". Only one value exists for a given
attribute/timestamp pair, but this value can be different at other times.

For example, attribute "host1/CPUs/0/Current_process" could have value "1750"
for a given period, which would mean the scheduled process on CPU0 was PID 1750
during that time.

"Null" is also a possible and important state value. It means "there is no
information about this attribute at this time". If a process only lived for two
minutes in an hour-long trace, everywhere else its attributes will have null
values.
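
To make the "one value per attribute/timestamp pair" idea concrete, here is a
small Java sketch for a single attribute. AttributeHistory is a hypothetical
class for illustration only, and string values are an assumption of the
example. Each entry opens an interval that lasts until the next entry, and a
Java null stands for the "null" state value.

```java
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical per-attribute history: each entry starts an interval that lasts until the next entry. */
class AttributeHistory {
    // Key: timestamp at which the value becomes valid.
    // A null value means "no information about this attribute from here on".
    private final TreeMap<Long, String> changes = new TreeMap<>();

    /** Records that the attribute takes the given value starting at the timestamp. */
    void setValue(long timestamp, String value) {
        changes.put(timestamp, value);
    }

    /** Returns the value valid at the given timestamp, or null if nothing is known. */
    String query(long timestamp) {
        Map.Entry<Long, String> entry = changes.floorEntry(timestamp);
        return entry == null ? null : entry.getValue();
    }
}
```

With setValue(100, "1750") followed by setValue(200, null), a query at 150
returns "1750", while queries before 100 or from 200 onwards return null.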



    Points of interest
--------------------------

* Integer vs. String state values
The design of the State History so far allows state values to be either
integers or variable-length strings. However, in cases where we have a defined
set of possible values known in advance, it might be interesting to use
enum-like integers instead of strings to save storage space (e.g. system call
names, IRQ names, etc.).

One thing to remember in this case is that the mapping between the names and
the integers will have to be known by both the tracer and the analysis tool, so
this adds a dependency.
(The State History library does not need to know about it though: we can have
it store any value and it will happily return it without knowing what it means.)
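
On the analysis side, such a mapping could look like the sketch below. The
enum ProcessStatus, its constants and the numeric codes are all invented for
illustration; the real values would have to match whatever the tracer emits.

```java
/** Hypothetical status values; the numeric codes are made up for illustration. */
enum ProcessStatus {
    RUNNING(1), PREEMPTED(2), BLOCKED(3);

    /** Integer stored in the history in place of the name. */
    final int code;

    ProcessStatus(int code) {
        this.code = code;
    }

    /** Decodes an integer read back from the history; returns null for unknown codes. */
    static ProcessStatus fromCode(int code) {
        for (ProcessStatus status : values()) {
            if (status.code == code) {
                return status;
            }
        }
        return null;
    }
}
```

The history would only ever see the integer; only the tracer and the viewer
need this table.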


* Events vs. State changes
The goal of adding state metadata to trace points is to map state changes to
events. By definition, a state-changing event will define one *or more* state
changes. All the information required to define these state changes has to be
present locally in the scope of the trace point, or in some cases in the state
history itself.

For example, a scheduling event could cause the following state changes:
- set the status of the process that got scheduled in to "running"
- set the status of the process that got scheduled out to "preempted" (for
  example)
- update the "current running process" on the relevant CPU

When we explicitly express each one of those changes using the attributes and
values we defined earlier, we can also use the term "attribute modifications".


* Conditions
It's also interesting to define the conditions under which state changes occur.
Once again, those conditions can only use information that is available either
locally or in the state history.

For example, if we look at the state changes caused by a scheduling event, shown
in the previous point, we might want to *not* insert state changes when the
previous or next pid is "0", since we do not care about the current status of
"process 0".


* Types of state changes
Finally, some events affect the state in more complex ways than direct attribute
modifications. This usually happens when the required information is not
available locally in the event payload and requires a query on the history.

The state history library (for now) provides abstractions for these different
types:

  MODIFY(timestamp, value, attribute)
  Bread-and-butter modification method, we insert in the history a state change
  at "timestamp", in which we now assign "value" to the given "attribute".
  
  REMOVE(timestamp, attribute)
  Similar to MODIFY(timestamp, "null", attribute), except we also "nullify" all
  the children of the attribute. A bit like "rm -rf". This is needed in some
  cases where we don't know exactly how many children an attribute has.
  (e.g. a process dies, we want to remove all of its child-attributes).
  
  PUSH(timestamp, value, attribute)
  POP(timestamp, attribute)
  In some cases we are not only interested in the latest value of a given
  attribute, but we want to keep a "stack" of previous ones we have seen so far.
  This is the case with process execution modes (nested IRQs and syscalls and 
  the like).
  
  INCREMENT(timestamp, attribute)
  Sometimes we might just want to increment a counter, without having to keep
  an array in memory just to pass values to MODIFY's. The history will look for
  the previous value of this attribute and will insert a change that increments
  the count by 1.
  This is particularly useful if we want to store statistics in the history.
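
To illustrate the intended semantics of these five operations, here is a
simplified Java sketch. CurrentState is a hypothetical in-memory stand-in, not
the State History library: it only tracks the current values and ignores the
timestamps, whereas the real history would use them as interval boundaries.
Attribute names are plain slash-separated strings here, which is also an
assumption of the example.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical in-memory stand-in: keeps only current values, not the interval history. */
class CurrentState {
    private final Map<String, Integer> values = new HashMap<>();
    private final Map<String, Deque<Integer>> stacks = new HashMap<>();

    /** MODIFY: assign a new value to the attribute. */
    void modify(long ts, int value, String attribute) {
        values.put(attribute, value);
    }

    /** REMOVE: nullify the attribute and everything below it, like "rm -rf". */
    void remove(long ts, String attribute) {
        String prefix = attribute + "/";
        values.keySet().removeIf(k -> k.equals(attribute) || k.startsWith(prefix));
        stacks.keySet().removeIf(k -> k.equals(attribute) || k.startsWith(prefix));
    }

    /** PUSH: stack the new value on top of the previous ones and make it current. */
    void push(long ts, int value, String attribute) {
        stacks.computeIfAbsent(attribute, k -> new ArrayDeque<>()).push(value);
        values.put(attribute, value);
    }

    /** POP: drop the latest value and restore the previous one, or null if none is left. */
    void pop(long ts, String attribute) {
        Deque<Integer> stack = stacks.get(attribute);
        if (stack == null || stack.isEmpty()) {
            return;
        }
        stack.pop();
        if (stack.isEmpty()) {
            values.remove(attribute);
        } else {
            values.put(attribute, stack.peek());
        }
    }

    /** INCREMENT: previous value plus one; a missing value counts as 0. */
    void increment(long ts, String attribute) {
        values.merge(attribute, 1, Integer::sum);
    }

    /** Returns the current value, or null if there is no information. */
    Integer get(String attribute) {
        return values.get(attribute);
    }
}
```

Note that REMOVE on "host1/Processes/2500" clears every attribute below that
process without the caller having to list them, which is the point of the
operation.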


(This may add unwanted complexity at the "tracer" level, but I haven't figured
out a way of generating different types of changes other than declaring them
right from the start.)


Examples of the declaration
--------------------------

This is an example for a scheduling event. We assume we have local access to
the usual event payload [next_pid, prev_pid, prev_state] as well as "cpu", the
cpu number on which this event happened.



* Alternative #1:  C-like syntax
(omitted semi-colons, strcat's and the like for clarity)

state_change changes[3]

/* Set the status of the process scheduled in */
if ( next_pid != 0 ) {
        changes[0].type = MODIFY
        changes[0].attribute_name = "<hostname>/Processes/" + next_pid +
                                    "/Status"
        changes[0].value = STATE_RUNNING
}

/* Set the status of the process scheduled out */
if ( prev_pid != 0 ) {
        changes[1].type = MODIFY
        changes[1].attribute_name = "<hostname>/Processes/" + prev_pid +
                                    "/Status"
        changes[1].value = prev_state
}

/* Set the current active process on the relevant CPU */
changes[2].type = MODIFY
changes[2].attribute_name = "<hostname>/CPUs/" + cpu + "/Current_process"
changes[2].value = next_pid




* Alternative #2:  XML syntax

<statechange>
        <condition>next_pid != 0</condition>
        <type>MODIFY</type>
        <attributename>
                <external>hostname</external>
                <literal>Processes</literal>
                <internal>next_pid</internal>
                <literal>Status</literal>
        </attributename>
        <value>
                <internal>STATE_RUNNING</internal>
        </value>
</statechange>
<statechange>
        <condition>prev_pid != 0</condition>
        <type>MODIFY</type>
        <attributename>
                <external>hostname</external>
                <literal>Processes</literal>
                <internal>prev_pid</internal>
                <literal>Status</literal>
        </attributename>
        <value>
                <internal>prev_state</internal>
        </value>
</statechange>
<statechange>
        <condition>true</condition>      <!-- always record this change -->
        <type>MODIFY</type>
        <attributename>
                <external>hostname</external>
                <literal>CPUs</literal>
                <internal>cpu</internal>
                <literal>Current_process</literal>
        </attributename>
        <value>
                <internal>next_pid</internal>
        </value>
</statechange>



In both cases, attribute names contain literal, external or internal
components. Internals refer to variables available locally. Literals are just
that: string literals that will be used as-is in the attribute tree. Externals
are placeholder values that the trace reading library and/or the state history
building mechanism will have to replace with the correct value.
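
As a sketch of how a reader could turn such a declaration into a concrete
attribute path, consider the Java below. AttributeNameResolver, Kind and
Component are hypothetical names, and passing internals and externals as
string maps is an assumption made to keep the example short.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical resolver for attribute names made of literal, internal and external parts. */
class AttributeNameResolver {
    enum Kind { LITERAL, INTERNAL, EXTERNAL }

    /** One component of a declared attribute name. */
    static final class Component {
        final Kind kind;
        final String text;

        Component(Kind kind, String text) {
            this.kind = kind;
            this.text = text;
        }
    }

    /**
     * Builds the attribute path. "internals" holds the event payload,
     * "externals" holds values supplied by the trace reader or the history.
     */
    static String[] resolve(List<Component> name,
                            Map<String, String> internals,
                            Map<String, String> externals) {
        List<String> path = new ArrayList<>();
        for (Component c : name) {
            String value;
            switch (c.kind) {
            case LITERAL:
                value = c.text;                 // used as-is
                break;
            case INTERNAL:
                value = internals.get(c.text);  // looked up in the event payload
                break;
            default:
                value = externals.get(c.text);  // filled in by the reader
                break;
            }
            if (value == null) {
                throw new IllegalArgumentException("Unresolved component: " + c.text);
            }
            path.add(value);
        }
        return path.toArray(new String[0]);
    }
}
```

With hostname = "host1" as an external and next_pid = "1750" as an internal,
the first declaration above would resolve to host1/Processes/1750/Status.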


(Surely there are a lot of shortcomings in these examples right now, but
hopefully they explain what I'm trying to do ;)

Personally I find #1 more compact and more readable, but #2 has the advantage
of not having to be in the program itself. If we also want to support
externally-supplied state machines, having a common syntax is probably a good
thing.)


      Link with the
    State History API
--------------------------

First, we define what a "state change" is on the Java side.


enum StateChangeType {MODIFY, REMOVE, PUSH, POP, INC;}

class StateChange {
        StateChangeType type;
        String[] attributeName;
        int newValue;
        long timestamp;

        ...
}


And we add a field "stateChanges" to the Events read from the trace. We suppose
the trace reading library (a.k.a. Matthew's magical box) will fill up this array
based on the information in the trace point.


class Event {
        ...
        StateChange[] stateChanges;
        ...
}

(We will also need to implement how the parser will replace "external"
placeholder values with real ones taken from the state history built so far.)


After this, the whole "State Event Handler" mechanism can be replaced with the 
following snippet:

/* We assume we have the following already defined:
 * ts = event.timestamp
 * history = reference to the State History interface object
 */
for ( int i = 0; i < event.stateChanges.length; i++ ) {
        StateChange currentChange = event.stateChanges[i];
        
        switch ( currentChange.type ) {
        case MODIFY:
                history.modifyAttribute(ts,
                                        currentChange.newValue,
                                        currentChange.attributeName);
                break;
        case REMOVE:
                history.removeAttribute(ts, currentChange.attributeName);
                break;
        case PUSH:
                history.pushAttribute(  ts,
                                        currentChange.newValue,
                                        currentChange.attributeName);
                break;
        case POP:
                history.popAttribute(ts, currentChange.attributeName);
                break;
        case INC:
                history.increment(ts, currentChange.attributeName);
                break;
        }
}




_______________________________________________
ltt-dev mailing list
[email protected]
http://lists.casi.polymtl.ca/cgi-bin/mailman/listinfo/ltt-dev
