Hi all,

As we discussed over the past few weeks, I've been looking at ways to store state-related metadata so that it can be supplied with instrumented applications instead of with trace viewers.

Here is an overview of what was discussed and what I have in mind so far. It's a rough draft, still very much at the "brainstorming" stage. Feedback/comments very welcome! =)

Thanks,

--
Alexandre Montplaisir
DORSAL lab, École Polytechnique de Montréal

Request For Comments / Proposal on how to store state-related metadata in tracepoints

Alexandre Montplaisir <[email protected]>

Trace viewers normally carry their own state machine to represent the state of traced systems at any given point in a trace. Typically, the definition of this state machine lives in the viewer itself and has to be updated whenever the tracing instrumentation changes.

It would be interesting if we could provide a basic state machine definition included with the instrumentation. This would allow viewers to show basic state information without having to "know" the type of trace in advance.

This proposal tries to give an example of how such a state system could be defined in tracepoints (or referred to by the tracepoints), and what information would be needed.


Definitions
--------------------------

* Attributes

An attribute is a "single element of state", the basic unit, the atom if you will. Each bit of information we want to store about the state is represented with an attribute.

The idea so far is to organize them in a tree, similar to the /proc filesystem. For example:

    host1/CPUs/0/Current_process
    host1/Processes/2500/Exec_name

could be attributes. They would represent, respectively, the currently scheduled process on CPU 0 and the current executable name of the process with PID 2500, both on host "host1".

A main point about the design of this "attribute tree" is that it does not need to be defined in advance: it should be built on the go, as we read information from the trace (e.g. we won't know how many CPUs there will be, etc.)

* State values

The goal of the attributes is to store values. Each "state value" is only valid for a certain period of time, or "interval". Only one value exists for a given attribute/timestamp pair, but this value can be different at other times.
For example, attribute "host1/CPUs/0/Current_process" could have value "1750" for a given period, which would mean the scheduled process on CPU 0 was PID 1750 during that time.

"Null" is also a possible and important state value. It means "there is no information about this attribute at this time". If a process only lived for two minutes in an hour-long trace, its attributes will have null values everywhere else.


Points of interest
--------------------------

* Integer vs. String state values

The design of the State History so far allows state values to be either integers or variable-length strings. However, in cases where the set of possible values is known in advance (e.g. system call names, IRQ names, etc.), it might be interesting to use enum-like integers instead of strings to save storage space.

One thing to remember in this case is that the "mapping" between the integers and the names they stand for will have to be known by both the tracer and the analysis tool, so this adds a dependency. (The State History library does not need to know about it though: we can have it store any value and it will happily return it without knowing what it means.)

* Events vs. State changes

The goal of adding state metadata to tracepoints is to map state changes to events. By definition, a state-changing event will define one *or more* state changes. All the information required to define these state changes has to be present locally in the scope of the tracepoint, or in some cases in the state history itself.

For example, a scheduling event could cause the following state changes:
 - set the "running" status on the process that got scheduled in
 - set the "preempted" (for example) status on the process that got scheduled out
 - update the "current running process" on the relevant CPU

When we explicitly express each one of those changes using the attributes and values we defined earlier, we can also use the term "attribute modifications".
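
To make the attribute / state value model above a bit more concrete, here is a minimal Java sketch, not the actual State History library. `ToyHistory`, `StateInterval` and `SchedSwitchDemo` are hypothetical names, the hostname "host1" is hard-coded, and the sketch assumes changes are inserted in chronological order. Attributes are created on first use, and a query outside any known interval returns null.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** One state value of an attribute, valid from 'start' until the next change. */
final class StateInterval {
    final long start;
    final Object value; // Integer or String in this sketch

    StateInterval(long start, Object value) {
        this.start = start;
        this.value = value;
    }
}

/** Hypothetical toy history: attribute path -> changes in chronological order. */
final class ToyHistory {
    private final Map<String, List<StateInterval>> changes = new TreeMap<>();

    /** Records that 'attribute' takes 'value' from 'timestamp' on; the attribute is created on first use. */
    void modify(long timestamp, Object value, String attribute) {
        changes.computeIfAbsent(attribute, k -> new ArrayList<>())
               .add(new StateInterval(timestamp, value));
    }

    /** Returns the value valid at 'timestamp', or null when nothing is known at that time. */
    Object query(long timestamp, String attribute) {
        Object result = null;
        for (StateInterval interval : changes.getOrDefault(attribute, Collections.emptyList())) {
            if (interval.start > timestamp) {
                break;
            }
            result = interval.value;
        }
        return result;
    }
}

final class SchedSwitchDemo {
    /** Applies the three attribute modifications of one scheduling event. */
    static void schedSwitch(ToyHistory h, long ts, int cpu, int prevPid, int nextPid) {
        h.modify(ts, "running", "host1/Processes/" + nextPid + "/Status");
        h.modify(ts, "preempted", "host1/Processes/" + prevPid + "/Status");
        h.modify(ts, nextPid, "host1/CPUs/" + cpu + "/Current_process");
    }

    public static void main(String[] args) {
        ToyHistory h = new ToyHistory();
        schedSwitch(h, 100L, 0, 1200, 1750);
        schedSwitch(h, 250L, 0, 1750, 2500);
        System.out.println(h.query(180L, "host1/CPUs/0/Current_process")); // prints 1750
        System.out.println(h.query(50L, "host1/CPUs/0/Current_process"));  // prints null
        System.out.println(h.query(300L, "host1/Processes/1750/Status"));  // prints preempted
    }
}
```

The linear scan in `query` is only there to keep the sketch short; a real history would need an indexed structure to answer queries on long traces.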

* Conditions

It's also interesting to define the conditions under which state changes occur. Once again, those conditions can only use information that is available either locally or in the state history.

For example, if we look at the state changes caused by a scheduling event, shown in the previous point, we might want to *not* insert state changes when the previous or next pid is "0", since we do not care about the current status of "process 0".

* Types of state changes

Finally, some events affect the state in more complex ways than direct attribute modifications. This usually happens when required information is not available locally in the event payload and requires a query on the history.

The state history library (for now) provides abstractions for these different types:

  MODIFY(timestamp, value, attribute)
    Bread-and-butter modification method: we insert in the history a state change at "timestamp", in which we now assign "value" to the given "attribute".

  REMOVE(timestamp, attribute)
    Similar to MODIFY(timestamp, "null", attribute), except we also "nullify" all the children of the attribute. A bit like "rm -rf". This is needed in some cases where we don't know exactly how many children an attribute has (e.g. a process dies and we want to remove all of its child attributes).

  PUSH(timestamp, value, attribute)
  POP(timestamp, attribute)
    In some cases we are not only interested in the latest value of a given attribute, but we also want to keep a "stack" of the previous ones we have seen so far. This is the case with process execution modes (nested IRQs and syscalls and the like).

  INCREMENT(timestamp, attribute)
    Sometimes we might just want to increment a counter, without having to keep an array in memory just to pass values to MODIFY. The history will look up the previous value of this attribute and insert a change that increments the count by 1. This is particularly useful if we want to store statistics in the history.
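
As a rough illustration of the semantics of these five change types, here is a small Java sketch. It is not the library's implementation: `CurrentState` is a hypothetical class, timestamps are left out so that only the current state is tracked, and keeping one stack per attribute for PUSH/POP is an assumption made for this example.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical current-state tracker; no history, no timestamps. */
final class CurrentState {
    private final TreeMap<String, Object> values = new TreeMap<>();
    private final Map<String, Deque<Object>> stacks = new HashMap<>();

    /** MODIFY: assign 'value' to 'attribute'. */
    void modify(Object value, String attribute) {
        values.put(attribute, value);
    }

    /** REMOVE: nullify the attribute and everything below it in the tree. */
    void remove(String attribute) {
        String prefix = attribute + "/";
        for (Map.Entry<String, Object> e : values.entrySet()) {
            if (e.getKey().equals(attribute) || e.getKey().startsWith(prefix)) {
                e.setValue(null);
            }
        }
        stacks.keySet().removeIf(k -> k.equals(attribute) || k.startsWith(prefix));
    }

    /** PUSH: remember the new (non-null) value on top of the previous ones. */
    void push(Object value, String attribute) {
        stacks.computeIfAbsent(attribute, k -> new ArrayDeque<>()).push(value);
        values.put(attribute, value);
    }

    /** POP: drop the latest value and fall back to the one before it (null if none). */
    void pop(String attribute) {
        Deque<Object> stack = stacks.get(attribute);
        if (stack == null || stack.isEmpty()) {
            return;
        }
        stack.pop();
        values.put(attribute, stack.peek());
    }

    /** INCREMENT: previous count + 1, starting from 0 when there is no count yet. */
    void increment(String attribute) {
        Object previous = values.get(attribute);
        int count = (previous instanceof Integer) ? (Integer) previous : 0;
        values.put(attribute, count + 1);
    }

    Object get(String attribute) {
        return values.get(attribute);
    }

    public static void main(String[] args) {
        CurrentState s = new CurrentState();
        s.modify("bash", "host1/Processes/2500/Exec_name");
        s.push("SYSCALL", "host1/Processes/2500/Mode");
        s.push("IRQ", "host1/Processes/2500/Mode");
        s.pop("host1/Processes/2500/Mode");
        System.out.println(s.get("host1/Processes/2500/Mode"));      // prints SYSCALL
        s.increment("host1/Stats/sched_switch");
        s.increment("host1/Stats/sched_switch");
        System.out.println(s.get("host1/Stats/sched_switch"));       // prints 2
        s.remove("host1/Processes/2500");
        System.out.println(s.get("host1/Processes/2500/Exec_name")); // prints null
    }
}
```

Note that REMOVE only needs the parent path: the children are found by prefix, which is why the caller does not have to know how many there are.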

(Declaring these different change types may add unwanted complexity at the "tracer" level, but I haven't figured out a way of generating different types of changes other than declaring them right from the start.)


Examples of the declaration
--------------------------

This is an example for a scheduling event. We assume we have local access to the usual event payload [next_pid, prev_pid, prev_state] as well as "cpu", the number of the CPU on which this event happened.

* Alternative #1: C-like syntax
  (semi-colons, strcat's and the like omitted for clarity)

    state_change changes[3]

    /* Set the status of the process scheduled in */
    if ( next_pid != 0 ) {
        changes[0].type = MODIFY
        changes[0].attribute_name = "<hostname>/Processes/" + next_pid + "/Status"
        changes[0].value = STATE_RUNNING
    }

    /* Set the status of the process scheduled out */
    if ( prev_pid != 0 ) {
        changes[1].type = MODIFY
        changes[1].attribute_name = "<hostname>/Processes/" + prev_pid + "/Status"
        changes[1].value = prev_state
    }

    /* Set the current active process on the relevant CPU */
    changes[2].type = MODIFY
    changes[2].attribute_name = "<hostname>/CPUs/" + cpu + "/Current_process"
    changes[2].value = next_pid

* Alternative #2: XML syntax

    <statechange>
      <condition>next_pid != 0</condition>
      <type>MODIFY</type>
      <attributename>
        <external>hostname</external>
        <literal>Processes</literal>
        <internal>next_pid</internal>
        <literal>Status</literal>
      </attributename>
      <value>
        <internal>STATE_RUNNING</internal>
      </value>
    </statechange>

    <statechange>
      <condition>prev_pid != 0</condition>
      <type>MODIFY</type>
      <attributename>
        <external>hostname</external>
        <literal>Processes</literal>
        <internal>prev_pid</internal>
        <literal>Status</literal>
      </attributename>
      <value>
        <internal>prev_state</internal>
      </value>
    </statechange>

    <statechange>
      <condition>true</condition> <!-- always record this change -->
      <type>MODIFY</type>
      <attributename>
        <external>hostname</external>
        <literal>CPUs</literal>
        <internal>cpu</internal>
        <literal>Current_process</literal>
      </attributename>
      <value>
        <internal>next_pid</internal>
      </value>
    </statechange>

In both cases, attribute names contain literal, external or internal components. "Internal" components refer to variables available locally. Literals are just that: string literals that will be used as-is in the attribute tree. Externals are placeholder values that the trace reading library and/or the state history building mechanism will have to replace with the correct value.

(Surely there are a lot of shortcomings in these examples right now, but hopefully they explain what I'm trying to do ;) Personally I find #1 more compact and more readable, but #2 has the advantage of not having to be in the program itself. If we also want to support externally-supplied state machines, having a common syntax is probably a good thing.)


Link with the State History API
--------------------------

First we define what a "state change" is on the Java side.

    enum StateChangeType {MODIFY, REMOVE, PUSH, POP, INC;}

    class StateChange {
        StateChangeType type;
        String[] attributeName;
        int newValue;
        long timestamp;
        ...
    }

Then we add a field "stateChanges" to the Events read from the trace. We suppose the trace reading library (a.k.a. Matthew's magical box) will fill up this array based on the information in the tracepoint.

    class Event {
        ...
        StateChange[] stateChanges;
        ...
    }

(We will also need to implement how the parser will replace "external" placeholder values with real ones taken from the state history built so far.)

After this, the whole "State Event Handler" mechanism can be replaced with the following snippet:

    /* We assume we have the following already defined:
     * ts = event.timestamp
     * history = reference to the State History interface object
     */
    for (int i = 0; i < event.stateChanges.length; i++) {
        StateChange currentChange = event.stateChanges[i];

        switch (currentChange.type) {
        case MODIFY:
            history.modifyAttribute(ts, currentChange.newValue, currentChange.attributeName);
            break;
        case REMOVE:
            history.removeAttribute(ts, currentChange.attributeName);
            break;
        case PUSH:
            history.pushAttribute(ts, currentChange.newValue, currentChange.attributeName);
            break;
        case POP:
            history.popAttribute(ts, currentChange.attributeName);
            break;
        case INC:
            history.increment(ts, currentChange.attributeName);
            break;
        }
    }
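
For the placeholder replacement mentioned above, one possible shape is sketched below. This is only an illustration under assumptions: `PathComponent` and `AttributeNameResolver` are hypothetical names, the event payload is modelled as a map of strings, and externals are looked up in a plain map rather than in the state history. It turns the literal/internal/external components into the `String[]` form used by `StateChange.attributeName`.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical model of one component of an attribute name. */
final class PathComponent {
    enum Kind { LITERAL, INTERNAL, EXTERNAL }

    final Kind kind;
    final String text; // the literal itself, or the name of the variable/placeholder

    private PathComponent(Kind kind, String text) {
        this.kind = kind;
        this.text = text;
    }

    static PathComponent literal(String text)  { return new PathComponent(Kind.LITERAL, text); }
    static PathComponent internal(String name) { return new PathComponent(Kind.INTERNAL, name); }
    static PathComponent external(String name) { return new PathComponent(Kind.EXTERNAL, name); }
}

final class AttributeNameResolver {
    /**
     * Resolves each component to a concrete path element.
     * 'payload' holds the event-local variables, 'externals' the placeholder values
     * (assumed here to be supplied by the trace reader).
     */
    static String[] resolve(List<PathComponent> components,
                            Map<String, String> payload,
                            Map<String, String> externals) {
        String[] path = new String[components.size()];
        for (int i = 0; i < path.length; i++) {
            PathComponent c = components.get(i);
            String resolved;
            switch (c.kind) {
            case LITERAL:
                resolved = c.text;
                break;
            case INTERNAL:
                resolved = payload.get(c.text);
                break;
            default: // EXTERNAL
                resolved = externals.get(c.text);
                break;
            }
            if (resolved == null) {
                throw new IllegalArgumentException("Unresolved component: " + c.text);
            }
            path[i] = resolved;
        }
        return path;
    }

    public static void main(String[] args) {
        Map<String, String> payload = new HashMap<>();
        payload.put("next_pid", "1750");
        Map<String, String> externals = new HashMap<>();
        externals.put("hostname", "host1");

        String[] path = resolve(Arrays.asList(
                PathComponent.external("hostname"),
                PathComponent.literal("Processes"),
                PathComponent.internal("next_pid"),
                PathComponent.literal("Status")), payload, externals);
        System.out.println(String.join("/", path)); // prints host1/Processes/1750/Status
    }
}
```

Failing loudly on an unresolved component is a design choice of the sketch; silently skipping the state change would be another option.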
