* Alexandre Montplaisir ([email protected]) wrote:
> Hi all,
>
> As we have discussed over the past few weeks, I've been looking at ways to
> store state-related metadata so that it can be supplied with instrumented
> applications, instead of with trace viewers.
>
> Here is an overview of what was discussed and what I had in mind so far.
> It's a rough draft, and still at a very "brainstorming" stage.
>
> Feedback/comments very welcome! =)
>
>
> Thanks,
>
> --
> Alexandre Montplaisir
> DORSAL lab,
> École Polytechnique de Montréal
>
> Request For Comments / Proposal on how to store state-related metadata in
> tracepoints
>
> Alexandre Montplaisir <[email protected]>
>
>
> Trace viewers normally carry their own state machine to represent the state
> of traced systems at any given point in a trace. Typically, the definition
> of this state machine was in the viewer itself, and had to be constantly
> updated whenever the tracing instrumentation would change.
>
> It would be interesting if we could provide a basic state machine definition
> included with the instrumentation. This would allow viewers to show basic
> state information without having to "know" the type of trace in advance.
>
> This proposal tries to give an example of how such a state system could be
> defined in trace points (or referred to by the tracepoints), and what
> information would be needed.
>
>
>
> Definitions
> --------------------------
>
> * Attributes
> An attribute is a "single element of state", the basic unit, the atom if you
> will. Each bit of information we want to store about the state is
> represented with an attribute. The idea so far was to organize them in a
> tree, similar to the /proc filesystem.
> For example:
>
>   host1/CPUs/0/Current_process
>   host1/Processes/2500/Exec_name
>
> could be attributes. They would represent, respectively, the current
> scheduled process on CPU 0 and the current executable name of process with
> PID 2500, both on host "host1".

A small point of semantics: CPUs schedule threads, not processes.

>
> A main point about the design of this "attribute tree" is that it does not
> need to be defined in advance: it should be built on the go, as we read
> information from the trace (e.g. we won't know how many CPUs there will be,
> etc.)
>
>
> * State values
> The goal of the attributes is to store values. Each "state value" is only
> valid for a certain period of time, or "interval".
> Only one value exists for a given attribute/timestamp pair, but this value
> can be different at other times.
>
> For example, attribute "host1/CPUs/0/Current_process" could have value "1750"
> for a given period, which would mean the scheduled process on CPU0 was PID
> 1750 during that time.
>
> "Null" is also a possible and important state value. It means "there is no
> information about this attribute at this time". If a process only lived for
> two minutes in an hour-long trace, everywhere else its attributes will have
> null values.
>
>
>
> Points of interest
> --------------------------
>
> * Integer vs String state values
> The design of the State History so far allows for State values to be either
> Integers or variable-length Strings. However, in cases where we have a
> defined set of possible values known in advance, it might be interesting to
> use enum-like integers instead of strings to save on storage space (e.g.
> system call names, IRQ names, etc.)
>
> One thing to remember in this case is that the "mapping" between the enums
> and the integers will have to be known by both the tracer and the analysis
> tool, so this adds a dependency.
> (The State History library does not need to know about it though, we can
> have it store any value and it will happily return it without knowing what
> it means.)

One possibility would be to keep one extra type of info: enums would be a
(value, reference to enumeration mapping table) pair, so that the
corresponding string could be extracted from the value without having to keep
information about the enumeration mapping table externally.

We could even decide to have a whole level of "directories" in the state tree
mapped to a single enumeration mapping table, which would apply to all
children, so we don't have to repeat the enum table reference. Just food for
thought.

>
>
> * Events vs. State changes
> The goal of adding state metadata to trace points is to map state changes to
> events.
> By definition, a state-changing event will define one *or more* state
> changes. All the information required to define these state changes has to
> be present locally in the scope of the trace point, or in some cases in the
> state history itself.
>
> For example, a scheduling event could cause the following state changes:
> - set the "running" status to the process that got scheduled in again,

process -> thread

> - set the "preempted" (for example) status to the process that got scheduled
>   out
> - update the "current running process" on the relevant CPU
>
> When we explicitly express each one of those changes using the attributes
> and values we defined earlier, we can also use the term "attribute
> modifications".
>
>
> * Conditions
> It's also interesting to define conditions under which state changes occur.
> Once again those conditions can only use information that is either
> available locally or in the state history.
>
> For example, if we look at the state changes caused by a scheduling event,
> shown in the previous point, we might want to *not* insert state changes
> when the previous or next pid is "0", since we do not care about the current
> status of "process 0".

Why would we skip pid 0? It's really important to know when the system is
going to execute the idle thread.

>
>
> * Types of state changes
> Finally, some events affect the state in more complex ways than direct
> attribute modifications. This usually happens when required information is
> not available locally in the event payload and a query on the history is
> needed.
>
> The state history library (for now) provides abstractions for these
> different types:
>
>   MODIFY(timestamp, value, attribute)
>     Bread-and-butter modification method: we insert in the history a state
>     change at "timestamp", in which we now assign "value" to the given
>     "attribute".
>
>   REMOVE(timestamp, attribute)
>     Similar to MODIFY(timestamp, "null", attribute), except we also
>     "nullify" all the children of the attribute. A bit like "rm -rf". This
>     is needed in some cases where we don't know exactly how many children an
>     attribute has (e.g. a process dies, we want to remove all of its
>     child-attributes).
>
>   PUSH(timestamp, value, attribute)
>   POP(timestamp, attribute)
>     In some cases we are not only interested in the latest value of a given
>     attribute, but we want to keep a "stack" of previous ones we have seen
>     so far. This is the case with process execution modes (nested IRQs and
>     syscalls and the like).
>
>   INCREMENT(timestamp, attribute)
>     Sometimes we might just want to increment a counter, without having to
>     keep an array in memory just to pass values to MODIFY's. The history
>     will look for the previous value of this attribute and will insert a
>     change that increments the count by 1.
>     This is particularly useful if we want to store statistics in the
>     history.
>
>
> (This may add unwanted complexity at the "tracer" level though, but I
> haven't figured out a way of generating different types of changes other
> than declaring them right from the start.)
>
>
> Examples of the declaration
> --------------------------
>
> This is an example for a scheduling event. We assume we have local access to
> the usual event payload [next_pid, prev_pid, prev_state] as well as "cpu",
> the cpu number on which this event happened.
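
As an aside, the five change types listed above can be pinned down with a small
Java sketch. This is only an illustration under assumptions: `ToyHistory`,
`query`, the lower-case method names, the `STATE_RUNNING` encoding and the
`Sched_count` attribute are all hypothetical and are not the State History
library's API. The toy keeps only the current value of each attribute (no
intervals) and treats a missing entry as "null". The `main` method replays the
scheduling example with made-up payload values.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.TreeMap;

/** Hypothetical toy model of the five change types; not the real State History API. */
class ToyHistory {
    /* One value stack per attribute path; the head of the stack is the current
     * value. A missing or empty stack stands for the "null" state value. */
    private final TreeMap<String, Deque<Integer>> state = new TreeMap<>();

    /* Timestamps are accepted to mirror the proposal's signatures, but this
     * sketch only tracks the current state and does not store intervals. */

    /** MODIFY: assign "value" to the attribute, replacing its current value. */
    void modify(long ts, int value, String attribute) {
        Deque<Integer> stack = state.computeIfAbsent(attribute, k -> new ArrayDeque<>());
        stack.poll();        // discard the current value, if any
        stack.push(value);
    }

    /** REMOVE: nullify the attribute and all of its children, like "rm -rf". */
    void remove(long ts, String attribute) {
        state.keySet().removeIf(k -> k.equals(attribute) || k.startsWith(attribute + "/"));
    }

    /** PUSH: stack a new value on top of the previous ones. */
    void push(long ts, int value, String attribute) {
        state.computeIfAbsent(attribute, k -> new ArrayDeque<>()).push(value);
    }

    /** POP: drop the latest value and fall back to the one before it. */
    void pop(long ts, String attribute) {
        Deque<Integer> stack = state.get(attribute);
        if (stack != null) {
            stack.poll();
        }
    }

    /** INCREMENT: look up the previous count (null counts as 0) and add 1. */
    void increment(long ts, String attribute) {
        Integer previous = query(attribute);
        modify(ts, previous == null ? 1 : previous + 1, attribute);
    }

    /** Current value of the attribute, or null if there is no information. */
    Integer query(String attribute) {
        Deque<Integer> stack = state.get(attribute);
        return stack == null ? null : stack.peek();
    }

    public static void main(String[] args) {
        // Made-up payload of one scheduling event.
        int nextPid = 1750, prevPid = 2500, prevState = 1, cpu = 0;
        long ts = 1000L;
        final int STATE_RUNNING = 2; // hypothetical encoding
        ToyHistory history = new ToyHistory();

        history.modify(ts, STATE_RUNNING, "host1/Processes/" + nextPid + "/Status");
        history.modify(ts, prevState, "host1/Processes/" + prevPid + "/Status");
        history.modify(ts, nextPid, "host1/CPUs/" + cpu + "/Current_process");
        history.increment(ts, "host1/CPUs/" + cpu + "/Sched_count"); // hypothetical statistic

        System.out.println(history.query("host1/CPUs/0/Current_process")); // prints 1750
    }
}
```

Keeping a stack per attribute lets MODIFY and PUSH/POP share one structure:
MODIFY replaces the top entry, while PUSH and POP grow and shrink the stack.
A real history would additionally record the interval during which each value
was valid.
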
>
>
>
> * Alternative #1: C-like syntax
> (omitted semi-colons, strcat's and the like for clarity)
>
>   state_change changes[3]
>
>   /* Set the status of the process scheduled in */
>   if ( next_pid != 0 ) {
>       changes[0].type = MODIFY
>       changes[0].attribute_name = "<hostname>/Processes/" + next_pid +
>                                   "/Status"
>       changes[0].value = STATE_RUNNING
>   }
>
>   /* Set the status of the process scheduled out */
>   if ( prev_pid != 0 ) {
>       changes[1].type = MODIFY
>       changes[1].attribute_name = "<hostname>/Processes/" + prev_pid +
>                                   "/Status"
>       changes[1].value = prev_state
>   }
>
>   /* Set the current active process on the relevant CPU */
>   changes[2].type = MODIFY
>   changes[2].attribute_name = "<hostname>/CPUs/" + cpu + "/Current_process"
>   changes[2].value = next_pid

Clean, understandable, although I'm not convinced that the example is well
chosen for the pid != 0.

> * Alternative #2: XML syntax
>
>   <statechange>
>     <condition = "next_pid != 0">
>     <type = MODIFY>
>     <attributename>
>       <external>hostname</external>
>       <literal>Processes</literal>
>       <internal>next_pid</internal>
>       <literal>Status</literal>
>     </attributename>
>     <value>
>       <internal>STATE_RUNNING</internal>
>     </value>
>   </statechange>
>   <statechange>
>     <condition = "prev_pid != 0">
>     <type = MODIFY>
>     <attributename>
>       <external>hostname</external>
>       <literal>Processes</literal>
>       <internal>prev_pid</internal>
>       <literal>Status</literal>
>     </attributename>
>     <value>
>       <internal>prev_state</internal>
>     </value>
>   </statechange>
>   <statechange>
>     <condition = true>    <!-- always record this change -->
>     <type = MODIFY>
>     <attributename>
>       <external>hostname</external>
>       <literal>CPUs</literal>
>       <internal>cpu</internal>
>       <literal>Current_process</literal>
>     </attributename>
>     <value>
>       <internal>next_pid</internal>
>     </value>
>   </statechange>

Hrm, do we really expect people to type this in manually? ;)

>
> In both cases, attribute names contain either literal, external or internal
> components. "Internal" refers to variables available locally. Literals are
> just that: string literals that will be used as-is in the attribute tree.
> Externals are placeholder values that the trace reading library and/or the
> state history building mechanism will have to replace with the correct value.
>
>
> (Surely there are a lot of shortcomings in these examples right now, but
> hopefully they explain what I'm trying to do ;)
>
> Personally I find #1 more compact and more readable, but #2 has the
> advantage of not having to be in the program itself.

Not true. We could parse C-like syntax descriptions provided along with the
plugins. We don't have to go with XML for this. A single description format
would indeed be better if we can both keep the degree of flexibility required
by plugin-provided descriptions and not be too verbose.

Thanks,

Mathieu

> If we want to also support externally-supplied state machines, having a
> common syntax is probably a good thing.
>
>
> Link with the State History API
> --------------------------
>
> First we define what a "state change" is Java-side.
>
>
>   enum StateChangeType {MODIFY, REMOVE, PUSH, POP, INC;}
>
>   class StateChange {
>       StateChangeType type;
>       String[] attributeName;
>       int newValue;
>       long timestamp;
>
>       ...
>   }
>
>
> And we add a field "stateChanges" to the Events read from the trace. We
> suppose the trace reading library (a.k.a. Matthew's magical box) will fill
> up this array based on the information in the trace point.
>
>
>   class Event {
>       ...
>       StateChange[] stateChanges;
>       ...
>   }
>
> (We will also need to implement how the parser will replace "external"
> placeholder values with real ones taken from the state history built so far)
>
>
> After this, the whole "State Event Handler" mechanism can be replaced with
> the following snippet:
>
>   /* We assume we have the following already defined:
>    *   ts = event.timestamp
>    *   history = reference to the State History interface object
>    */
>   for ( int i = 0; i < event.stateChanges.length; i++ ) {
>       StateChange currentChange = event.stateChanges[i];
>
>       switch ( currentChange.type ) {
>       case MODIFY:
>           history.modifyAttribute(ts,
>                                   currentChange.newValue,
>                                   currentChange.attributeName);
>           break;
>       case REMOVE:
>           history.removeAttribute(ts, currentChange.attributeName);
>           break;
>       case PUSH:
>           history.pushAttribute(ts,
>                                 currentChange.newValue,
>                                 currentChange.attributeName);
>           break;
>       case POP:
>           history.popAttribute(ts, currentChange.attributeName);
>           break;
>       case INC:
>           history.increment(ts, currentChange.attributeName);
>           break;
>       }
>   }
>
>
>
>
> _______________________________________________
> ltt-dev mailing list
> [email protected]
> http://lists.casi.polymtl.ca/cgi-bin/mailman/listinfo/ltt-dev

--
Mathieu Desnoyers
Operating System Efficiency R&D Consultant
EfficiOS Inc.
http://www.efficios.com
