Hello all,

I have implemented some of the ideas discussed here, mostly as a proof of
concept.
The idea was to see where the major hurdles are in implementing more
advanced statistics in the current framework.

For simplicity, I have only implemented the CPU usage of every TID on each
CPU.

For storage, I used a state system separate from the one used by the
current statistics.
The attribute tree is first separated by CPU and then by TID.
Each TID node contains the cumulative time spent by that TID on the parent
CPU, and an additional child node under it keeps track of the time spent
only in the current interval.
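
Concretely, the attribute layout looks roughly like this (the actual
attribute names in the branch may differ slightly):

CPUs
  |- 1                (CPU number)
     |- 1000          (TID; value = cumulative time of TID 1000 on CPU 1)
        |- current    (value = time spent in the current interval only)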

Here is a timeline of events and how the corresponding intervals relate to
that timeline.

t0: TID1000 scheduled in CPU1
t1: TID1000 scheduled out of CPU1
t5: TID1000 scheduled in CPU1
t7: TID1000 scheduled out of CPU1
t9: end of trace

From t0 to t5: cumulative time for (CPU1, TID1000) = 1; current time for
(CPU1, TID1000) = 1;
From t5 to t9: cumulative time for (CPU1, TID1000) = 3; current time for
(CPU1, TID1000) = 2;

To know the exact cumulative time at t6, you must first query the raw
cumulative time (3 in this case).
We know this TID ran for 2 units of time in this interval, but only 1 unit
has passed since the beginning of the interval.
Therefore, we must subtract 1 unit of time from the raw total to get the
exact total at t6 (2).
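
In code, the correction at query time looks roughly like this (a simplified
sketch, not the exact code from the branch; it also covers the case where
the query falls after the process was last scheduled out in the interval):

public class CpuTimeCorrection {
    /*
     * rawCumulative   = value of the cumulative-time attribute at time t
     * currentInterval = value of the "current interval" child attribute
     * intervalStart   = start time of the state interval containing t
     */
    static long exactCpuTime(long t, long intervalStart,
            long rawCumulative, long currentInterval) {
        long elapsed = t - intervalStart;
        /* Part of the current run that had not yet happened at time t */
        long notYetRun = Math.max(0, currentInterval - elapsed);
        return rawCumulative - notYetRun;
    }

    public static void main(String[] args) {
        /* Example above: query at t6, interval starts at t5 -> 3 - (2 - 1) = 2 */
        System.out.println(exactCpuTime(6, 5, 3, 2));
        /* At t8 the process is already scheduled out -> the raw value (3) is exact */
        System.out.println(exactCpuTime(8, 5, 3, 2));
    }
}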

My first intuition was that the current running time is information that is
already stored in the kernel state system used by the control flow view.
Two cases can happen:
1. The query is during the execution of the process.
2. The query is outside the execution of the process.

In the first case, we must interpolate to get the correct value. By
querying the kernel state system, we can confirm that the process is indeed
executing at the query time. The kernel state system also tells us how long
the process has been executing, which is all we need for the interpolation.

In the second case, we do not have to do interpolation. By querying the
kernel state system, we find that the process is not executing and that the
cumulative time can be used as is.
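
Roughly, the lookup in this lightweight version goes like this (a sketch
only: the kernel-state helper methods below are hypothetical placeholders
for the real state system queries, and the cumulative attribute is assumed
to be updated at each sched_out):

/* Hypothetical helper backed by queries on the standard kernel state
 * system; not an existing interface. */
interface KernelStateLookup {
    /* Was this TID running on this CPU at time t? */
    boolean isRunning(long t, int cpu, int tid);
    /* Start time of the run containing t (the last sched_in before t) */
    long currentRunStart(long t, int cpu, int tid);
}

class CpuUsageQuery {
    static long cpuTimeAt(long t, int cpu, int tid,
            long rawCumulative, KernelStateLookup kernel) {
        if (kernel.isRunning(t, cpu, tid)) {
            /* Case 1: the process is on the CPU at t -> interpolate */
            return rawCumulative + (t - kernel.currentRunStart(t, cpu, tid));
        }
        /* Case 2: scheduled out -> the stored cumulative time is exact as is */
        return rawCumulative;
    }
}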

By using the kernel state system, we have effectively halved the amount of
information we have to store in the statistics state system.
By doing so, we have also heavily coupled the kernel state system and the
statistics state system. If a new event is added to the kernel state
provider, the statistics provider must be aware of that change for the
statistics to remain coherent. At the moment, this is not too problematic,
as only the sched_switch event triggers the state change used by the
statistics.
Still, the trade-off between file size and ease of maintenance is an
interesting problem.

Here are some history tree file sizes I measured with the statistics state
system using the current running time (without optimization):
Trace 1: 286 MB
CPU usage stats history: 53 MB
event count stats history: 572 MB
kernel state history: 578 MB

Trace 2: 10.6 MB
CPU usage stats history: 2.24 MB
event count stats history: 17.6 MB
kernel state history: 12.2 MB

Another minor problem is the storage of the cumulative time. The current
history implementation only supports 32-bit integers and strings.
The cumulative time can easily overflow a 32-bit integer in long enough
traces.
I have not tried implementing 64-bit integer support, but based on my
experience with the history tree format, it should not be very hard to
do. In my implementation, overflows will occur without warning.
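
For scale: assuming the cumulative time is counted in nanoseconds (as LTTng
timestamps are), a signed 32-bit value wraps after 2^31 - 1 ns, which is
only about 2.1 seconds of accumulated CPU time per (CPU, TID) pair, so in
practice any real trace would hit the limit.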

The second part I worked on was the integration of this new information
into the existing statistics view.
Unfortunately, the current implementation of the statistics is very
specific to displaying event counts.

I still wanted to display meaningful information, so I worked around the
limitations.
Only longs can be displayed in the view, so instead of showing CPU usage as
a percentage, I elected to display the raw count in nanoseconds.
Also, there was no way to specify a different organization for the
statistics: the flat structure of the event counts was pretty much
hard-coded.
Anyway, I feel this part would require some work before it can be
integrated seamlessly into the current view.

The code can be found here:
http://git.dorsal.polymtl.ca/~frajotte?p=org.eclipse.linuxtools.git;a=shortlog;h=refs/heads/cpustats

The lightweight version that uses the kernel state system can be found here:
http://git.dorsal.polymtl.ca/~frajotte?p=org.eclipse.linuxtools.git;a=shortlog;h=refs/heads/cpustatslight

Please keep in mind that this was mostly done as a proof of concept to show
that it was possible. There are some hard-coded strings that should be
refactored, and there are no unit tests yet.
I would love to have your feedback. I can also push this to Gerrit if you
would prefer to review the contribution there.

On Tue, Feb 12, 2013 at 5:04 PM, Alexandre Montplaisir <
alexmon...@voxpopuli.im> wrote:

> On 13-02-12 04:45 PM, François Rajotte wrote:
> >
> > I believe the main concern was if we want the information about _many_
> > processes at a given timestamp.
> > If only one query is necessary (Michel's proposal), then we are sure
> > to have all the information in one query. (constant in the number of
> > processes)
> > But if we require to go back in time because we fall on a null
> > interval, the second query is at a different time for each process.
> > (linear in the number of processes)
> > The number of queries is now proportional to the number of processes
> > and not constant.
>
> Ah ok, I see what you mean. There is another advantage in querying at the
> same timestamp: if we do a full query, we can get both/all values for
> no additional cost.
>
> Note that with a partial history, all queries are full queries anyway
> (full, partial, this is getting confusing :P )
> We can implement a single query to fill the API, but in the background it
> would do a full query.
> By the way, scratch what I said earlier about not being able to use
> partial histories for this; both algorithms we are talking about here
> would work fine with a partial history, we don't need the end times of
> the intervals.
>
> >
> > (see Michel's response)
> >
> >
> >     >> One possible alternative is to store the cumulative CPU time in
> >     one attribute and the entryTime for the current interval if
> >     scheduled in and thus ongoing (or NULL if scheduled out). This
> >     would be 2 attributes instead of 1 in the current state, 1 change
> >     at schedule in and 2 at schedule out (thus 3 changes instead of 2
> >     in the history). However, you would not need any of the additional
> >     queries and there should be no problem with partial history
> >     storage optimizations.
> >     > Yes, this is an interesting idea. About the partial history vs full
> >     > history, this is something where partial history IMO is not at all
> >     > beneficial since the intervals are large and the changing events are
> >     > few and far between, relatively on the kernel front. This state system
> >     > takes (empirically) approx 10 MB, 20 MB for the new system for every
> >     > GB of the full state system, so trying compression to save space here
> >     > is like trying to balance the US economy by cutting PBS's funding.
> >     > Alex will probably disagree with me here.
> >     >
> >     > With very very large traces, I can't tell if this state system will
> >     > be larger than say a partial stats history tree. I think some
> >     > investigation is needed.
> >
> >     If you can fit it in a partial history, it will most likely be smaller
> >     than a full history, unless you use 1000 times more attributes ;)
> >
> >     An interesting point with the proposed method is that we already store
> >     the status of each process in the standard kernel state system, that can
> >     indicate if the process was on or off the CPU at any given time. So we
> >     could avoid duplicating this information.
> >
> >
> > I had the same idea. While trying to understand Michel's proposal, I
> > noticed that the information we would want to store (process schedule
> > in and out) is already in the kernel state system.
> > If we could somehow link the two state systems together, we could
> > reuse that information and save space.
>
> This information could go in a second, LTTng-kernel-specific state
> system, for which the standard kernel state system is guaranteed to
> exist. Or it could go in the same one. Probably the same one, if you
> want to benefit from full queries.
>
> >
> >
> >     I still have to wrap my head around it (I'm not sure if it implies using
> >     back-assignments or not), but it's definitely good to experiment.
> >
> >
>
> 30 mins on the drawing board later, I have to say I really like this new
> proposed algorithm. We never have to "seek back", and seeking back with
> a partial history is very costly (you have to re-read from the last
> checkpoint). Especially during the construction... There is no
> back-assignment needed either. We need to keep in RAM the current
> cumulative CPU time for each process, but that's not a problem.
