Author: marrs
Date: Mon May 7 14:34:20 2012
New Revision: 1335036
URL: http://svn.apache.org/viewvc?rev=1335036&view=rev
Log:
Added first conversion of the audit log analysis.
Added:
ace/site/trunk/content/dev-doc/analysis/auditlog-analysis.mdtext
Added: ace/site/trunk/content/dev-doc/analysis/auditlog-analysis.mdtext
URL: http://svn.apache.org/viewvc/ace/site/trunk/content/dev-doc/analysis/auditlog-analysis.mdtext?rev=1335036&view=auto
==============================================================================
--- ace/site/trunk/content/dev-doc/analysis/auditlog-analysis.mdtext (added)
+++ ace/site/trunk/content/dev-doc/analysis/auditlog-analysis.mdtext Mon May 7 14:34:20 2012
@@ -0,0 +1,112 @@
+Title: Audit Log Analysis
+
+An audit log is a full historic account of all events that are relevant for a
certain object. In this case, we keep an audit log for each target that is
managed by the provisioning server.
+
+Problem
+=======
+
+The first issue is where to maintain the audit log. On the one hand, it can be
maintained on the target; on the other hand, since the management agent talks to
the server, the server could keep the log too.
+
+Then there is the question of how to maintain the log. What events should be
in it, and what is an event?
+
+Finally, the audit log should be readable and queryable, so people can review
it.
+
+The following use cases can be defined:
+
+* Store event. Stores a new event to the audit log.
+* Get events. Queries (a subset of) events.
+* Merge events. Merges a set of (new) events with the existing events.
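+
+The three use cases above can be pictured as one small API. The following is
a minimal in-memory sketch, assuming an event is identified by its sequence
number within a single log; the class and method names (`AuditLog`, `store`,
`get`, `merge`) are hypothetical and do not describe the actual ACE service.
+

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Event:
    seq: int      # sequence number within the log, starting at 1
    message: str  # short, human-readable description


class AuditLog:
    """Hypothetical in-memory audit log covering the three use cases."""

    def __init__(self) -> None:
        self._events: Dict[int, Event] = {}

    def store(self, message: str) -> Event:
        """Store event: append a new event with the next sequence number."""
        event = Event(max(self._events, default=0) + 1, message)
        self._events[event.seq] = event
        return event

    def get(self, first: int = 1, last: Optional[int] = None) -> List[Event]:
        """Get events: return events with first <= seq <= last, in order."""
        return [e for seq, e in sorted(self._events.items())
                if seq >= first and (last is None or seq <= last)]

    def merge(self, events: List[Event]) -> None:
        """Merge events: add events we lack; events already stored are kept."""
        for e in events:
            self._events.setdefault(e.seq, e)
```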
+
+Context
+=======
+
+We basically have two contexts:
+
+* Target: limited resources, so we should use something really "lean and mean".
+* Server: a scalable solution; expect people to query for (large numbers of)
events.
+
+Possible solutions
+==================
+
+As with all repositories, there should be one location where it is edited. In
this case, the logical place to do that is on the target itself, since that is
where the changes actually occur. In theory, the server also knows about these
changes, but that theory breaks down if things fail on the target or if other
parties start manipulating the life cycle of bundles. The target itself can
detect such activities.
+
+The next questions are what needs to be logged, and how we get access to these
events.
+
+When storing events, each event can get a unique sequence number. Sequence
numbers start at 1 and increase by one per event, so they can be used to
determine whether you have the complete log: any gap means events are missing.
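+
+Because the numbering starts at 1 and has no holes, checking a log for
completeness is a simple set comparison. A minimal sketch, assuming the caller
already has the sequence numbers it holds; the function names are hypothetical.
+

```python
from typing import Iterable, List, Optional


def missing_sequence_numbers(seqs: Iterable[int],
                             highest: Optional[int] = None) -> List[int]:
    """Return the numbers in 1..highest that are absent from `seqs`.

    Without an explicit `highest`, the largest number seen is used, so events
    missing after the last one we know about cannot be detected.
    """
    present = set(seqs)
    if highest is None:
        highest = max(present, default=0)
    return [n for n in range(1, highest + 1) if n not in present]


def is_complete(seqs: Iterable[int]) -> bool:
    """True if no sequence number between 1 and the highest one is missing."""
    return not missing_sequence_numbers(seqs)
```

+As the docstring notes, trailing events can only be detected as missing when
the caller knows the highest number that was actually logged.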
+
+Assuming the target has limited storage, it might not be possible to keep the
full log available locally. There are a couple of reasons to replicate this log
to a central server:
+
+* space: as mentioned, the full log might not fit;
+* safety: when the target is somehow (partly) erased or compromised, we don't
want to lose the log;
+* remote diagnostics: we want to get an overview of the audit log without
actually connecting to the target directly.
+
+When replicating, the following scenarios can occur:
+
+1. The target has lost its whole log and really wants to (re)start from
sequence number 1.
+2. The server has lost its whole log and receives a partial log.
+
+Starting with the second scenario, the server always simply collects incoming
audit logs, so its memory can be restored from any number of targets or relay
servers that report everything they know (again). Hopefully that will lead to a
complete log again. If not, there's not much we can do.
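+
+The "simply collect everything" approach works because merging is idempotent:
receiving the same event twice, from a target or from a relay server, changes
nothing. A minimal sketch of such a server-side store, assuming events are
keyed by target ID, a per-log identifier (discussed below) and sequence number;
`ServerAuditStore` and its methods are hypothetical names.
+

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


class ServerAuditStore:
    """Hypothetical server-side store: target ID -> log ID -> seq -> message."""

    def __init__(self) -> None:
        self._logs: Dict[str, Dict[str, Dict[int, str]]] = defaultdict(dict)

    def collect(self, target_id: str, log_id: str,
                events: Iterable[Tuple[int, str]]) -> None:
        """Merge a (partial) report; events that are already known are kept."""
        log = self._logs[target_id].setdefault(log_id, {})
        for seq, message in events:
            log.setdefault(seq, message)

    def sequence_numbers(self, target_id: str, log_id: str) -> List[int]:
        """Sorted sequence numbers known for one log (empty if unknown)."""
        return sorted(self._logs.get(target_id, {}).get(log_id, {}))
```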
+
+The first scenario is potentially more problematic, since the target has no
way of knowing (for sure) which sequence number it had reached when everything
was lost. In theory it could ask the (relay) servers, but even those might not
be up to date, so that does not work. The only thing it can do here is start a
new log at sequence number 1. That means a target can have more than one log in
these cases, so we need to be able to identify which log (of each target) we're
talking about. Therefore, when a new log is created, it should get a unique
identifier. That identifier should not depend on stored information: we could,
for example, use the current time in milliseconds, which should be fairly
unique, or just a random number.
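+
+A minimal sketch of creating such a log, assuming the identifier combines both
suggestions (time in milliseconds plus a random part) so that two logs created
in the same millisecond still differ; the combination, the ID format and the
names `new_log_id` and `TargetLog` are illustrative choices, not part of the
design above.
+

```python
import random
import time


def new_log_id() -> str:
    """Create a log identifier that does not depend on stored information.

    Combines the current time in milliseconds with a random component, so two
    logs created in the same millisecond still get different identifiers.
    """
    millis = int(time.time() * 1000)
    return f"{millis:x}-{random.getrandbits(32):08x}"


class TargetLog:
    """Hypothetical target-side log; every new log restarts at sequence 1."""

    def __init__(self) -> None:
        self.log_id = new_log_id()
        self.next_seq = 1

    def record(self, message: str):
        """Return the (log ID, sequence number, message) triple for a new event."""
        entry = (self.log_id, self.next_seq, message)
        self.next_seq += 1
        return entry
```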
+
+How does the target find the central server? Probably by using the discovery
service; this is not a big issue.
+
+Events should at least contain:
+
+* a datestamp, indicating when the event occurred;
+* a checksum and/or signature;
+* a short, human readable message explaining the event;
+* details:
+    * in the form of a (possibly multi-line) document;
+    * in the form of a set of properties.
+
+The server will add:
+
+* the target ID of the target that logged the event.
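+
+The event fields listed above map naturally onto a small record type. A
minimal sketch, assuming a SHA-256 checksum over the fields the target logs
(excluding the target ID, so the server's addition does not change it); the
field names and the JSON-based hashing are illustrative assumptions, and a
signature would additionally need key material, which is not shown here.
+

```python
import hashlib
import json
from dataclasses import dataclass, field, replace
from typing import Dict, Optional


@dataclass
class AuditEvent:
    timestamp_ms: int   # datestamp: when the event occurred
    message: str        # short, human-readable explanation
    properties: Dict[str, str] = field(default_factory=dict)  # details as properties
    document: str = ""  # details as a (possibly multi-line) document
    target_id: Optional[str] = None  # added by the server, not by the target

    def checksum(self) -> str:
        """SHA-256 over the fields the target logs (target_id is excluded)."""
        payload = json.dumps(
            [self.timestamp_ms, self.message, self.properties, self.document],
            sort_keys=True)
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def receive_on_server(event: AuditEvent, target_id: str) -> AuditEvent:
    """The server stamps the event with the ID of the target that logged it."""
    return replace(event, target_id=target_id)
```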
+
+Storage will be resolved differently on the server and the target. On the
target, using any kind of database would mean including a considerable library,
which makes such solutions impractical there. We might want to consider a
database for the server, though. The options we have are:
+
+* Relational database
+* Object database
+* XML
+* DIY
+
+How do events get logged?
+
+* explicitly, our management agent calls an AuditLog service method;
+* implicitly, by logging (certain) events in the system.
+
+Implicit algorithms can be built on top of the AuditLog service. What we need
to monitor is the life cycle layer, which basically means adding a
BundleListener and a FrameworkListener. Those capture all state changes of the
framework. Technically, we can either add those listeners directly or use
EventAdmin if that is available.
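+
+The implicit approach is essentially an adapter between life cycle
notifications and the logging side of the AuditLog service. The real
BundleListener and FrameworkListener are Java interfaces in OSGi; the sketch
below only mirrors the idea in Python, using a hypothetical `LifecycleEvent`
stand-in and a plain callable as the log, with an optional filter for logging
only certain kinds of events.
+

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Optional


@dataclass(frozen=True)
class LifecycleEvent:
    """Hypothetical stand-in for an OSGi BundleEvent or FrameworkEvent."""
    source: str  # e.g. a bundle symbolic name, or "framework"
    kind: str    # e.g. "INSTALLED", "STARTED", "STOPPED"


class LifecycleAuditor:
    """Second layer: turns (certain) life cycle events into audit log entries.

    `log` is any callable that stores a message; in practice this would be the
    logging side of the AuditLog service (assumed, not a real API).
    """

    def __init__(self, log: Callable[[str], None],
                 kinds: Optional[Iterable[str]] = None) -> None:
        self._log = log
        self._kinds = None if kinds is None else set(kinds)  # None = log everything

    def on_event(self, event: LifecycleEvent) -> None:
        """Forward the event to the log if its kind is one we care about."""
        if self._kinds is None or event.kind in self._kinds:
            self._log(f"{event.source} {event.kind.lower()}")
```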
+
+What would be the best way for the target to send audit log updates to the
server? I don't think we want the server to poll here, so the target should
send updates (periodically). So how does it know what to send?
+
+* it could keep track of the last event it sent, sending newer ones after that;
+* it could ask for the list of events the server has;
+* it could send its highest log event number, get back a list of the events the
server is missing, and then respond with those events;
+* it could just send everything.
+
+Discussion
+==========
+
+Having two layers for the audit log makes sense:
+
+* The first, lowest layer is the AuditLog service that gives access to the
log. On the one hand it allows people to log messages; on the other, it should
provide query access. These two roles should be split into two different
interfaces.
+* The second layer builds on top of that. There are three options: leave it out
completely, which makes logging the responsibility of the application (probably
the management agent); implement it using listeners; or implement it using
events.
+
+On the target we should implement a storage solution ourselves, to keep the
actual code small. The code should be able to log events quickly (as that will
happen far more often than retrieving them).
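+
+Since writes dominate, an append-only file is one simple DIY option: logging
appends one line and never rewrites earlier data, while the rarer retrieval
parses the whole file. A minimal sketch, assuming one JSON-encoded
`[seq, timestamp, message]` record per line (JSON escapes newlines, so
multi-line messages stay on one line); the file layout and the `FileAuditLog`
name are illustrative assumptions, not the format ACE uses.
+

```python
import json
import os
import time


class FileAuditLog:
    """Hypothetical DIY target storage: one JSON record per line."""

    def __init__(self, path: str) -> None:
        self.path = path
        events = self.read_all()
        self.next_seq = events[-1][0] + 1 if events else 1

    def append(self, message: str) -> int:
        """Log quickly: a single appended line, no rewrite of earlier data."""
        record = [self.next_seq, int(time.time() * 1000), message]
        with open(self.path, "a", encoding="utf-8") as f:
            f.write(json.dumps(record) + "\n")
        self.next_seq += 1
        return record[0]

    def read_all(self) -> list:
        """Retrieval is rarer, so parsing the whole file here is acceptable."""
        if not os.path.exists(self.path):
            return []
        with open(self.path, encoding="utf-8") as f:
            return [json.loads(line) for line in f if line.strip()]
```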
+
+Communication between the target and server should be initiated by the target.
The target can basically send two commands to the server:
+
+1. My audit log contains sequence numbers 4-8; tell me your numbers. The server
then responds (for example) with 1-6. This indicates we need to send 7-8.
+2. Here are events 7-8; can you send me 1-3? The server stores the events it
was missing and sends back the requested events it has (the target should always
check that what it gets is what it requested).
+
+It is set up this way so that relay servers can use the same commands to
replicate logs between server and target.
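+
+The two commands boil down to comparing ranges of sequence numbers. A minimal
sketch, assuming each side describes its log as a list of inclusive
`(first, last)` ranges; the function names are hypothetical, and expanding
ranges into sets is chosen for clarity rather than efficiency.
+

```python
from typing import Iterable, List, Set, Tuple

Range = Tuple[int, int]  # inclusive (first, last)


def to_ranges(seqs: Iterable[int]) -> List[Range]:
    """Compress sequence numbers into sorted, inclusive (first, last) ranges."""
    ranges: List[Range] = []
    for n in sorted(set(seqs)):
        if ranges and n == ranges[-1][1] + 1:
            ranges[-1] = (ranges[-1][0], n)  # extend the current range
        else:
            ranges.append((n, n))            # start a new range
    return ranges


def expand(ranges: Iterable[Range]) -> Set[int]:
    return {n for first, last in ranges for n in range(first, last + 1)}


def plan_exchange(target: List[Range], server: List[Range]):
    """Return (ranges the target should send, ranges it should request)."""
    mine, theirs = expand(target), expand(server)
    return to_ranges(mine - theirs), to_ranges(theirs - mine)


def verify_response(requested: List[Range], received: Iterable[int]) -> bool:
    """Check that the events received are exactly the ones requested."""
    return expand(requested) == set(received)
```

+With the example above, `plan_exchange([(4, 8)], [(1, 6)])` yields
`([(7, 8)], [(1, 3)])`: send 7-8, request 1-3.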
+
+Conclusion
+==========
+
+* The audit log is maintained on the target.
+* On the target, we implement the storage mechanism ourselves to ensure we
have a solution with a very small footprint.
+* On the server, we use an XStream-based solution to store the logs of all the
targets.
+* However, our communication protocol between target and (relay) server should
probably not rely on XML.
+* Our communication protocol between server and (relay) server might rely on
XML (to be determined at design time, depending on what makes most sense).