Hi,
 
Ceki's suggestion sounds good.
 
Just a comment re the XML side of things. The bulk of the bloat with XML is often the tag names, attribute names and namespace qualifiers. The beauty of XML is that it doesn't matter what these are; what is important is their relationship to each other. Long, meaningful names are most efficient for humans. Short, meaningless names are most efficient for computers.
 
For computer-to-computer communication, a substantial reduction in data stream size can be achieved by using 'tag substitution' (or tag encoding). A tag conversion table is created that maps the human-readable tag and/or attribute names to very short machine-readable names.
 
This does require sending the conversion table once when establishing the connection, but for large data streams the savings can be significant. It is also not much different from sending a DTD so that the sender can validate the XML files to be sent.
 
Note that only the server needs to perform the conversion since the client can simply output the abbreviations directly.
If properly designed, it should be possible to send the conversion table as an XSLT file that can then be used to perform the name conversion automatically.
 
The simple example below, with only two messages, shows a reduction from 327 bytes to 185 bytes, or about 43%. That is a savings of 142 bytes, which is already greater than the 60 or so bytes needed for the conversion table itself.
 
// 120 bytes - standard log output
2001-06-04 13:38:28,664 WARN  [main] XMLSample - Message 1
2001-06-04 13:38:28,664 ERROR [main] XMLSample - Message 2
 
// 327 bytes - xml log output
<log4j:event category="XMLSample" timestamp="991418283544" priority="WARN" thread="main">
<log4j:message><![CDATA[Message 1]]></log4j:message>
</log4j:event>
 
<log4j:event category="XMLSample" timestamp="991418283554" priority="ERROR" thread="main">
<log4j:message><![CDATA[Message 2]]></log4j:message>
</log4j:event>
 
// 239 bytes - xml with abbreviated element names
<e category="XMLSample" timestamp="991418283544" priority="WARN" thread="main">
<m><![CDATA[Message 1]]></m>
</e>
 
<e category="XMLSample" timestamp="991418283554" priority="ERROR" thread="main">
<m><![CDATA[Message 2]]></m>
</e>
 
// 185 bytes - xml with abbreviated element and attribute names
<e c="XMLSample" d="991418283544" p="WARN" t="main">
<m><![CDATA[Message 1]]></m>
</e>
 
<e c="XMLSample" d="991418283554" p="ERROR" t="main">
<m><![CDATA[Message 2]]></m>
</e>
 
// xml name conversion table
c - category
d - timestamp
e - log4j:event
m - log4j:message
p - priority
t - thread
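
For what it's worth, the substitution itself can be sketched in a few lines of Python. This is only a rough illustration, not something Log4j provides: the names rename, TOKEN and ATTR are made up for the example, and it uses regular expressions rather than a real XML parser, so it assumes no '>' inside attribute values and no comments containing tag-like text. CDATA sections are copied through untouched.

```python
import re

# Conversion table from this message: short name -> long name.
SHORT_TO_LONG = {
    "c": "category",
    "d": "timestamp",
    "e": "log4j:event",
    "m": "log4j:message",
    "p": "priority",
    "t": "thread",
}
LONG_TO_SHORT = {long_name: short for short, long_name in SHORT_TO_LONG.items()}

# Hypothetical helper patterns (not part of any library).
TOKEN = re.compile(
    r"<!\[CDATA\[.*?\]\]>"                   # CDATA section, left as-is
    r"|<(/?)([A-Za-z_][\w:.-]*)([^<>]*)>",   # start or end tag
    re.DOTALL,
)
# One attribute: name, '=', quoted value. The value is consumed by the
# match so that text inside it is never mistaken for an attribute name.
ATTR = re.compile(r"""([A-Za-z_][\w:.-]*)(\s*=\s*)("[^"]*"|'[^']*')""")


def rename(xml, table):
    """Rename element and attribute names found in `table`; leave others alone."""

    def fix_attr(m):
        return table.get(m.group(1), m.group(1)) + m.group(2) + m.group(3)

    def fix_token(m):
        if m.group(2) is None:  # the CDATA alternative matched
            return m.group(0)
        slash, name, rest = m.group(1), m.group(2), m.group(3)
        return "<" + slash + table.get(name, name) + ATTR.sub(fix_attr, rest) + ">"

    return TOKEN.sub(fix_token, xml)
```

Calling rename(xml, LONG_TO_SHORT) abbreviates a stream and rename(xml, SHORT_TO_LONG) expands it again, so the same table works in both directions as long as no short name collides with a long one.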
 
Since Log4j uses a very small number of 'names', this approach might be worth looking into.
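
To put rough numbers on that, the figures quoted earlier (327 and 185 bytes for two events, about 60 bytes for the table) can be extended to longer streams. The sketch below is in Python and rests on an assumption, not a measurement: that every event saves as much as the two sample events do. The function name net_savings is made up for the illustration.

```python
# Figures quoted in this message; the table size is the rough estimate given above.
XML_BYTES = 327          # two events, full names
ABBREVIATED_BYTES = 185  # two events, abbreviated element and attribute names
TABLE_BYTES = 60         # approximate one-time cost of sending the table
EVENTS_IN_SAMPLE = 2

# 142 bytes saved over two events, i.e. 71 bytes per event.
SAVED_PER_EVENT = (XML_BYTES - ABBREVIATED_BYTES) // EVENTS_IN_SAMPLE


def net_savings(n_events):
    """Bytes saved over n_events after paying for the table once.

    Assumes each event saves the same amount as the sample events.
    """
    return n_events * SAVED_PER_EVENT - TABLE_BYTES
```

Under that assumption the table pays for itself with the first event, and net_savings(2) gives the 82 bytes left over from the two-message example.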
 
Just a thought.
 
Rick
 
