Re: Writing log in xml format

Curt Arnold Wed, 27 Apr 2005 13:14:16 -0700


On Apr 27, 2005, at 1:22 PM, Dave Pawson wrote:

On Wed, 2005-04-27 at 13:03 -0500, Curt Arnold wrote:
I'm not fond of the CDATA sections either. Since the XMLLayout is not aware of the encoding of the writer, it does not know when to create character entities.
Not the applications concern is my response.
  The application should be responsible for escaping those characters
  that XML has declared that it doesn't like (&lt; and &amp;....
OK and [[ => [&#x5A;
Anything else the author should transpose into numerical character
entities.

The first two sentences appear to contradict each other. The code calling logger.info() et al. should not have to escape its messages to make them safe for a naive XML serialization. For example, the message may be routed both to a ConsoleAppender with a non-XML layout, where a literal '<' would be appropriate, and to a FileAppender with an XML layout, where '&lt;' would be appropriate.

My preference would be to use character entities for markup characters or any non-USASCII character which would eliminate the need for the CDATA sections and would allow the log to still be readable in case an editor assumes ISO-8859-1 and the document is UTF-8 or vice-versa.
I'd say (contra to some 'laws' :-) that log4j make no assumptions about
downstream usage.
produce (and declare) utf-8 encoding. No more. it's impossible to
predict what people will do with a file?


There are two distinct issues here that I jumbled together.

If the current XMLLayout is used and the encoding of the appender is not set to UTF-8 or one of the UTF-16's, the resulting file may not be well-formed or may lose information. For example, the default encoding on Windows platforms is the current Windows code page (for example, Cp1252 for Western European languages). Unless the user explicitly overrides the default encoding on a file appender, the generated XML file will become corrupt if any non-ASCII character is output. If the character cannot be represented in that encoding (for Cp1252, nearly every character above \u00FF), it will be replaced with the substitution character '?'. If the character is between \u0080 and \u00FF inclusive, its single-byte code will be output, which will likely result in an illegal UTF-8 sequence that XML parsers are required to treat as an unrecoverable error. log4j does not attempt to prevent or detect a mismatch between the encoding required by the XMLLayout and the encoding in use by the Writer. If the XMLLayout represented all characters >= \u0080 as character entities, the catastrophic effects of a mismatch would be reduced. My feeling is that most configurations that use XMLLayout are vulnerable to this problem but are either running on platforms where the default encoding is UTF-8 or have not encountered messages containing characters between \u0080 and \u00FF. Since the character and the corresponding character entity are required to be treated identically by an XML processor, using a character entity seems to have no downside other than some increase in file size and maybe some impact on performance.
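
The mismatch described above can be reproduced with the JDK alone. What follows is a minimal sketch, not log4j code: the class and method names are hypothetical, and ISO-8859-1 stands in for the Windows code page because it is guaranteed to be present in every JRE (Cp1252 behaves similarly for the characters used here).

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; it only imitates a Writer bound to a single-byte encoding.
final class EncodingMismatchDemo {

    /** Encodes text the way a Writer using ISO-8859-1 would. */
    static byte[] writeAsLatin1(String text) {
        // String.getBytes(Charset) substitutes '?' for characters the charset cannot map.
        return text.getBytes(StandardCharsets.ISO_8859_1);
    }

    /** Returns true if the bytes form a legal UTF-8 sequence. */
    static boolean isValidUtf8(byte[] bytes) {
        try {
            // A freshly created decoder reports malformed input instead of replacing it.
            StandardCharsets.UTF_8.newDecoder().decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // A Cyrillic letter (above \u00FF) is silently replaced by '?'.
        byte[] lossy = writeAsLatin1("<m>\u0416</m>");
        System.out.println(new String(lossy, StandardCharsets.ISO_8859_1)); // prints <m>?</m>

        // An e-acute (\u00E9) becomes the single byte 0xE9, which is not legal UTF-8 here.
        byte[] broken = writeAsLatin1("<m>caf\u00E9</m>");
        System.out.println(isValidUtf8(broken)); // prints false
    }
}
```

An XML parser that reads the second byte sequence as UTF-8 (the default when no encoding is declared) has to reject the document, which is the failure mode at issue.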

The second issue is that users may want to attempt to view an XML log with a non-XML aware tool (Notepad, tail, etc) which could assume the current platform encoding instead of UTF-8. If non-USASCII characters were expressed as character entities, these tools could still be used without potentially misinterpreting or corrupting the data.
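
A rough sketch of what such escaping could look like, using only the JDK. The class name XmlAsciiEscaper is hypothetical and this is not how XMLLayout is implemented; it only illustrates writing markup characters and everything at or above \u0080 as entities so that the output is pure ASCII.

```java
import java.util.Locale;

// Hypothetical helper, not part of log4j.
final class XmlAsciiEscaper {

    /** Escapes markup characters and writes every code point >= 0x80 as a numeric reference. */
    static String escape(String text) {
        StringBuilder out = new StringBuilder(text.length() + 16);
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);      // handles surrogate pairs as one code point
            i += Character.charCount(cp);
            switch (cp) {
                case '<':  out.append("&lt;");   break;
                case '>':  out.append("&gt;");   break;
                case '&':  out.append("&amp;");  break;
                case '"':  out.append("&quot;"); break;
                default:
                    if (cp >= 0x80) {
                        out.append("&#x")
                           .append(Integer.toHexString(cp).toUpperCase(Locale.ROOT))
                           .append(';');
                    } else {
                        // Plain ASCII passes through. Control characters that XML 1.0
                        // forbids, and unpaired surrogates, are not handled in this sketch.
                        out.appendCodePoint(cp);
                    }
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(escape("caf\u00E9 <b> & \u0416"));
        // prints caf&#xE9; &lt;b&gt; &amp; &#x416;
    }
}
```

Because the result contains only ASCII, it reads the same whether a tool assumes UTF-8, ISO-8859-1 or a Windows code page, and an XML processor treats each reference exactly like the character it stands for.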

There is a decent likelihood that XMLLayout will
change before the final release of log4j 1.3, so it may be better to avoid
subclassing it, since it really doesn't appear to be designed for
extension and is not that hard to duplicate.  I think it should have been
marked final, but then I think that about a lot of things.


Ah! Hope for me yet!


Is there a way I can .... (I'm looking for help here)
  Ask log4j to keep all the stuff I can set via the text properties
file, and then make it available via the api for the xml formatter?


What is "all the stuff I can set via the text properties"?  Anything
you can place in the MDC, NDC or properties should be present in the
generated XML.

That's a comment I have on 'the book'.
Both those terms are used, without explanation.
Am I supposed to know what they mean?
I don't.

NDC: Nested diagnostic context, see http://logging.apache.org/log4j/docs/manual.html
MDC: Mapped diagnostic context, see http://logging.apache.org/log4j/docs/api/org/apache/log4j/MDC.html

The appender properties are something new in log4j 1.3 that I really haven't played with.

A log file can be appended to by several invocations of an application each of which appends to the running log. If each invocation wrote <log4j:log> and </log4j:log> at its start up and termination, you would still have the problem of multiple document elements.
A possible solution is to use a o.a.l.rolling.RollingFileAppender with
a RollingPolicy that instead of just renaming the file on a roll, it
would add the outer document element.
Hence it was a request :-)
Someone knows when this logger starts.... (the timing thingy perhaps)
That's the class I'm talking about.

Just in the last few hours there was a bug request filed to add a header/footer to the rolling file appender which would be very similar to that request.
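
To illustrate the idea of adding the outer document element when a file is rolled, here is a small JDK-only sketch. It is not a RollingPolicy and uses no log4j classes: the class LogFragmentWrapper is hypothetical, the element name log4j:log is simply the one used in this thread, and the namespace URI is assumed.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

// Hypothetical helper: wraps a run of event fragments in one document element.
final class LogFragmentWrapper {

    // Assumed namespace URI for the log4j prefix.
    static final String NS = "http://jakarta.apache.org/log4j/";

    /** Adds an XML declaration and a single outer element around the fragments. */
    static String wrap(String fragments) {
        return "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n"
             + "<log4j:log xmlns:log4j=\"" + NS + "\">\n"
             + fragments
             + "\n</log4j:log>\n";
    }

    /** Parses the text as XML; throws IllegalArgumentException if it is not well-formed. */
    static Document parse(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            return factory.newDocumentBuilder()
                          .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalArgumentException("not well-formed XML", e);
        }
    }

    public static void main(String[] args) {
        // Two events, as if appended by two separate invocations of an application.
        String fragments =
              "<log4j:event logger=\"a\" level=\"INFO\"><log4j:message>one</log4j:message></log4j:event>\n"
            + "<log4j:event logger=\"a\" level=\"WARN\"><log4j:message>two</log4j:message></log4j:event>";
        Document doc = parse(wrap(fragments));
        System.out.println(doc.getElementsByTagNameNS(NS, "event").getLength()); // prints 2
    }
}
```

The raw fragments on their own are rejected by the parser, since they have no single document element and the log4j prefix is never declared; wrapping them once, after the file is closed, sidesteps the problem of each invocation writing its own start and end tags.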

