On Thu, 2005-04-28 at 15:02 -0500, Curt Arnold wrote:
> On Apr 28, 2005, at 1:40 PM, Dave Pawson wrote:
>
> >
> > I think that means the 'end user' shouldn't have to escape his content?
> > If so I agree.
> > logger.info("One < two"); etc.
>
> We are thinking alike then. Your first message said "Not the
> applications concern" and "The application should be responsible"; I'm
> guessing that you dropped a NOT from the second sentence.
<grin/> Definition of the application?
I meant the user's application; they shouldn't have to worry about it.
The application - log4j - should look after it.
> >
>
> The encoding is not an attribute of the layout, but of WriterAppender,
> which FileAppender and others extend.
Sorry, I didn't know that.
> The current XMLLayout is unsafe when
> attached to a WriterAppender whose encoding is not "UTF-8" or one of
> the "UTF-16" variants; however, there is no way for the layout to
> determine or change the encoding of the writer. The default encoding
> for the existing WriterAppender-derived classes should not be changed,
> since existing uses with other layouts (PatternLayout) expect the
> output to be in the platform default encoding.
That sounds bad.
Being tied to the platform default encoding could mean anything.
How could they be separated?
> The scenarios that I described can be avoided if the person building
> the configuration is aware that they need to specify a Unicode encoding
> on the appender when they use an XMLLayout.
I think that should be mandated.
> We could make the JavaDoc
> for XMLLayout much more emphatic that this needs to be done; however,
> you would get nowhere close to 100% adherence to that
> recommendation.
No, but if it's clear, the responsibility is clearly on the user of
log4j. I'm hoping that people will slowly come round to an
understanding of i18n as XML usage spreads.
> The problem can almost entirely be avoided by
> using character entities for non-US-ASCII characters, and the cost would
> be negligible unless you were primarily logging in non-European
> languages, in which case the log files would be larger than necessary.
> If that is really a concern, the use of entities could be made configurable.
If you mean numerical character entities (&#x1234; form) then I agree.
The base requirement should be for utf-8 encoding.
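
A minimal sketch of the numeric-character-entity idea, using only the JDK. The class and method names (NcrEscaper, escape) are made up for illustration and are not part of log4j; the sketch also does not filter characters that XML 1.0 forbids outright, such as most control characters.

```java
// Hypothetical helper, not log4j API: escapes XML markup characters and
// writes every non-ASCII code point as a numeric character reference,
// so the result is plain ASCII and survives any ASCII-compatible encoding.
final class NcrEscaper {
    static String escape(String s) {
        StringBuilder out = new StringBuilder(s.length() + 16);
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);       // combines surrogate pairs
            i += Character.charCount(cp);
            switch (cp) {
                case '<': out.append("&lt;"); break;
                case '>': out.append("&gt;"); break;
                case '&': out.append("&amp;"); break;
                default:
                    if (cp < 0x80) {
                        out.appendCodePoint(cp);   // ASCII passes through
                    } else {
                        out.append("&#x")
                           .append(Integer.toHexString(cp).toUpperCase())
                           .append(';');
                    }
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // The caller logs raw text; escaping happens on the way out.
        System.out.println(escape("One < two, caf\u00E9"));
    }
}
```

With something like this in the layout, the user's logger.info("One < two") needs no escaping on their side, and the output stays readable by an XML parser even when the appender writes in a platform default such as ISO-8859-1.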
> XMLLayout only works in terms of Java Strings and chars, which are
> defined as UTF-16 code units. The problems come when the string is
> converted to a byte stream or vice versa.
I've used the term serialization for that.
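
To illustrate where that serialization step goes wrong, here is a small JDK-only sketch. The class and method names are invented for the example. It assumes the reader treats the bytes as UTF-8, which is what an XML parser does when there is no encoding declaration or byte order mark.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; nothing here is log4j API.
final class EncodingMismatch {
    // Serialization: Java chars to bytes in the writer's encoding.
    static byte[] serialize(String s, Charset cs) {
        return s.getBytes(cs);
    }

    // True if the bytes are well-formed UTF-8. A fresh decoder reports
    // malformed input by throwing instead of substituting U+FFFD.
    static boolean isValidUtf8(byte[] bytes) {
        try {
            StandardCharsets.UTF_8.newDecoder().decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String msg = "caf\u00E9"; // one character in the \u0080-\u00FF range
        // Written as ISO-8859-1, read as UTF-8: rejected.
        System.out.println(isValidUtf8(serialize(msg, StandardCharsets.ISO_8859_1)));
        // Written as UTF-8, read as UTF-8: accepted.
        System.out.println(isValidUtf8(serialize(msg, StandardCharsets.UTF_8)));
    }
}
```

The String is identical in both cases; only the encoding chosen at the writer differs, which is why the layout alone cannot guarantee a readable file.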
> The UI will take care of
> converting the keystrokes into characters. From log4j's perspective it
> doesn't matter whether a character was generated by someone pressing
> the "A" key or by an 'A' or a '\u0041' in the source file. In the same
> way, an XML processor is required to treat <foo>A</foo> and
> <foo>&#x41;</foo> identically.
Yes. But not all users appreciate the part the parser plays in those
conversions when reading from a file :)
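
For anyone who wants to see the parser doing that job, a short sketch with the JDK's built-in DOM parser follows. The class and method names are invented for the example. It compares a literal character with a numeric character reference for the same character.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;

// Hypothetical demo class: parses a tiny document and returns the text
// content of its root element, after the parser has resolved references.
final class CharRefDemo {
    static String textOf(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement()
                    .getTextContent();
        } catch (Exception e) {
            // Wrapped so callers need not declare checked exceptions.
            throw new IllegalStateException("could not parse: " + xml, e);
        }
    }

    public static void main(String[] args) {
        String literal = textOf("<foo>A</foo>");
        String reference = textOf("<foo>&#x41;</foo>");
        // Both forms yield the same one-character string.
        System.out.println(literal.equals(reference));
    }
}
```

The application reading the log never sees which form was in the file, so writing references instead of raw characters costs nothing on the consuming side.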
>
>
>
> >
> >> My feeling is that most configurations
> >> that use XMLLayout are vulnerable to this problem but are either
> >> running on platforms where the default encoding is UTF-8 or have not
> >> encountered messages containing characters between \u0080 and \u00FF.
> > I.e. it's not really predictable? Hence make it clear, hence
> > predictable and hence a solid chain can be designed, input to output,
> > based on utf-8|16
>
> Can't do that. Existing users of the other layouts depend on the
> encoding being in the platform default encoding.
Hence the question of how to isolate the XMLLayout from those others.
>
> As said previously, the XMLLayout cannot easily affect the encoding
> used by the writer that it is attached to.
Hence they should work as a pair, without conflicting
requirements placed on them.
regards DaveP