http://issues.apache.org/bugzilla/show_bug.cgi?id=43736

           Summary: Chainsaw does not honor encoding when loading XML files
           Product: Log4j
           Version: 1.2
          Platform: Other
        OS/Version: other
            Status: NEW
          Severity: normal
          Priority: P2
         Component: chainsaw
        AssignedTo: [email protected]
        ReportedBy: [EMAIL PROTECTED]


On Oct 30, 2007, at 2:16 PM on log4j-user, Jessica Lin wrote:

I want to use Chainsaw to view a log file that contains Chinese characters. The
log file was written by a FileAppender whose encoding I set to “UTF-8”. Here is
part of my log4j.properties file.


# xml format file appender
log4j.appender.xml=org.apache.log4j.FileAppender
log4j.appender.xml.file=xml.log
log4j.appender.xml.encoding=UTF-8
log4j.appender.xml.append=false
log4j.appender.xml.layout=org.apache.log4j.xml.XMLLayout

Then I use Chainsaw to load the “xml.log” file. The Chinese characters are shown
as “ åŠ è¿™ä¸ªåŠŸèƒ½”.
The original characters are “加这个功能”.

I double checked that “xml.log” was saved in UTF-8 encoding. The XMLDecoder
file, which Chainsaw uses to load XML files, also uses UTF-8 encoding.

Can you help me?

Thanks,

Jessica


---------

The problem appears to be in o.a.l.xml.XMLDecoder in the receivers companion,
where at lines 186 and 188 InputStreamReaders are allocated without explicitly
specifying an encoding.  That causes the InputStreamReader to use the default
platform encoding, which appears not to be UTF-8 in this instance.
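
To illustrate the diagnosis (this is not the XMLDecoder source, only a minimal
sketch), the following decodes the same UTF-8 bytes twice: once with an explicit
charset and once with windows-1252, which stands in here for a non-UTF-8
platform default. The class name, the readAll helper, and the sample text are
hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class ReaderEncodingDemo {
    /** Reads the whole stream as text, decoding the bytes with the given charset. */
    static String readAll(InputStream in, Charset charset) {
        // Passing the charset explicitly avoids falling back to the platform default,
        // which is what new InputStreamReader(in) would use.
        try (Reader reader = new InputStreamReader(in, charset)) {
            StringBuilder sb = new StringBuilder();
            char[] buf = new char[1024];
            int n;
            while ((n = reader.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
            return sb.toString();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Sample Chinese text, written with Unicode escapes so that the
        // encoding of this source file does not matter.
        byte[] utf8 = "\u52A0\u8FD9\u4E2A\u529F\u80FD".getBytes(StandardCharsets.UTF_8);

        // Decoded with the charset the file was actually written in.
        String correct = readAll(new ByteArrayInputStream(utf8), StandardCharsets.UTF_8);

        // Decoded with a different charset (assumed stand-in for the platform
        // default): every UTF-8 byte becomes its own wrong character.
        String garbled = readAll(new ByteArrayInputStream(utf8),
                                 Charset.forName("windows-1252"));

        System.out.println(correct.length() + " chars: " + correct);
        System.out.println(garbled.length() + " chars: " + garbled);
    }
}
```

Hard-coding UTF-8 in the reader would only help for UTF-8 files, so it is a
stopgap at best.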

The approach is broken and needs to be rewritten to handle any arbitrary 
encoding.  The XML parser 
should be presented with a minimal document like:

<!DOCTYPE log4j:eventSet [
<!ENTITY content SYSTEM "...">
]>
<log4j:eventSet version="1.2" xmlns:log4j="...">
    &content;
</log4j:eventSet>

and an entity resolver should then load the URL as a byte stream in response to 
the resolveEntity call. 
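
A minimal sketch of that idea using the JDK's JAXP DOM parser, under stated
assumptions: the namespace URI and the system identifier below are hypothetical
placeholders (the real values are elided above), and the resolver simply hands
back the caller's byte stream instead of opening a URL. Because the parser
receives raw bytes, it applies XML's own encoding detection (byte order mark or
text declaration, otherwise UTF-8) rather than the platform default.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

class EventSetWrapper {
    // Hypothetical placeholders, not the values used by log4j.
    static final String NS = "urn:example:log4j";
    static final String CONTENT_ID = "urn:example:log4j-events";

    /** Parses a stream of bare log4j:event elements by wrapping it in a minimal document. */
    static Document parse(InputStream rawEvents) {
        String wrapper =
              "<!DOCTYPE log4j:eventSet [\n"
            + "<!ENTITY content SYSTEM \"" + CONTENT_ID + "\">\n"
            + "]>\n"
            + "<log4j:eventSet version=\"1.2\" xmlns:log4j=\"" + NS + "\">"
            + "&content;"
            + "</log4j:eventSet>";
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            DocumentBuilder builder = factory.newDocumentBuilder();
            // When the parser expands &content; it asks the resolver for the
            // entity; we answer with the undecoded byte stream.
            builder.setEntityResolver((publicId, systemId) -> {
                InputSource source = new InputSource(rawEvents);
                source.setSystemId(systemId);
                return source;
            });
            return builder.parse(new InputSource(new StringReader(wrapper)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("could not parse event stream", e);
        }
    }

    public static void main(String[] args) {
        // Sample event fragment with Chinese text, encoded as UTF-8 bytes.
        byte[] events = ("<log4j:event logger=\"demo\"><log4j:message>"
            + "\u52A0\u8FD9\u4E2A\u529F\u80FD"
            + "</log4j:message></log4j:event>").getBytes(StandardCharsets.UTF_8);
        Document doc = parse(new ByteArrayInputStream(events));
        String message = doc.getElementsByTagNameNS(NS, "message").item(0).getTextContent();
        System.out.println(message);
    }
}
```

A real implementation would open the log file's URL inside the resolver, and
would probably use a streaming (SAX) handler rather than building a DOM for a
large log.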

As a workaround, anything that sets the default charset for the JVM to UTF-8
should avoid the problem until it can be fixed.  There is no clearly documented
way to do that, and it is platform dependent.  On a *nix machine, you could try:

export LC_CTYPE=UTF-8

On Windows you could try:

java -Dfile.encoding=UTF-8 org.apache.log4j.chainsaw...
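
To check whether either workaround took effect, a small program can print the
charset the JVM actually picked up. This is only a sketch and the class name is
hypothetical. Note that from JDK 18 onward the default charset is UTF-8 unless
overridden, so the workaround matters mainly on older JVMs.

```java
import java.nio.charset.Charset;

class DefaultCharsetCheck {
    /** Describes the charset that readers created without an explicit encoding will use. */
    static String describe() {
        return "default charset = " + Charset.defaultCharset().name()
            + ", file.encoding = " + System.getProperty("file.encoding");
    }

    public static void main(String[] args) {
        // Run with the same environment or -D flags that will be used for Chainsaw.
        System.out.println(describe());
    }
}
```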