SourceTransformer can't transform to DOM with non-US-ASCII characters like 'ä' or 'ü'
------------------------------------------------------------------------------------

         Key: SM-414
         URL: https://issues.apache.org/activemq/browse/SM-414
     Project: ServiceMix
        Type: Bug

  Components: servicemix-core  
    Versions: 3.0-M1, 3.0-M2, 3.0, incubation    
 Environment: W2K, J2SE 1.4.2, Xerces 2.7.1, default locale of OS with character set 'windows-1252'
    Reporter: Juergen Mayrbaeurl
    Priority: Blocker
     Fix For: 3.0, incubation
 Attachments: SourceTransformer-sources.zip

The class org.apache.servicemix.jbi.jaxp.SourceTransformer, which belongs to 
the core classes of ServiceMix and is used very often, has major problems 
transforming Source to DOM data structures when the source contains 
non-US-ASCII characters like 'ä' or 'ü'. 

The class uses a DocumentBuilder for the transformation (see the method 
'public DOMSource toDOMSourceFromStream(StreamSource source) throws 
ParserConfigurationException, IOException, SAXException') and calls 'public 
Document parse(InputStream is, String systemId) throws SAXException, 
IOException' without telling the DocumentBuilder which character encoding 
it should use. As a result, the DocumentBuilder (Xerces 2.7.1) throws fatal 
errors (exceptions), because it encounters invalid byte sequences (especially 
when decoding as UTF-8, where characters like 'ä' or 'ö' must be multi-byte 
sequences). This means that you can't use non-US-ASCII characters in messages 
as soon as ServiceMix uses an instance of SourceTransformer to transform them 
to DOM. This happens, for example, when tracing messages in the 
DeliveryChannel or when evaluating an XPath expression for content-based 
routing. 
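To make the failure concrete, here is a minimal, self-contained sketch (not taken from ServiceMix) that uses the JDK's built-in JAXP parser rather than Xerces 2.7.1. It assumes the message arrives as single-byte Latin bytes with no XML declaration; ISO-8859-1 stands in for windows-1252 here because both encode 'ä' as the single byte 0xE4. Without an encoding hint the parser falls back to UTF-8 and rejects that byte; with InputSource.setEncoding the same bytes parse. The class EncodingDemo and its methods are hypothetical names chosen for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class EncodingDemo {

    // Hypothetical test message: 'ä' (U+00E4) encoded as the single byte 0xE4,
    // as windows-1252 or ISO-8859-1 would produce it. No XML declaration.
    public static byte[] latin1Bytes() {
        return "<msg>\u00e4</msg>".getBytes(StandardCharsets.ISO_8859_1);
    }

    private static DocumentBuilder newBuilder() throws ParserConfigurationException {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        // DefaultHandler rethrows fatal errors without printing them to stderr.
        builder.setErrorHandler(new DefaultHandler());
        return builder;
    }

    // Same call shape as the original code: parse(InputStream, systemId)
    // with no encoding hint, so the parser assumes UTF-8.
    public static boolean parsesWithoutHint(byte[] bytes) throws ParserConfigurationException {
        try {
            newBuilder().parse(new ByteArrayInputStream(bytes), "urn:example:msg");
            return true;
        } catch (SAXException | IOException e) {
            return false; // 0xE4 followed by '<' is not a valid UTF-8 sequence
        }
    }

    // The proposed fix: wrap the stream in an InputSource and set the encoding.
    public static String parseWithEncoding(byte[] bytes, String encoding)
            throws ParserConfigurationException, SAXException, IOException {
        InputSource source = new InputSource(new ByteArrayInputStream(bytes));
        source.setEncoding(encoding);
        Document document = newBuilder().parse(source);
        return document.getDocumentElement().getTextContent();
    }

    public static void main(String[] args) throws Exception {
        byte[] bytes = latin1Bytes();
        System.out.println("parses without hint: " + parsesWithoutHint(bytes)); // false
        System.out.println("parses with ISO-8859-1: "
                + "\u00e4".equals(parseWithEncoding(bytes, "ISO-8859-1"))); // true
    }
}
```

The only difference between the two code paths is the encoding set on the InputSource; the bytes are identical.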

The solution to this problem is straightforward: tell the DocumentBuilder the 
character encoding it has to use. It looks like this:

    public DOMSource toDOMSourceFromStream(StreamSource source)
            throws ParserConfigurationException, IOException, SAXException {
        DocumentBuilder builder = createDocumentBuilder();
        String systemId = source.getSystemId();
        Document document = null;
        InputStream inputStream = source.getInputStream();
        if (inputStream != null) {
            InputSource inputsource = new InputSource(inputStream);
            inputsource.setSystemId(systemId);
            // Very important: tell the parser which encoding the bytes use
            // instead of letting it fall back to UTF-8.
            // defaultCharEncodingName is defined elsewhere in the changed
            // SourceTransformer (not shown here).
            inputsource.setEncoding(defaultCharEncodingName);
            document = builder.parse(inputsource);
        }
        else {
            // A Reader already delivers decoded characters, so no encoding
            // is needed; only the system id is passed on.
            Reader reader = source.getReader();
            if (reader != null) {
                InputSource inputsource = new InputSource(reader);
                inputsource.setSystemId(systemId);
                document = builder.parse(inputsource);
            }
            else {
                throw new IOException("No input stream or reader available");
            }
        }
        return new DOMSource(document, systemId);
    }
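An alternative with the same effect, shown here as a swapped-in technique rather than the patch proposed above, is to decode the byte stream yourself with java.io.InputStreamReader and hand the parser a character stream. Per the SAX InputSource contract, a parser reads a character stream directly and disregards any encoding declaration in it, so nothing needs to be passed to the DocumentBuilder. The class ReaderDecodingDemo, the helper toDOMSource and its charsetName parameter are hypothetical; ISO-8859-1 again stands in for windows-1252.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamSource;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

public class ReaderDecodingDemo {

    // Hypothetical helper: like toDOMSourceFromStream, it prefers the byte
    // stream, but decodes it with an explicit charset before parsing.
    public static DOMSource toDOMSource(StreamSource source, String charsetName)
            throws ParserConfigurationException, SAXException, IOException {
        InputStream in = source.getInputStream();
        Reader reader = (in != null)
                ? new InputStreamReader(in, charsetName) // decode bytes ourselves
                : source.getReader();                    // already characters
        if (reader == null) {
            throw new IOException("No input stream or reader available");
        }
        InputSource inputSource = new InputSource(reader);
        inputSource.setSystemId(source.getSystemId());
        Document document = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder().parse(inputSource);
        return new DOMSource(document, source.getSystemId());
    }

    public static void main(String[] args) throws Exception {
        // 'ü' and 'ß' as single Latin bytes, no XML declaration.
        byte[] bytes = "<msg>gr\u00fc\u00df</msg>".getBytes(StandardCharsets.ISO_8859_1);
        DOMSource dom = toDOMSource(
                new StreamSource(new ByteArrayInputStream(bytes)), "ISO-8859-1");
        Document doc = (Document) dom.getNode();
        System.out.println(doc.getDocumentElement().getTextContent());
    }
}
```

The trade-off is that the charset chosen by the caller always wins over whatever encoding the document itself declares, so it has to match how the message was actually produced.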

I've attached the original source file of SourceTransformer (3.0 SNAPSHOT, 
2006-04-20) and the changed version (unfortunately, I can't create a real patch).

Kind regards
Juergen

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators:
   https://issues.apache.org/activemq/secure/Administrators.jspa
-
For more information on JIRA, see:
   http://www.atlassian.com/software/jira
