Resolving SYNAPSE-218

Andreas Veithen Sat, 08 Mar 2008 17:02:00 -0800

Hi all!

In order to resolve SYNAPSE-218 (TextFileDataSource violates OMDataSource contract), I think we need to review the following piece of code in VFSTransportSender#populateResponseFile (there is similar code in MailTransportSender#sendMail):


if (firstChild instanceof OMSourcedElementImpl) {
    firstChild.serializeAndConsume(os);
} else {
    os.write(firstChild.getText().getBytes());
}

The purpose of this piece of code is to write the content of {http://ws.apache.org/commons/ns/payload}text elements to the response file. OMSourcedElementImpl nodes are handled differently to ensure that large output from XSL transformations is processed efficiently (i.e. without loading the entire temporary output file into memory). The code has two problems:

1) Since OMSourcedElementImpl is an OMElement, a call to serializeAndConsume should normally write out the entire element, i.e. start tag, content (encoded as XML) and end tag. This is of course not what is intended here. The code only works as expected because TextFileDataSource doesn't respect the OMDataSource contract (which is what SYNAPSE-218 is all about).

2) The instruction os.write(firstChild.getText().getBytes()) will encode the content of the element using the default platform encoding, which is not always what is expected. Note that for the output of an XSL transformation, the content of the element is produced by the following instruction in XSLTMediator#performXSLT:


handleNonXMLResult(baosForTarget.toString(), traceOrDebugOn, traceOn)

Considered separately, this is also incorrect (since ByteArrayOutputStream#toString uses the default platform encoding, while the encoding of the stream depends on the stylesheet). In most cases the net result is nevertheless that the response file will have the encoding specified in the stylesheet. However, this only works if the combined transformation ByteArrayOutputStream#toString -> String#getBytes is equivalent to the identity transformation. This is not the case if the default platform encoding is e.g. UTF-8, because bytes that do not form valid UTF-8 sequences are replaced during decoding.
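
To make the round-trip argument concrete, here is a minimal, self-contained sketch. It is not Synapse code: the class RoundTripDemo and the method roundTrip are made up for illustration, and the platform default encoding is passed in as a parameter (an assumption made so that the effect can be reproduced on any machine instead of depending on the local default).

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class RoundTripDemo {

    // Hypothetical helper: mimics ByteArrayOutputStream#toString() followed by
    // String#getBytes(), with the "platform default" charset made explicit.
    static byte[] roundTrip(byte[] raw, Charset platformDefault) {
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        baos.write(raw, 0, raw.length);
        // Decode the bytes as the platform would, then encode them again.
        String decoded = new String(baos.toByteArray(), platformDefault);
        return decoded.getBytes(platformDefault);
    }

    public static void main(String[] args) {
        // "é" as a stylesheet with encoding="ISO-8859-1" would emit it: one byte, 0xE9.
        byte[] raw = "\u00e9".getBytes(StandardCharsets.ISO_8859_1);

        // With a single-byte default encoding the round trip preserves the bytes.
        byte[] viaLatin1 = roundTrip(raw, StandardCharsets.ISO_8859_1);
        System.out.println(Arrays.equals(raw, viaLatin1));

        // With UTF-8 as the default, the lone 0xE9 is malformed input; it is
        // replaced while decoding, so the original byte cannot be recovered.
        byte[] viaUtf8 = roundTrip(raw, StandardCharsets.UTF_8);
        System.out.println(Arrays.equals(raw, viaUtf8));
    }
}
```

The first comparison holds and the second does not, which is the situation described above: the stylesheet encoding only survives when the default platform encoding happens to map every byte to a character and back.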

The solution for the first problem is actually surprisingly simple. The correct behavior can be achieved by replacing the code by the following instructions:


OMNode node = firstChild.getFirstOMChild();
while (node != null) {
    if (node instanceof OMText) {
        os.write(((OMText)node).getText().getBytes());
    }
    node = node.getNextOMSibling();
}

I checked that for an OMSourcedElementImpl node backed by a TextFileDataSource object, getFirstOMChild and getNextOMSibling will read a single chunk of text from the WrappedTextNodeStreamReader constructed by TextFileDataSource. Therefore the replacement code will handle large temporary files with the same efficiency as the original code.

An obvious solution for the second problem is to allow the configuration of the output encoding in the VFS transport (and to correct XSLTMediator!). However, there might be cases where the user wants to specify the output encoding in the XSLT stylesheet. This could be achieved by allowing XSLTMediator to be configured to use a binary wrapper instead of a text wrapper for text output. In this case Synapse would strictly preserve the output of the XSL transformation.
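
As a rough sketch of the first option (a configurable output encoding), the text chunks could be written through a Writer bound to an explicit charset instead of calling getBytes() without arguments. Everything named here is hypothetical: EncodedTextWriter, writeText and encode are illustration-only names, the chunks stand in for the OMText nodes of the loop above, and how the charset would actually be configured in the VFS transport is left open.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class EncodedTextWriter {

    // Hypothetical helper: writes each text chunk using the configured charset
    // rather than the default platform encoding.
    static void writeText(OutputStream os, Iterable<String> chunks, Charset charset)
            throws IOException {
        Writer writer = new OutputStreamWriter(os, charset);
        for (String chunk : chunks) {
            writer.write(chunk);
        }
        // Flush but do not close: the caller owns the stream.
        writer.flush();
    }

    // Convenience wrapper for the demo: encodes the chunks into a byte array.
    static byte[] encode(Iterable<String> chunks, Charset charset) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try {
            writeText(out, chunks, charset);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        Iterable<String> chunks = Arrays.asList("caf", "\u00e9");
        // The same text yields different bytes depending on the chosen charset.
        System.out.println(encode(chunks, StandardCharsets.UTF_8).length);
        System.out.println(encode(chunks, StandardCharsets.ISO_8859_1).length);
    }
}
```

The chunks are still processed one at a time, so a change along these lines should not affect the memory behavior discussed earlier; only the choice of encoding becomes explicit.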


I'm waiting for your comments!

Andreas

