BaseUtils uses incorrect strategy to distinguish between XML, text and binary
-----------------------------------------------------------------------------

                 Key: SYNAPSE-304
                 URL: https://issues.apache.org/jira/browse/SYNAPSE-304
             Project: Synapse
          Issue Type: Bug
          Components: Transports
            Reporter: Andreas Veithen
            Assignee: Andreas Veithen
             Fix For: 1.3


BaseUtils#setSOAPEnvelope (together with BaseUtils#handleLegacyMessage) is used 
by the VFS, Mail, JMS and AMQP transports and implements the following 
strategy to distinguish between XML, text and binary payloads: It first tries 
to parse the payload as XML. If that fails, it tries to load it as text using 
BaseUtils#getMessageTextPayload. If that fails again, it loads the message as 
binary data using BaseUtils#getMessageBinaryPayload.

This strategy has the following flaws:
* Corrupted or invalid XML messages are not detected as such but interpreted as 
text or binary data. This will almost certainly lead to errors at a later stage 
in the processing (typically in a mediation that doesn't expect text or binary 
payloads), but for the user it is difficult to identify the root cause of the 
problem.
* The VFSUtils and MailUtils implementations of the getMessageTextPayload method 
never actually fail (except if the file or MIME part can't be read). The reason 
is that they read the content as binary and then construct a String object 
using new String(byte[]). This constructor never throws an exception, even if 
there are byte sequences not valid in the platform's default charset. Therefore 
the VFS and mail transport listeners will never process messages as binary 
payloads. This problem can't be solved within the current strategy because 
there is no reliable way to distinguish text from binary data by inspecting the 
content alone. Note also that decoding the message with the platform's default 
charset is itself incorrect (see SYNAPSE-261).
* This approach doesn't allow using custom message builders to parse messages 
that are neither XML nor plain text or binary.
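
The second flaw can be reproduced outside Synapse. The following is a minimal 
sketch, not the VFSUtils or MailUtils code: the class and method names are 
hypothetical, and an explicit charset is passed so that the result does not 
depend on the platform default. It contrasts the lenient String constructor 
with a CharsetDecoder configured to report errors.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; not part of Synapse or Axis2.
class LenientDecodingDemo {

    // Lenient decoding: malformed input is replaced with the charset's
    // replacement string, so this call never signals a decoding failure.
    static String lenientDecode(byte[] data, Charset charset) {
        return new String(data, charset);
    }

    // Strict decoding: malformed or unmappable input raises
    // CharacterCodingException instead of being silently replaced.
    static String strictDecode(byte[] data, Charset charset)
            throws CharacterCodingException {
        return charset.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT)
                .decode(ByteBuffer.wrap(data))
                .toString();
    }

    public static void main(String[] args) {
        // 0xFF can never occur in well-formed UTF-8.
        byte[] notText = {(byte) 0xFF, (byte) 0xFE, 0x00, (byte) 0x80};

        String lenient = lenientDecode(notText, StandardCharsets.UTF_8);
        System.out.println("lenient decode returned " + lenient.length() + " chars");

        try {
            strictDecode(notText, StandardCharsets.UTF_8);
            System.out.println("strict decode accepted the bytes");
        } catch (CharacterCodingException e) {
            System.out.println("strict decode rejected the bytes: " + e);
        }
    }
}
```

Even strict decoding is not a reliable text detector: in a single-byte charset 
such as ISO-8859-1 every byte sequence decodes without error, which is why 
content inspection alone cannot separate text from binary data.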

I think that every transport should first determine the content type of the 
message and then decode the message according to that content type, rather than 
trying different ways to decode the message. The decoding should be delegated 
to the message builder corresponding to the content type. This approach has the 
following advantages:
* Corrupted or invalid messages trigger an appropriate error immediately.
* Since for text payloads the content type can include information about the 
charset (text/plain; charset=...), it provides a straightforward solution for 
issues like SYNAPSE-261.
* Custom message builders can be used.
* It fits naturally into the Axis2 architecture since it correctly uses the 
concepts of transport and message builder.
* It leads to more consistent behavior between different transports (in 
particular with the NIO HTTP transport).

The transport should determine the content type either from the service 
configuration (e.g. the transport.vfs.ContentType property for VFS) or from 
information available at the transport protocol level (Content-Type header for 
mail messages, FileContentInfo or file suffix for VFS, message type for JMS, 
etc.).
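
As an illustration of this lookup, here is a minimal sketch. The class name, 
the suffix table and the precedence order (configured value first, then the 
transport-level header, then the file suffix) are all assumptions made for the 
example; none of them is prescribed by this issue or taken from the transports.

```java
import java.util.Locale;
import java.util.Map;
import java.util.Optional;

// Hypothetical helper; not part of Synapse or Axis2.
class ContentTypeResolver {

    // Illustrative suffix table only; a real transport would make this configurable.
    private static final Map<String, String> BY_SUFFIX = Map.of(
            "xml", "text/xml",
            "txt", "text/plain",
            "bin", "application/octet-stream");

    // Returns the first content type found, or empty if none can be determined.
    // The precedence order is an assumption of this sketch.
    static Optional<String> resolve(String configured, String transportHeader,
                                    String fileName) {
        if (configured != null && !configured.isBlank()) {
            return Optional.of(configured.trim());
        }
        if (transportHeader != null && !transportHeader.isBlank()) {
            return Optional.of(transportHeader.trim());
        }
        if (fileName != null) {
            int dot = fileName.lastIndexOf('.');
            if (dot >= 0 && dot < fileName.length() - 1) {
                String suffix = fileName.substring(dot + 1).toLowerCase(Locale.ROOT);
                return Optional.ofNullable(BY_SUFFIX.get(suffix));
            }
        }
        return Optional.empty();
    }
}
```

Returning an empty Optional rather than guessing lets the caller reject the 
message or apply an explicitly configured default, which keeps the failure 
visible at the transport level.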

The algorithm to select the message builder based on the content type and to 
invoke it to create the SOAP infoset is already implemented in the 
TransportUtils.createSOAPMessage utility method in the Axis2 kernel (which is 
also used by the NIO HTTP and UDP transports). Therefore the proposed 
changes are:
1. Create message builders for text and binary payloads (which are the 
counterparts of PlainTextFormatter and BinaryFormatter introduced by 
SYNAPSE-261).
2. Let ServerManager#start register these new message builders by default in 
the Axis configuration for content types "text/plain" and 
"application/octet-stream" respectively (in the same way that 
AxisConfigBuilder#populateConfig registers default message builders for 
text/xml, application/soap+xml, etc.). In addition they should be added to the 
default axis2.xml file shipped with Synapse.
3. Make sure that every affected transport implements an appropriate strategy 
to determine the content type and uses TransportUtils.createSOAPMessage instead 
of BaseUtils#setSOAPEnvelope to process the message payload.
4. Remove BaseUtils#setSOAPEnvelope and related code without replacement.
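
The dispatch idea behind items 1 to 3 can be sketched as follows. This is a 
simplified stand-in, not the Axis2 API: PayloadBuilder is a hypothetical 
interface rather than the Axis2 Builder, the registry is a plain map rather 
than the Axis configuration, the result is a plain Object rather than a SOAP 
infoset, and the UTF-8 fallback for a missing charset parameter is an 
assumption of the example.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical sketch; not part of Synapse or Axis2.
class BuilderDispatchSketch {

    /** Hypothetical stand-in for a message builder. */
    interface PayloadBuilder {
        Object build(byte[] payload, Charset charset);
    }

    private final Map<String, PayloadBuilder> builders = new HashMap<>();

    BuilderDispatchSketch() {
        // Default registrations, mirroring item 2 of the proposal.
        builders.put("text/plain", (payload, charset) -> new String(payload, charset));
        builders.put("application/octet-stream", (payload, charset) -> payload.clone());
    }

    // Strips parameters such as "; charset=..." and normalizes case.
    static String mediaType(String contentType) {
        int semi = contentType.indexOf(';');
        String type = semi < 0 ? contentType : contentType.substring(0, semi);
        return type.trim().toLowerCase(Locale.ROOT);
    }

    // Extracts the charset parameter, or returns the fallback if it is absent.
    static Charset charsetOf(String contentType, Charset fallback) {
        for (String param : contentType.split(";")) {
            String[] kv = param.trim().split("=", 2);
            if (kv.length == 2 && kv[0].trim().equalsIgnoreCase("charset")) {
                String name = kv[1].trim();
                if (name.length() >= 2 && name.startsWith("\"") && name.endsWith("\"")) {
                    name = name.substring(1, name.length() - 1);
                }
                return Charset.forName(name);
            }
        }
        return fallback;
    }

    // Selects the builder by content type; an unknown type fails immediately
    // instead of falling through to another decoding attempt.
    Object decode(String contentType, byte[] payload) {
        PayloadBuilder builder = builders.get(mediaType(contentType));
        if (builder == null) {
            throw new IllegalArgumentException("No builder registered for " + contentType);
        }
        return builder.build(payload, charsetOf(contentType, StandardCharsets.UTF_8));
    }
}
```

For example, decode("text/plain; charset=ISO-8859-1", bytes) decodes with the 
declared charset, while a content type with no registered builder raises an 
error at once rather than being reinterpreted as text or binary.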

The proposed work plan is as follows:
* For release 1.2, implement 1 and 2 as well as 3 for the VFS transport. This 
allows issue SYNAPSE-261 to be resolved completely.
* For release 1.3, implement 3 and 4 for all remaining transports (for JMS and 
AMQP after SYNAPSE-303 has been handled).



-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.