[jira] [Commented] (CXF-7491) TransformInInterceptor / TransformOutInterceptor assume UTF-8

JIRA Thu, 31 Aug 2017 03:57:38 -0700

    [ 
https://issues.apache.org/jira/browse/CXF-7491?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16148827#comment-16148827
 ]


Cyrille Chépélov commented on CXF-7491:
---------------------------------------

A way to correct this is to change (looking at the stack top down):
* in TransformUtils.java lines 41-45, deprecate createNewReaderIfNeeded / 
createNewWriterIfNeeded; add an overload with an "encoding" argument; have it 
call the overload of StaxUtils.createXMLStreamReader (resp. 
StaxUtils.createXMLStreamWriter) that takes the encoding argument
* propagate the deprecation + additional "encoding" argument overload in 
TransformUtils.java lines 49-110 for the methods 
createTransform(Writer|Reader)IfNeeded
* add an argument to the protected method "createTransformReaderIfNeeded" in 
TransformInInterceptor.java, and in the handleMessage method, extract the 
desired encoding from the Message structure (from the 
org.apache.cxf.message.Message.ENCODING property), defaulting to UTF-8 if the 
property is missing.
* symmetric changes in TransformOutInterceptor.java
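
The encoding lookup with a UTF-8 default, and a reader built for that encoding, can be sketched roughly as follows. This is only an illustration under stated assumptions: a plain Map stands in for the CXF Message, the key string mirrors the org.apache.cxf.message.Message.ENCODING property named above, the JDK StAX factory is used directly instead of StaxUtils, and the class and method names are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical sketch; not the actual CXF code.
class EncodingAwareReaderSketch {
    // Assumption: stand-in for the message property named in the comment above.
    static final String ENCODING_KEY = "org.apache.cxf.message.Message.ENCODING";

    // Returns the encoding stored in the message, or UTF-8 if the property is missing.
    static String resolveEncoding(Map<String, Object> message) {
        Object value = message.get(ENCODING_KEY);
        if (value instanceof String && !((String) value).isEmpty()) {
            return (String) value;
        }
        return StandardCharsets.UTF_8.name();
    }

    // Builds a StAX reader that decodes the stream with the given encoding
    // instead of relying on the parser's own detection.
    static XMLStreamReader createReader(InputStream in, String encoding)
            throws XMLStreamException {
        return XMLInputFactory.newInstance().createXMLStreamReader(in, encoding);
    }

    // Concatenates all character data in the document (helper for the demo).
    static String readText(XMLStreamReader reader) throws XMLStreamException {
        StringBuilder text = new StringBuilder();
        while (reader.hasNext()) {
            if (reader.next() == XMLStreamConstants.CHARACTERS) {
                text.append(reader.getText());
            }
        }
        return text.toString();
    }

    public static void main(String[] args) throws XMLStreamException {
        // ISO-8859-1 bytes with no XML declaration, as in the reported case.
        byte[] body = "<name>Ch\u00e9p\u00e9lov</name>"
                .getBytes(StandardCharsets.ISO_8859_1);
        Map<String, Object> message = new HashMap<>();
        message.put(ENCODING_KEY, "ISO-8859-1");

        String encoding = resolveEncoding(message);
        XMLStreamReader reader = createReader(new ByteArrayInputStream(body), encoding);
        System.out.println(readText(reader));
    }
}
```

The point of the sketch is only that the encoding travels as an explicit argument from the message down to the StAX factory call.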

Question: in the TransformInInterceptor class, the 
createTransformReaderIfNeeded method lacks a way to convey the desired 
encoding, and is protected, making it an extension point. Simply deprecating 
the method + adding a charset-aware overload would be dangerous for subclasses 
of TransformInInterceptor, as these subclasses would suddenly no longer be 
using the subclassed behaviour, but would revert to TransformInInterceptor's 
operation.
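
The hazard described above can be reproduced with a small, self-contained sketch. The classes below are hypothetical stand-ins (strings replace the real reader types, and none of the names come from CXF); they only show how an existing override is silently bypassed once the base class starts calling a new overload.

```java
// Hypothetical sketch of the "overload bypasses the override" hazard.
class OverloadBypassSketch {
    static class BaseInterceptor {
        // After the change, the entry point calls the new charset-aware overload.
        String handleMessage(String payload, String encoding) {
            return createReader(payload, encoding);
        }

        // Old extension point, now deprecated and no longer called by handleMessage.
        @Deprecated
        protected String createReader(String payload) {
            return "base[UTF-8]:" + payload;
        }

        // New charset-aware overload.
        protected String createReader(String payload, String encoding) {
            return "base[" + encoding + "]:" + payload;
        }
    }

    // A pre-existing subclass that customised the old extension point.
    static class CustomInterceptor extends BaseInterceptor {
        @Override
        protected String createReader(String payload) {
            return "custom:" + payload;
        }
    }

    public static void main(String[] args) {
        // The subclass still compiles and loads, but its override is never invoked.
        System.out.println(new CustomInterceptor().handleMessage("<a/>", "ISO-8859-1"));
        // prints "base[ISO-8859-1]:<a/>"
    }
}
```

Nothing fails at compile time or at class-load time here, which is what makes the change surprising for subclass authors.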

Based on the principle of minimum surprise, I see three ways:
# intentionally break compatibility by adding the "encoding" argument to 
createTransformReaderIfNeeded in TransformInInterceptor (+symmetric in 
TransformOutInterceptor), forcing any subclass to be updated before being 
usable again. This also breaks binary compatibility for the sake of fixing a 
fairly local issue.
# avoid breaking compatibility by finding a secondary channel, aside from method 
parameters, to convey the desired encoding from handleMessage to the 
createTransformReaderIfNeeded method. This would inevitably be ugly but would 
preserve binary compatibility (while it wouldn't necessarily 'magically' update 
the behaviour of TransformInInterceptor subclasses to follow the desired 
encoding, it would avoid surprises).
# deprecate TransformInInterceptor / TransformOutInterceptor (keeping the 
interface as is), implement the changes proposed in point 1 in copies 
(TransformInCharsetAwareInterceptor / TransformOutCharsetAwareInterceptor), 
and update StaxTransformFeature to use the new interceptor implementations.

While solution #1 is simpler, #3 avoids both surprising behaviour changes and 
a binary compatibility break, without the ugly and performance-consuming 
hacks of #2.

Proceeding to implement #3 unless advised otherwise.

(I, of course, have no way to ask and get the remote IBMi system to speak UTF-8 
in any sort of reasonable time frame)


> TransformInInterceptor / TransformOutInterceptor assume UTF-8
> -------------------------------------------------------------
>
>                 Key: CXF-7491
>                 URL: https://issues.apache.org/jira/browse/CXF-7491
>             Project: CXF
>          Issue Type: Bug
>          Components: Soap Binding
>    Affects Versions: 3.1.11, 3.1.12
>         Environment: client Linux/Java/CXF 
> server IBMi AS/400
>            Reporter: Cyrille Chépélov
>
> When talking to a server using IBMi / RPG-based software and SOAP gateway:
> the returned SOAP message contains XML encoded as ISO-8859-1; the HTTP headers 
> do specify a content type of xml+soap with character set ISO-8859-1; however, 
> the XML message itself includes no character set declaration.
> Due to discrepancies between the official WSDL for the SOAP message and the 
> remote implementation, a couple of transforms had to be deployed. This works 
> fine as long as the exchanged messages actually conform to US-ASCII (no 
> diacritics), but whenever any character encoded differently between 
> ISO-8859-1 and UTF-8 is used, the TransformInInterceptor fails to parse the 
> text, as the XMLStreamReader is built to expect UTF-8 and actually receives 
> ISO-8859-1 input.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
