neseleznev opened a new pull request, #1028:
URL: https://github.com/apache/cxf/pull/1028

   Related to https://github.com/apache/cxf/pull/950 and 
https://github.com/apache/cxf/pull/993
   
   ## Problem
   CXF version `3.5.3` introduced a bug that results in invalid MTOM requests.
   More precisely, that version exposed an older bug; I'll elaborate on it below :)
   
   ## Investigation
   
   Commit 
https://github.com/apache/cxf/commit/ffba34eed2d5b4af22a93c100e4687e234d53b28#diff-e3efb80d0a98bbbd7f6eddd3c021c5fb5ab05ea2ee8d97dc68026f6345e5a509
 by @reta changed how `Content-Id` is written to the headers.
   First of all, thank you for taking the bold step of doing this with reference to the RFCs.
   Let's look at line 243 in particular.
   
   Provided that `attachmentId` has the format `uuid@domain`, it works as 
expected. However, `attachmentId` is generated by CXF in the routine 
https://github.com/apache/cxf/blob/2ad9d0b2eef17c0d57d3cb96f3b2cecd1e704869/core/src/main/java/org/apache/cxf/attachment/AttachmentUtil.java#L230
 which results in `uuid@urn:xml:namespace` for some inputs.
   Such input leads to the header being URL-encoded.
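
To illustrate the encoding step, here is a minimal sketch. It is not CXF's actual code: the class and method names are hypothetical, and it only assumes that the domain part is passed through `java.net.URLEncoder`, which is what the percent-encoded header further down suggests.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

class ContentIdEncodingDemo {
    // Hypothetical helper: URL-encodes the part that ends up after '@'.
    static String encodeDomain(String domain) {
        return URLEncoder.encode(domain, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // A plain host name contains no reserved characters and is left alone.
        System.out.println(encodeDomain("cxf.apache.org"));
        // A URN contains ':' characters, each of which becomes "%3A".
        System.out.println(encodeDomain("urn:us:gov:treasury:irs:common"));
    }
}
```

With a URN as the "domain", every colon is percent-encoded, which matches the `urn%3Aus%3Agov%3A...` form seen in the failing request.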
   
   Issues with this header have been known for a while: 
https://issues.apache.org/jira/browse/CXF-2669 
   What matters is how SOAP servers treat a URL-encoded `Content-Id`.
   In my experience, IRS.gov does not match 
   ```
   Content-ID: <3315f978-0190-4bc2-8a97-f766a78a7946-1@urn%3Aus%3Agov%3Atreasury%3Airs%3Acommon>
   ```
   with the previously defined reference
   ```
   <xop:Include xmlns:xop="http://www.w3.org/2004/08/xop/include" href="cid:3315f978-0190-4bc2-8a97-f766a78a7946-1@urn%3Aus%3Agov%3Atreasury%3Airs%3Acommon"/>
   ```
   which is basically the same and _should_ match.
   
   That said, it's a well-known issue in the wild:
   1. https://access.redhat.com/solutions/2062163
   2. https://access.redhat.com/solutions/4076871
   
   The latter points to the fact that there should be no URL-encoded symbols in 
`Content-Id`, which is met by @reta's commit.
   
   ## The Fix
   
   The problem is in `AttachmentUtil::createContentID`, so I've made the 
`Content-Id` generation stricter: it now uses a safe fallback value when the 
domain pattern is not met.
   The buggy method uses `new URI(...).getHost()` to extract the domain, which 
is not the domain we expect to put in the Content ID. 
   Namely, the `URI::getHost` javadoc indicates:
   ```
   An IPv6 address enclosed in square brackets ('[' and ']') and consisting of 
hexadecimal digits, colon characters (':'), and possibly an embedded IPv4 
address. The full syntax of IPv6 addresses is specified in RFC 2373: IPv6 
Addressing Architecture.
   ```
   
   Thus, I've also added a few tests, which include IPv6 (just in case :) ); 
looking at those, you can verify the implications of the fix.
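
The idea of a stricter domain check with a fallback can be sketched roughly as follows. This is only an illustration under stated assumptions, not the code in this PR: the class name, the `FALLBACK_DOMAIN` value, and the domain pattern are all hypothetical.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.UUID;
import java.util.regex.Pattern;

class ContentIdSketch {
    // Hypothetical fallback value; the real fix may use a different one.
    static final String FALLBACK_DOMAIN = "fallback.example";

    // Hypothetical, deliberately conservative pattern: letters, digits, '.', '-'.
    private static final Pattern DOMAIN =
        Pattern.compile("^[A-Za-z0-9]([A-Za-z0-9.-]*[A-Za-z0-9])?$");

    static String domainFor(String ns) {
        if (ns == null || ns.isEmpty()) {
            return FALLBACK_DOMAIN;
        }
        try {
            // getHost() is null for opaque URIs such as "urn:..." and
            // returns a bracketed literal such as "[2001:db8::1]" for IPv6.
            String host = new URI(ns).getHost();
            if (host != null && DOMAIN.matcher(host).matches()) {
                return host;
            }
        } catch (URISyntaxException e) {
            // Not a parseable URI: fall through to the fallback.
        }
        return FALLBACK_DOMAIN;
    }

    static String createContentId(String ns) {
        return UUID.randomUUID() + "@" + domainFor(ns);
    }

    public static void main(String[] args) {
        System.out.println(domainFor("http://cxf.apache.org/ns"));
        System.out.println(domainFor("urn:us:gov:treasury:irs:common"));
        System.out.println(domainFor("http://[2001:db8::1]/ns"));
    }
}
```

In this sketch, anything that is not a plain host name (a URN, an IPv6 literal, an unparseable string) maps to the fallback, so the resulting Content ID never contains characters that would need URL encoding.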
   
   ## Prolog
   
   I'm new to Apache, and I'm not sure that this is the proper way to post a 
bug report.
   I wasn't able to log in to Apache's JIRA, so I decided to go with a fix to 
speed things up.
   You may force-push / rebranch / rewrite / throw my commits away; I'd be 
happy so long as the fix is accepted.
   
   As of now, I've rolled back the CXF version to `3.5.3` and it works as expected. 
Given that the upgrade followed a Java 17 migration, the same could happen 
to any project wanting to use Java 17 along with CXF and MTOM functionality.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
