neseleznev opened a new pull request, #1028: URL: https://github.com/apache/cxf/pull/1028
Related to https://github.com/apache/cxf/pull/950 and https://github.com/apache/cxf/pull/993

## Problem

CXF version `3.5.3` introduced a bug that results in invalid MTOM requests. More precisely, this version exposed an older bug. I'll elaborate on it below :)

## Investigation

Commit https://github.com/apache/cxf/commit/ffba34eed2d5b4af22a93c100e4687e234d53b28#diff-e3efb80d0a98bbbd7f6eddd3c021c5fb5ab05ea2ee8d97dc68026f6345e5a509 by @reta changed how `Content-Id` is written to the headers. First of all, thank you for boldly taking this on and for referring to the RFCs. Let's have a look at line 243 in particular.

If `attachmentId` has the form `uuid@domain`, it works as expected. However, `attachmentId` is generated by CXF in https://github.com/apache/cxf/blob/2ad9d0b2eef17c0d57d3cb96f3b2cecd1e704869/core/src/main/java/org/apache/cxf/attachment/AttachmentUtil.java#L230, which produces `uuid@urn:xml:namespace` for some inputs. Such a value causes the header to be URL-encoded. Issues with this header have been known for a while: https://issues.apache.org/jira/browse/CXF-2669

What matters is how SOAP servers treat a URL-encoded `Content-Id`. In my experience, IRS.gov does not match

```
Content-ID: <3315f978-0190-4bc2-8a97-f766a78a7946-1@urn%3Aus%3Agov%3Atreasury%3Airs%3Acommon>
```

with the previously defined reference

```
<xop:Include xmlns:xop="http://www.w3.org/2004/08/xop/include" href="cid:3315f978-0190-4bc2-8a97-f766a78a7946-1@urn%3Aus%3Agov%3Atreasury%3Airs%3Acommon"/>
```

even though the two are essentially the same and _should_ match. That said, it's a well-known issue in the wild:

1. https://access.redhat.com/solutions/2062163
2. https://access.redhat.com/solutions/4076871

The latter points out that there should be no URL-encoded symbols in `Content-Id`, a requirement that @reta's commit meets.
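To make the mismatch concrete, here is a small Java sketch (Java 10+ for the `Charset` overloads). It assumes two things: (1) a simplified, hypothetical stand-in for the ID generation in which the namespace is used verbatim as the domain and then URL-encoded (this is not CXF's actual `AttachmentUtil` code), and (2) a receiver that resolves `cid:` URLs literally as RFC 2392 describes, i.e. strips the `cid:` prefix, decodes the `%` escapes, and compares the result with the `Content-ID` value inside the angle brackets. The class and method names are made up for illustration. Under those assumptions, a header that is itself percent-encoded no longer matches its own `xop:Include` reference, which is one plausible explanation for the IRS.gov behaviour.

```java
import java.net.URLDecoder;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

public class ContentIdMismatchDemo {

    // Hypothetical stand-in: the namespace is used verbatim as the "domain"
    // and URL-encoded. This is NOT CXF's actual AttachmentUtil code.
    static String encodedContentId(String uuidPart, String namespace) {
        return URLEncoder.encode(uuidPart, StandardCharsets.UTF_8) + "@"
            + URLEncoder.encode(namespace, StandardCharsets.UTF_8);
    }

    // RFC 2392-style resolution: drop "cid:", decode %-escapes, then compare
    // with the Content-ID header value without its angle brackets.
    // (URLDecoder also maps '+' to a space; no '+' occurs in these values.)
    static boolean strictMatch(String href, String contentIdHeader) {
        String fromHref = URLDecoder.decode(
            href.substring("cid:".length()), StandardCharsets.UTF_8);
        String fromHeader =
            contentIdHeader.substring(1, contentIdHeader.length() - 1);
        return fromHref.equals(fromHeader);
    }

    public static void main(String[] args) {
        String id = encodedContentId("3315f978-0190-4bc2-8a97-f766a78a7946-1",
                                     "urn:us:gov:treasury:irs:common");
        // 3315f978-0190-4bc2-8a97-f766a78a7946-1@urn%3Aus%3Agov%3Atreasury%3Airs%3Acommon
        System.out.println(id);

        String href = "cid:" + id;
        // Header carries the encoded form: the decoded href no longer matches.
        System.out.println(strictMatch(href, "<" + id + ">"));
        // Header carries the decoded form: it matches.
        String decoded = URLDecoder.decode(id, StandardCharsets.UTF_8);
        System.out.println(strictMatch(href, "<" + decoded + ">"));
    }
}
```

Running it prints the encoded ID, then `false`, then `true`.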
## The Fix

The problem is in `AttachmentUtil::createContentID`, so I've made the `Content-Id` generation stricter: it now uses a safe fallback value whenever the namespace does not match the expected domain pattern. The buggy method uses `new URI(...).getHost()` to extract the domain, but the host it returns is not necessarily the domain we want to put in a Content-ID. Namely, the `URI::getHost` javadoc says that the host may be:

```
An IPv6 address enclosed in square brackets ('[' and ']') and consisting of hexadecimal digits, colon characters (':'), and possibly an embedded IPv4 address. The full syntax of IPv6 addresses is specified in RFC 2373: IPv6 Addressing Architecture.
```

So I've also added a few tests, including IPv6 (just in case :) ); looking at them, you can verify the implications of the fix.

## Prolog

I'm new to Apache, and I'm not sure this is the proper way to report a bug. I wasn't able to log in to Apache's JIRA, so I decided to go straight to a fix to speed things up. Feel free to force-push, rebranch, rewrite, or throw my commits away; I'd be happy as long as the fix gets accepted.

For now, I've rolled back the CXF version to `3.5.3`, and it works as expected. Since our upgrade came along with a Java 17 migration, the same could happen to any project that wants to use Java 17 together with CXF and MTOM.
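The stricter generation described under "The Fix" could look roughly like the sketch below. It is not the code from this PR: the class, the `domainFor` helper, the hostname pattern, the UUID-plus-counter prefix, and the fallback value `cxf.apache.org` are all assumptions made for illustration. The idea is to accept the host returned by `URI.getHost()` only when it is a plain dot-separated hostname and to fall back otherwise. That covers opaque URIs such as `urn:us:gov:treasury:irs:common` (for which `getHost()` returns `null`) as well as bracketed IPv6 literals.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.UUID;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.regex.Pattern;

public final class StrictContentIds {

    // Assumed fallback domain (hypothetical); used whenever no safe host exists.
    static final String FALLBACK_DOMAIN = "cxf.apache.org";

    // Conservative hostname check: dot-separated labels of letters, digits and
    // inner hyphens. Bracketed IPv6 literals such as "[::1]" do not match.
    private static final Pattern DOMAIN = Pattern.compile(
        "[A-Za-z0-9]([A-Za-z0-9-]*[A-Za-z0-9])?(\\.[A-Za-z0-9]([A-Za-z0-9-]*[A-Za-z0-9])?)*");

    // Hypothetical ID prefix: a random UUID plus a per-attachment counter.
    private static final String PREFIX = UUID.randomUUID().toString();
    private static final AtomicInteger COUNTER = new AtomicInteger();

    static String domainFor(String ns) {
        if (ns == null || ns.isEmpty()) {
            return FALLBACK_DOMAIN;
        }
        try {
            // getHost() is null for opaque URIs like "urn:..." and keeps the
            // brackets around IPv6 literals.
            String host = new URI(ns).getHost();
            return host != null && DOMAIN.matcher(host).matches() ? host : FALLBACK_DOMAIN;
        } catch (URISyntaxException e) {
            return FALLBACK_DOMAIN;
        }
    }

    static String createContentId(String ns) {
        // The UUID and counter contain only [0-9a-f-], so no encoding is needed.
        return PREFIX + "-" + COUNTER.incrementAndGet() + "@" + domainFor(ns);
    }

    public static void main(String[] args) {
        System.out.println(domainFor("http://example.com/ns"));          // example.com
        System.out.println(domainFor("urn:us:gov:treasury:irs:common")); // cxf.apache.org
        System.out.println(domainFor("http://[::1]:8080/ns"));           // cxf.apache.org
        System.out.println(createContentId("urn:us:gov:treasury:irs:common"));
    }
}
```

Because every accepted domain and the generated prefix consist only of letters, digits, hyphens and dots, an ID built this way never needs percent-encoding, so the `Content-ID` header and the `cid:` href carry exactly the same text.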
