kevdoran opened a new pull request #302: URL: https://github.com/apache/nifi-registry/pull/302
- Adds detection and encoding of non-ascii characters to creation of chain - Adds unit tests and integration tests that use proxied entities with Unicode - Refactors ProxiedEntityRequestConfig to compute header value in constructor ## Overview of these changes. This changes the creation the `X-ProxiedEntitiesChain` header values to: 1. Detect non-ascii characters in an input identity/DN 2. If non-ascii characters are present, base64 encode the value and wrap it in additional `<angle braces>` For example, a proxy chain of ```Алйс, CN=nifi.apache.org``` Will result in: ```X-ProxiedEntitiesChain: <<0JDQu9C50YE=>><CN=nifi.apache.org>``` Additionally, logic for tokenizing these header values was updated to: 1. Detect if a token is encoded (by the presence of additional `<angle braces>`) 2. If encoded, decode it as part of the tokenization process. These changes should be completely backwards compatible, as entities containing non-ascii characters are encoded exactly the same as they previously were. (Previously, entities containing non-ascii characters would not survive a round trip hop between nodes, as per [NIFI-7744](https://issues.apache.org/jira/browse/NIFI-7744), so I am not worried about that case.) A number of implementation approaches were considered, including ones based on RFC-2231 and RFC-8187. Ultimately, approaches that relied on percent encoding were rejected as percent-encoding outside of URL encoding are not widespread in standard libraries and difficult to implement correctly given complexity of unicode and things such as surragote characters. Given that in theory, third-party components such as proxies may need to implement this logic, base64 was chosen as is it widely available in all languages and frameworks (particularly in reverse proxies, where base64 is used as part of basic auth header formation). The use of double `<<angle braces>>` to indicate encoding was chosen as it allows for an easy way for a decoder to determine if the value is encoded (necessary because encoding is optional for purely ascii entities), and because the reserved characters `<` and `>` are already protected/escaped in our sanitization process. I considered but decided against always base64 encoding the header value, because (1) this maintains backwards compatibility between nifi and registry on different versions and (2) plaintext, non-encoded values, which are still the majority of use-cases, are easier for users to troubleshoot. If this approach is satisfactory, let me know and I will make corresponding changes in NiFi. As it stands, I will have to re-implement most of this in the NiFi. Another option would be extracting most of this logic into a more sharable module in NiFi registry (or move to `nifi-registry-utils` perhaps), so that it could be imported and used by NiFi. I decided against this for now, as ultimately this is a good candidate for `nifi-standard-libs` and refactoring short-term would be more messy/risky (the current implementation in `nifi-registry-security-utils` requires apache commons which `nifi-registry-utils` does not have). --- Thank you for submitting a contribution to Apache NiFi Registry. Please provide a short description of the PR here: #### Description of PR _Enables X functionality; fixes bug NIFIREG-YYYY._ In order to streamline the review of the contribution we ask you to ensure the following steps have been taken: ### For all changes: - [ ] Is there a JIRA ticket associated with this PR? Is it referenced in the commit message? - [ ] Does your PR title start with **NIFIREG-XXXX** where XXXX is the JIRA number you are trying to resolve? Pay particular attention to the hyphen "-" character. - [ ] Has your PR been rebased against the latest commit within the target branch (typically `main`)? - [ ] Is your initial contribution a single, squashed commit? _Additional commits in response to PR reviewer feedback should be made on this branch and pushed to allow change tracking. Do not `squash` or use `--force` when pushing to allow for clean monitoring of changes._ ### For code changes: - [ ] Have you ensured that the full suite of tests is executed via `mvn -Pcontrib-check clean install` at the root `nifi-registry` folder? - [ ] Have you written or updated unit tests to verify your changes? - [ ] Have you verified that the full build is successful on JDK 8? - [ ] Have you verified that the full build is successful on JDK 11? - [ ] If adding new dependencies to the code, are these dependencies licensed in a way that is compatible for inclusion under [ASF 2.0](http://www.apache.org/legal/resolved.html#category-a)? - [ ] If applicable, have you updated the `LICENSE` file, including the main `LICENSE` file under `nifi-registry-assembly`? - [ ] If applicable, have you updated the `NOTICE` file, including the main `NOTICE` file found under `nifi-registry-assembly`? ### For documentation related changes: - [ ] Have you ensured that format looks appropriate for the output in which it is rendered? ### Note: Please ensure that once the PR is submitted, you check GitHub Actions CI for build issues and submit an update to your PR as soon as possible. ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [email protected]
