[
https://issues.apache.org/jira/browse/CODEC-339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18088479#comment-18088479
]
Ruiqi Dong commented on CODEC-339:
----------------------------------
If this issue is confirmed, I would like to submit a PR to fix it
> URLCodec.encodeUrl(BitSet, ...) accepts a BitSet that conflicts with
> decodeUrl(...)
> -----------------------------------------------------------------------------------
>
> Key: CODEC-339
> URL: https://issues.apache.org/jira/browse/CODEC-339
> Project: Commons Codec
> Issue Type: Bug
> Reporter: Ruiqi Dong
> Priority: Minor
>
> *Summary*
> URLCodec.encodeUrl(BitSet, byte[]) accepts a caller-provided safe-character
> set, but it does not document or reject the URL escape character '%' as a
> forbidden entry. If the supplied BitSet marks '%' as safe, the encoder emits
> a literal '%'. URLCodec.decodeUrl(byte[]) always interprets '%' as the start
> of an escape sequence, so the API accepts a configuration that can produce
> output the paired decoder cannot consume.
>
> *Affected code*
> File: src/main/java/org/apache/commons/codec/net/URLCodec.java
> {code:java}
> if (b == '+') {
> buffer.write(' ');
> } else if (b == ESCAPE_CHAR) {
> ...
> } {code}
> {code:java}
> if (urlsafe.get(b)) {
> if (b == ' ') {
> b = '+';
> }
> buffer.write(b);
> } else {
> buffer.write(ESCAPE_CHAR);
> ...
> }{code}
> The decoder always reserves '%', but the encoder allows a caller-provided
> BitSet to treat '%' as a literal output byte.
>
> *Reproducer*
> Add the following test to
> src/test/java/org/apache/commons/codec/net/URLCodecTest.java:
> {code:java}
> @Test
> void testEncodeUrlMustEscapeLiteralPercentEvenIfMarkedSafe() throws Exception
> {
> final BitSet safe = new BitSet();
> safe.set('%');
> final byte[] encoded = URLCodec.encodeUrl(safe,
> "%".getBytes(StandardCharsets.US_ASCII));
> assertArrayEquals("%25".getBytes(StandardCharsets.US_ASCII), encoded,
> "The URL escape character must not be emitted literally");
> assertArrayEquals("%".getBytes(StandardCharsets.US_ASCII),
> URLCodec.decodeUrl(encoded),
> "Round-trip should preserve a literal percent sign");
> } {code}
> Run:
> {code:java}
> mvn -q
> -Dtest=org.apache.commons.codec.net.URLCodecTest#testEncodeUrlMustEscapeLiteralPercentEvenIfMarkedSafe
> test {code}
> Observed behavior:
> The accepted configuration produces the encoded form %, and decoding that
> output throws:
> {code:java}
> org.apache.commons.codec.DecoderException: Invalid URL encoding: {code}
> Expected behavior:
> The public API should choose one consistent behavior and make it explicit: *
> document that '%' must never be marked safe in the caller-provided BitSet
> * reject such a BitSet up front, for example with IllegalArgumentException
> * or keep '%' reserved regardless of the `BitSet` and always emit %25
> This is a public API contract gap and a semantic mismatch between the paired
> encode/decode entry points:
> * encodeUrl(...) can emit a literal '%'
> * decodeUrl(...) always treats '%' as control syntax
> The API exposes a configurable safe-character set but does not tell callers
> that '%' is a reserved value, and it does not reject that configuration
> either. As a result, a supported public entry point accepts input that can
> generate undecodable output.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)