[ 
https://issues.apache.org/jira/browse/CODEC-339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18088479#comment-18088479
 ] 

Ruiqi Dong commented on CODEC-339:
----------------------------------

If this issue is confirmed, I would like to submit a PR to fix it

> URLCodec.encodeUrl(BitSet, ...) accepts a BitSet that conflicts with 
> decodeUrl(...)
> -----------------------------------------------------------------------------------
>
>                 Key: CODEC-339
>                 URL: https://issues.apache.org/jira/browse/CODEC-339
>             Project: Commons Codec
>          Issue Type: Bug
>            Reporter: Ruiqi Dong
>            Priority: Minor
>
> *Summary*
> URLCodec.encodeUrl(BitSet, byte[]) accepts a caller-provided safe-character 
> set, but it does not document or reject the URL escape character '%' as a 
> forbidden entry. If the supplied BitSet marks '%' as safe, the encoder emits 
> a literal '%'. URLCodec.decodeUrl(byte[]) always interprets '%' as the start 
> of an escape sequence, so the API accepts a configuration that can produce 
> output the paired decoder cannot consume.
>  
> *Affected code*
> File: src/main/java/org/apache/commons/codec/net/URLCodec.java
> {code:java}
> if (b == '+') {
>     buffer.write(' ');
> } else if (b == ESCAPE_CHAR) {
>     ...
> } {code}
> {code:java}
> if (urlsafe.get(b)) {
>     if (b == ' ') {
>         b = '+';
>     }
>     buffer.write(b);
> } else {
>     buffer.write(ESCAPE_CHAR);
>     ...
> }{code}
> The decoder always reserves '%', but the encoder allows a caller-provided 
> BitSet to treat '%' as a literal output byte.
>  
> *Reproducer*
> Add the following test to 
> src/test/java/org/apache/commons/codec/net/URLCodecTest.java:
> {code:java}
> @Test
> void testEncodeUrlMustEscapeLiteralPercentEvenIfMarkedSafe() throws Exception 
> {
>     final BitSet safe = new BitSet();
>     safe.set('%');
>     final byte[] encoded = URLCodec.encodeUrl(safe, 
> "%".getBytes(StandardCharsets.US_ASCII));
>     assertArrayEquals("%25".getBytes(StandardCharsets.US_ASCII), encoded,
>             "The URL escape character must not be emitted literally");
>     assertArrayEquals("%".getBytes(StandardCharsets.US_ASCII), 
> URLCodec.decodeUrl(encoded),
>             "Round-trip should preserve a literal percent sign");
> } {code}
> Run:
> {code:java}
> mvn -q 
> -Dtest=org.apache.commons.codec.net.URLCodecTest#testEncodeUrlMustEscapeLiteralPercentEvenIfMarkedSafe
>  test {code}
> Observed behavior:
> The accepted configuration produces the encoded form %, and decoding that 
> output throws:
> {code:java}
> org.apache.commons.codec.DecoderException: Invalid URL encoding: {code}
> Expected behavior:
> The public API should choose one consistent behavior and make it explicit: * 
> document that '%' must never be marked safe in the caller-provided BitSet
>  * reject such a BitSet up front, for example with IllegalArgumentException
>  * or keep '%' reserved regardless of the `BitSet` and always emit %25
> This is a public API contract gap and a semantic mismatch between the paired 
> encode/decode entry points:
>  * encodeUrl(...) can emit a literal '%'
>  * decodeUrl(...) always treats '%' as control syntax
> The API exposes a configurable safe-character set but does not tell callers 
> that '%' is a reserved value, and it does not reject that configuration 
> either. As a result, a supported public entry point accepts input that can 
> generate undecodable output.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

Reply via email to