[
https://issues.apache.org/jira/browse/CODEC-339?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Ruiqi Dong updated CODEC-339:
-----------------------------
Description:
*Summary*
URLCodec.encodeUrl(BitSet, byte[]) accepts a caller-provided safe-character
set, but it does not document or reject the URL escape character '%' as a
forbidden entry. If the supplied BitSet marks '%' as safe, the encoder emits a
literal '%'. URLCodec.decodeUrl(byte[]) always interprets '%' as the start of
an escape sequence, so the API accepts a configuration that can produce output
the paired decoder cannot consume.
*Affected code*
File: src/main/java/org/apache/commons/codec/net/URLCodec.java
{code:java}
if (b == '+') {
buffer.write(' ');
} else if (b == ESCAPE_CHAR) {
...
} {code}
{code:java}
if (urlsafe.get(b)) {
if (b == ' ') {
b = '+';
}
buffer.write(b);
} else {
buffer.write(ESCAPE_CHAR);
...
}{code}
The decoder always reserves '%', but the encoder allows a caller-provided
BitSet to treat '%' as a literal output byte.
*Reproducer*
Add the following test to
src/test/java/org/apache/commons/codec/net/URLCodecTest.java:
{code:java}
@Test
void testEncodeUrlMustEscapeLiteralPercentEvenIfMarkedSafe() throws Exception {
final BitSet safe = new BitSet();
safe.set('%');
final byte[] encoded = URLCodec.encodeUrl(safe,
"%".getBytes(StandardCharsets.US_ASCII));
assertArrayEquals("%25".getBytes(StandardCharsets.US_ASCII), encoded,
"The URL escape character must not be emitted literally");
assertArrayEquals("%".getBytes(StandardCharsets.US_ASCII),
URLCodec.decodeUrl(encoded),
"Round-trip should preserve a literal percent sign");
} {code}
Run:
{code:java}
mvn -q
-Dtest=org.apache.commons.codec.net.URLCodecTest#testEncodeUrlMustEscapeLiteralPercentEvenIfMarkedSafe
test {code}
Observed behavior:
The accepted configuration produces the encoded form %, and decoding that
output throws:
{code:java}
org.apache.commons.codec.DecoderException: Invalid URL encoding: {code}
Expected behavior:
The public API should choose one consistent behavior and make it explicit: *
document that '%' must never be marked safe in the caller-provided BitSet
* reject such a BitSet up front, for example with IllegalArgumentException
* or keep '%' reserved regardless of the `BitSet` and always emit %25
This is a public API contract gap and a semantic mismatch between the paired
encode/decode entry points:
* encodeUrl(...) can emit a literal '%'
* decodeUrl(...) always treats '%' as control syntax
The API exposes a configurable safe-character set but does not tell callers
that '%' is a reserved value, and it does not reject that configuration either.
As a result, a supported public entry point accepts input that can generate
undecodable output.
was:
*Summary*
URLCodec.encodeUrl(BitSet, byte[]) accepts a caller-provided safe-character
set, but it does not document or reject the URL escape character '%' as a
forbidden entry. If the supplied BitSet marks '%' as safe, the encoder emits a
literal '%'. URLCodec.decodeUrl(byte[]) always interprets '%' as the start of
an escape sequence, so the API accepts a configuration that can produce output
the paired decoder cannot consume.
*Affected code*
File: src/main/java/org/apache/commons/codec/net/URLCodec.java
{code:java}
if (b == '+') {
buffer.write(' ');
} else if (b == ESCAPE_CHAR) {
...
} {code}
{code:java}
if (urlsafe.get(b)) {
if (b == ' ') {
b = '+';
}
buffer.write(b);
} else {
buffer.write(ESCAPE_CHAR);
...
}{code}
The decoder always reserves '%', but the encoder allows a caller-provided
BitSet to treat '%' as a literal output byte.
*Reproducer*
Add the following test to
src/test/java/org/apache/commons/codec/net/URLCodecTest.java:
{code:java}
@Test
void testEncodeUrlMustEscapeLiteralPercentEvenIfMarkedSafe() throws Exception {
final BitSet safe = new BitSet();
safe.set('%');
final byte[] encoded = URLCodec.encodeUrl(safe,
"%".getBytes(StandardCharsets.US_ASCII));
assertArrayEquals("%25".getBytes(StandardCharsets.US_ASCII), encoded,
"The URL escape character must not be emitted literally");
assertArrayEquals("%".getBytes(StandardCharsets.US_ASCII),
URLCodec.decodeUrl(encoded),
"Round-trip should preserve a literal percent sign");
} {code}
Run:
{code:java}
mvn -q
-Dtest=org.apache.commons.codec.net.URLCodecTest#testEncodeUrlMustEscapeLiteralPercentEvenIfMarkedSafe
test {code}
Observed behavior:
The accepted configuration produces the encoded form %, and decoding that
output throws:
{code:java}
org.apache.commons.codec.DecoderException: Invalid URL encoding: {code}
Expected behavior:
The public API should choose one consistent behavior and make it explicit: *
document that '%' must never be marked safe in the caller-provided BitSet
* reject such a BitSet up front, for example with IllegalArgumentException
* or keep '%' reserved regardless of the `BitSet` and always emit %25
This is a public API contract gap and a semantic mismatch between the paired
encode/decode entry points: * encodeUrl(...) can emit a literal '%'
* decodeUrl(...) always treats '%' as control syntax
The API exposes a configurable safe-character set but does not tell callers
that '%' is a reserved value, and it does not reject that configuration either.
As a result, a supported public entry point accepts input that can generate
undecodable output.
> URLCodec.encodeUrl(BitSet, ...) accepts a BitSet that conflicts with
> decodeUrl(...)
> -----------------------------------------------------------------------------------
>
> Key: CODEC-339
> URL: https://issues.apache.org/jira/browse/CODEC-339
> Project: Commons Codec
> Issue Type: Bug
> Reporter: Ruiqi Dong
> Priority: Minor
>
> *Summary*
> URLCodec.encodeUrl(BitSet, byte[]) accepts a caller-provided safe-character
> set, but it does not document or reject the URL escape character '%' as a
> forbidden entry. If the supplied BitSet marks '%' as safe, the encoder emits
> a literal '%'. URLCodec.decodeUrl(byte[]) always interprets '%' as the start
> of an escape sequence, so the API accepts a configuration that can produce
> output the paired decoder cannot consume.
>
> *Affected code*
> File: src/main/java/org/apache/commons/codec/net/URLCodec.java
> {code:java}
> if (b == '+') {
> buffer.write(' ');
> } else if (b == ESCAPE_CHAR) {
> ...
> } {code}
> {code:java}
> if (urlsafe.get(b)) {
> if (b == ' ') {
> b = '+';
> }
> buffer.write(b);
> } else {
> buffer.write(ESCAPE_CHAR);
> ...
> }{code}
> The decoder always reserves '%', but the encoder allows a caller-provided
> BitSet to treat '%' as a literal output byte.
>
> *Reproducer*
> Add the following test to
> src/test/java/org/apache/commons/codec/net/URLCodecTest.java:
> {code:java}
> @Test
> void testEncodeUrlMustEscapeLiteralPercentEvenIfMarkedSafe() throws Exception
> {
> final BitSet safe = new BitSet();
> safe.set('%');
> final byte[] encoded = URLCodec.encodeUrl(safe,
> "%".getBytes(StandardCharsets.US_ASCII));
> assertArrayEquals("%25".getBytes(StandardCharsets.US_ASCII), encoded,
> "The URL escape character must not be emitted literally");
> assertArrayEquals("%".getBytes(StandardCharsets.US_ASCII),
> URLCodec.decodeUrl(encoded),
> "Round-trip should preserve a literal percent sign");
> } {code}
> Run:
> {code:java}
> mvn -q
> -Dtest=org.apache.commons.codec.net.URLCodecTest#testEncodeUrlMustEscapeLiteralPercentEvenIfMarkedSafe
> test {code}
> Observed behavior:
> The accepted configuration produces the encoded form %, and decoding that
> output throws:
> {code:java}
> org.apache.commons.codec.DecoderException: Invalid URL encoding: {code}
> Expected behavior:
> The public API should choose one consistent behavior and make it explicit: *
> document that '%' must never be marked safe in the caller-provided BitSet
> * reject such a BitSet up front, for example with IllegalArgumentException
> * or keep '%' reserved regardless of the `BitSet` and always emit %25
> This is a public API contract gap and a semantic mismatch between the paired
> encode/decode entry points:
> * encodeUrl(...) can emit a literal '%'
> * decodeUrl(...) always treats '%' as control syntax
> The API exposes a configurable safe-character set but does not tell callers
> that '%' is a reserved value, and it does not reject that configuration
> either. As a result, a supported public entry point accepts input that can
> generate undecodable output.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)