[
https://issues.apache.org/jira/browse/CODEC-338?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Ruiqi Dong updated CODEC-338:
-----------------------------
Description: I saw the editor here always show me the wrong view (was:
*Summary*
When `PercentCodec` is configured with `plusForSpace = true`, it encodes spaces
as `'+'` but does not percent-escape an existing literal `'+'`.This makes the
encoding ambiguous: * original `'+'` is emitted as `'+'`
* original space is also emitted as `'+'`
During decoding, both are converted back to spaces, so `decode(encode(x))` is
no longer a round-trip for inputs containing `'+'`.
*Affected code*
File: src/main/java/org/apache/commons/codec/net/PercentCodec.java
{code:java}
if (b == ESCAPE_CHAR) {
...
} else if (plusForSpace && b == '+') {
buffer.put((byte) ' ');
} else {
buffer.put(b);
} {code}
{code:java}
if (willEncode && canEncode(b)) {
...
} else if (plusForSpace && b == ' ') {
buffer.put((byte) '+');
} else {
buffer.put(b);
} {code}
The decoder treats '{+}' as a space when plusForSpace is enabled, but the
encoder does not force a literal '{+}' through %2B.
*Reproducer*
Add the following test to
src/test/java/org/apache/commons/codec/net/PercentCodecTest.java:
{code:java}
@Test
void testPercentEncoderDecoderWithLiteralPlusAndPlusForSpace() throws Exception
{
final String input = "a+b c";
final PercentCodec percentCodec = new PercentCodec(null, true);
final byte[] encoded =
percentCodec.encode(input.getBytes(StandardCharsets.UTF_8));
final String encodedS = new String(encoded, StandardCharsets.UTF_8);
assertEquals("a%2Bb+c", encodedS, "Literal plus must be escaped when
plusForSpace is enabled");
final byte[] decode = percentCodec.decode(encoded);
assertEquals(input, new String(decode, StandardCharsets.UTF_8),
"Round-trip with literal plus should preserve the original input");
} {code}
Run:
{code:java}
mvn -q
-Dtest=org.apache.commons.codec.net.PercentCodecTest#testPercentEncoderDecoderWithLiteralPlusAndPlusForSpace
test {code}
Observed behavior:
{code:java}
expected: <a%2Bb+c> but was: <a+b+c> {code}
The test fails on the encoding assertion, because the encoded form is 'a+b+c',
decoding it yields 'a b c', so the original literal '+' is lost.
Expected behavior:
When plusForSpace is enabled:
* spaces may be emitted as '+'
* a literal '+' must be emitted as '%2B
That preserves round-trip behavior and keeps the encoded representation
unambiguous.
This is a public API semantic inconsistency inside the same codec
configuration. The encoder and decoder disagree on the meaning of '{+}':{+}
*
decoder: `'+'` means space
*
encoder: `'+'` may mean either original `'+'` or original space
As a result, PercentCodec corrupts valid input containing plus signs under a
documented configuration mode.
)
> PercentCodec loses literal '+' when plusForSpace is enabled
> -----------------------------------------------------------
>
> Key: CODEC-338
> URL: https://issues.apache.org/jira/browse/CODEC-338
> Project: Commons Codec
> Issue Type: Bug
> Reporter: Ruiqi Dong
> Priority: Major
>
> I saw the editor here always show me the wrong view
--
This message was sent by Atlassian Jira
(v8.20.10#820010)