[ 
https://issues.apache.org/jira/browse/CODEC-338?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ruiqi Dong updated CODEC-338:
-----------------------------
    Description: 
*Summary*
When PercentCodec is configured with plusForSpace = true, it encodes spaces as 
'+' but does not percent-escape an existing literal '+'. This makes the 
encoding ambiguous:
 * original '+' is emitted as '+'
 * original space is also emitted as '+'

During decoding, both are converted back to spaces, so decode(encode(x)) is no 
longer a round-trip for inputs containing '+'.

 
*Affected code*
File: src/main/java/org/apache/commons/codec/net/PercentCodec.java
{code:java}
if (b == ESCAPE_CHAR) {
    ...
} else if (plusForSpace && b == '+') {
    buffer.put((byte) ' ');
} else {
    buffer.put(b);
} {code}
{code:java}
if (willEncode && canEncode(b)) {
    ...
} else if (plusForSpace && b == ' ') {
    buffer.put((byte) '+');
} else {
    buffer.put(b);
} {code}
The decoder treats '{+}' as a space when plusForSpace is enabled, but the 
encoder does not force a literal '{+}' through %2B.
*Reproducer*
Add the following test to 
src/test/java/org/apache/commons/codec/net/PercentCodecTest.java:
{code:java}
@Test
void testPercentEncoderDecoderWithLiteralPlusAndPlusForSpace() throws Exception 
{
    final String input = "a+b c";
    final PercentCodec percentCodec = new PercentCodec(null, true);
    final byte[] encoded = 
percentCodec.encode(input.getBytes(StandardCharsets.UTF_8));
    final String encodedS = new String(encoded, StandardCharsets.UTF_8);
    assertEquals("a%2Bb+c", encodedS, "Literal plus must be escaped when 
plusForSpace is enabled");
    final byte[] decode = percentCodec.decode(encoded);
    assertEquals(input, new String(decode, StandardCharsets.UTF_8),
            "Round-trip with literal plus should preserve the original input");
} {code}
Run:
{code:java}
mvn -q 
-Dtest=org.apache.commons.codec.net.PercentCodecTest#testPercentEncoderDecoderWithLiteralPlusAndPlusForSpace
 test {code}
Observed behavior:
{code:java}
expected: <a%2Bb+c> but was: <a+b+c> {code}
The test fails on the encoding assertion, because the encoded form is 'a+b+c', 
decoding it yields 'a b c', so the original literal '+' is lost.
 
Expected behavior:
When plusForSpace is enabled:
 * spaces may be emitted as '+'
 * a literal '+' must be emitted as '%2B

That preserves round-trip behavior and keeps the encoded representation 
unambiguous.
 
 
This is a public API semantic inconsistency inside the same codec 
configuration. The encoder and decoder disagree on the meaning of '{+}': * 
decoder: '{+}' means space
 * encoder: '{+}' may mean either original '{+}' or original space

As a result, PercentCodec corrupts valid input containing plus signs under a 
documented configuration mode.
 

  was:
*Summary*
When PercentCodec is configured with plusForSpace = true, it encodes spaces as 
'+' but does not percent-escape an existing literal '+'. This makes the 
encoding ambiguous: * original '+' is emitted as '+'
 * original space is also emitted as '+'

During decoding, both are converted back to spaces, so decode(encode(x)) is no 
longer a round-trip for inputs containing '+'.

 
*Affected code*
File: src/main/java/org/apache/commons/codec/net/PercentCodec.java
{code:java}
if (b == ESCAPE_CHAR) {
    ...
} else if (plusForSpace && b == '+') {
    buffer.put((byte) ' ');
} else {
    buffer.put(b);
} {code}
{code:java}
if (willEncode && canEncode(b)) {
    ...
} else if (plusForSpace && b == ' ') {
    buffer.put((byte) '+');
} else {
    buffer.put(b);
} {code}
The decoder treats '+' as a space when plusForSpace is enabled, but the encoder 
does not force a literal '+' through %2B.
*Reproducer*
Add the following test to 
src/test/java/org/apache/commons/codec/net/PercentCodecTest.java:
{code:java}
@Test
void testPercentEncoderDecoderWithLiteralPlusAndPlusForSpace() throws Exception 
{
    final String input = "a+b c";
    final PercentCodec percentCodec = new PercentCodec(null, true);
    final byte[] encoded = 
percentCodec.encode(input.getBytes(StandardCharsets.UTF_8));
    final String encodedS = new String(encoded, StandardCharsets.UTF_8);
    assertEquals("a%2Bb+c", encodedS, "Literal plus must be escaped when 
plusForSpace is enabled");
    final byte[] decode = percentCodec.decode(encoded);
    assertEquals(input, new String(decode, StandardCharsets.UTF_8),
            "Round-trip with literal plus should preserve the original input");
} {code}
Run:
{code:java}
mvn -q 
-Dtest=org.apache.commons.codec.net.PercentCodecTest#testPercentEncoderDecoderWithLiteralPlusAndPlusForSpace
 test {code}
Observed behavior:
{code:java}
expected: <a%2Bb+c> but was: <a+b+c> {code}
The test fails on the encoding assertion, because the encoded form is 'a+b+c', 
decoding it yields 'a b c', so the original literal '+' is lost.
 
Expected behavior:
When plusForSpace is enabled: * spaces may be emitted as '+'
 * a literal '+' must be emitted as '%2B

That preserves round-trip behavior and keeps the encoded representation 
unambiguous.
 
 
This is a public API semantic inconsistency inside the same codec 
configuration. The encoder and decoder disagree on the meaning of '+': * 
decoder: '+' means space
 * encoder: '+' may mean either original '+' or original space

As a result, PercentCodec corrupts valid input containing plus signs under a 
documented configuration mode.
 


> PercentCodec loses literal '+' when plusForSpace is enabled
> -----------------------------------------------------------
>
>                 Key: CODEC-338
>                 URL: https://issues.apache.org/jira/browse/CODEC-338
>             Project: Commons Codec
>          Issue Type: Bug
>            Reporter: Ruiqi Dong
>            Priority: Major
>
> *Summary*
> When PercentCodec is configured with plusForSpace = true, it encodes spaces 
> as '+' but does not percent-escape an existing literal '+'. This makes the 
> encoding ambiguous:
>  * original '+' is emitted as '+'
>  * original space is also emitted as '+'
> During decoding, both are converted back to spaces, so decode(encode(x)) is 
> no longer a round-trip for inputs containing '+'.
>  
> *Affected code*
> File: src/main/java/org/apache/commons/codec/net/PercentCodec.java
> {code:java}
> if (b == ESCAPE_CHAR) {
>     ...
> } else if (plusForSpace && b == '+') {
>     buffer.put((byte) ' ');
> } else {
>     buffer.put(b);
> } {code}
> {code:java}
> if (willEncode && canEncode(b)) {
>     ...
> } else if (plusForSpace && b == ' ') {
>     buffer.put((byte) '+');
> } else {
>     buffer.put(b);
> } {code}
> The decoder treats '{+}' as a space when plusForSpace is enabled, but the 
> encoder does not force a literal '{+}' through %2B.
> *Reproducer*
> Add the following test to 
> src/test/java/org/apache/commons/codec/net/PercentCodecTest.java:
> {code:java}
> @Test
> void testPercentEncoderDecoderWithLiteralPlusAndPlusForSpace() throws 
> Exception {
>     final String input = "a+b c";
>     final PercentCodec percentCodec = new PercentCodec(null, true);
>     final byte[] encoded = 
> percentCodec.encode(input.getBytes(StandardCharsets.UTF_8));
>     final String encodedS = new String(encoded, StandardCharsets.UTF_8);
>     assertEquals("a%2Bb+c", encodedS, "Literal plus must be escaped when 
> plusForSpace is enabled");
>     final byte[] decode = percentCodec.decode(encoded);
>     assertEquals(input, new String(decode, StandardCharsets.UTF_8),
>             "Round-trip with literal plus should preserve the original 
> input");
> } {code}
> Run:
> {code:java}
> mvn -q 
> -Dtest=org.apache.commons.codec.net.PercentCodecTest#testPercentEncoderDecoderWithLiteralPlusAndPlusForSpace
>  test {code}
> Observed behavior:
> {code:java}
> expected: <a%2Bb+c> but was: <a+b+c> {code}
> The test fails on the encoding assertion, because the encoded form is 
> 'a+b+c', decoding it yields 'a b c', so the original literal '+' is lost.
>  
> Expected behavior:
> When plusForSpace is enabled:
>  * spaces may be emitted as '+'
>  * a literal '+' must be emitted as '%2B
> That preserves round-trip behavior and keeps the encoded representation 
> unambiguous.
>  
>  
> This is a public API semantic inconsistency inside the same codec 
> configuration. The encoder and decoder disagree on the meaning of '{+}': * 
> decoder: '{+}' means space
>  * encoder: '{+}' may mean either original '{+}' or original space
> As a result, PercentCodec corrupts valid input containing plus signs under a 
> documented configuration mode.
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

Reply via email to