[jira] [Commented] (COCOON-2352) XMLEncoder doesn't support Unicode surrogate pairs
[ https://issues.apache.org/jira/browse/COCOON-2352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15591691#comment-15591691 ]

Hudson commented on COCOON-2352:

SUCCESS: Integrated in Jenkins build Cocoon 2.1.X #117 (See [https://builds.apache.org/job/Cocoon%202.1.X/117/])
[COCOON-2352] Third PR applied - This closes #3 (ilgrosso: [http://svn.apache.org/viewvc/?view=rev&rev=1765804])
* (edit) BRANCH_2_1_X/src/blocks/serializers/java/org/apache/cocoon/components/serializers/encoding/XMLEncoder.java
* (edit) BRANCH_2_1_X/src/blocks/serializers/test/org/apache/cocoon/components/serializers/encoding/XMLEncoderTestCase.java

> XMLEncoder doesn't support Unicode surrogate pairs
>
> Key: COCOON-2352
> URL: https://issues.apache.org/jira/browse/COCOON-2352
> Project: Cocoon
> Issue Type: Bug
> Components: * Cocoon Core, Blocks: Serializers
> Affects Versions: 2.1.12
> Reporter: Ben Fortuna
> Assignee: Francesco Chicchiriccò
> Fix For: 2.1.13
>
> Whilst investigating an issue with the Sling project and support for emoji
> characters, I've come to notice that the XMLEncoder used by HTMLSerializer
> doesn't support Unicode surrogate pairs to represent higher order unicode
> characters.
> A simple unit test that demonstrates this issue is here:
> https://github.com/micronode/whistlepost/blob/master/whistlepost-rewrite-lib/src/test/groovy/org/apache/cocoon/components/serializers/encoding/XMLEncoderTest.groovy
> More background info here also: SLING-5973
> This seems to have been identified/addressed in other Apache projects also:
> https://issues.apache.org/jira/browse/THRIFT-3403?jql=text%20~%20%22surrogate%20pairs%22

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (COCOON-2352) XMLEncoder doesn't support Unicode surrogate pairs
[ https://issues.apache.org/jira/browse/COCOON-2352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15591607#comment-15591607 ]

Francesco Chicchiriccò commented on COCOON-2352:

Further changes committed with [1] (Cocoon 2.1) and [2] (Cocoon XML Serializers); 1.0.3-SNAPSHOT redeployed to Maven repo.
Thanks again.

[1] http://svn.apache.org/viewvc?rev=1765804&view=rev
[2] http://svn.apache.org/viewvc?rev=1765807&view=rev
[jira] [Commented] (COCOON-2352) XMLEncoder doesn't support Unicode surrogate pairs
[ https://issues.apache.org/jira/browse/COCOON-2352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15590441#comment-15590441 ]

Ben Fortuna commented on COCOON-2352:

My sincerest apologies, but I discovered a bug in the patch I submitted. I had assumed we could cast an int to a char to encode the higher-order Unicode characters, but of course a char is only 16 bits wide, so this isn't possible, which is why Unicode surrogate pairs exist in the first place.
So I had to make a slight change to the code (again): I have updated two files, XMLEncoder and XMLEncoderTestCase, to ensure that after combining a surrogate pair into a code point we correctly encode the int value as an HTML-compatible string.
https://github.com/apache/cocoon/pull/3/files
Thanks again, and fingers crossed there are no more changes required. :-)
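The int-to-char pitfall described in this comment can be illustrated with a small standalone sketch. It is not the code from the pull request: the class name `CodePointEncodingDemo`, the helper `toNumericReference`, and the choice of a hexadecimal numeric character reference as the "HTML-compatible string" are all assumptions made for illustration. The value 127808 (0x1F340) is the one used in the test quoted elsewhere in this thread.

```java
// Hypothetical demo class; not part of Cocoon or of the pull request.
final class CodePointEncodingDemo {

    /**
     * Combines a surrogate pair into a code point and renders it as a
     * hexadecimal numeric character reference (the output format is an assumption).
     */
    static String toNumericReference(char high, char low) {
        if (!Character.isSurrogatePair(high, low)) {
            throw new IllegalArgumentException("Not a valid surrogate pair");
        }
        int codePoint = Character.toCodePoint(high, low);
        return "&#x" + Integer.toHexString(codePoint).toUpperCase() + ";";
    }

    public static void main(String[] args) {
        int codePoint = 127808; // 0x1F340

        // Narrowing to char keeps only the low 16 bits: 0x1F340 becomes 0xF340.
        char truncated = (char) codePoint;
        System.out.println(Integer.toHexString(truncated));

        // Character.toChars produces the two-char surrogate pair instead.
        char[] pair = Character.toChars(codePoint);
        System.out.println(Integer.toHexString(pair[0]) + " " + Integer.toHexString(pair[1]));

        // The combined code point is encoded from the int value, not from a char.
        System.out.println(toNumericReference(pair[0], pair[1]));
    }
}
```

The point of the sketch is that the expected value has to be derived from the int code point; any step that passes through a single char loses the upper bits.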
[jira] [Commented] (COCOON-2352) XMLEncoder doesn't support Unicode surrogate pairs
[ https://issues.apache.org/jira/browse/COCOON-2352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15582333#comment-15582333 ]

Ben Fortuna commented on COCOON-2352:

Great! I've tested the snapshot against my code and it looks good. Many thanks for your assistance. :-)
[jira] [Commented] (COCOON-2352) XMLEncoder doesn't support Unicode surrogate pairs
[ https://issues.apache.org/jira/browse/COCOON-2352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15582339#comment-15582339 ]

Francesco Chicchiriccò commented on COCOON-2352:

You're welcome ;-)
[jira] [Commented] (COCOON-2352) XMLEncoder doesn't support Unicode surrogate pairs
[ https://issues.apache.org/jira/browse/COCOON-2352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15582202#comment-15582202 ]

Hudson commented on COCOON-2352:

SUCCESS: Integrated in Jenkins build Cocoon 2.1.X #116 (See [https://builds.apache.org/job/Cocoon%202.1.X/116/])
[COCOON-2352] Applying further changes to better deal with HTML encoding (ilgrosso: [http://svn.apache.org/viewvc/?view=rev&rev=1765265])
* (edit) BRANCH_2_1_X/src/blocks/serializers/test/org/apache/cocoon/components/serializers/encoding/XMLEncoderTestCase.java
[jira] [Commented] (COCOON-2352) XMLEncoder doesn't support Unicode surrogate pairs
[ https://issues.apache.org/jira/browse/COCOON-2352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15582183#comment-15582183 ]

Francesco Chicchiriccò commented on COCOON-2352:

My bad: problem solved, committed to (Cocoon 2.1)
* http://svn.apache.org/repos/asf/cocoon/branches/BRANCH_2_1_X/src/blocks/serializers/java/org/apache/cocoon/components/serializers/encoding/XMLEncoder.java
* http://svn.apache.org/repos/asf/cocoon/branches/BRANCH_2_1_X/src/blocks/serializers/test/org/apache/cocoon/components/serializers/encoding/XMLEncoderTestCase.java

and (Maven artifact, with SNAPSHOT already redeployed):
* http://svn.apache.org/repos/asf/cocoon/subprojects/cocoon-serializers-charsets/trunk/src/main/java/org/apache/cocoon/components/serializers/encoding/XMLEncoder.java
* http://svn.apache.org/repos/asf/cocoon/subprojects/cocoon-serializers-charsets/trunk/src/test/java/org/apache/cocoon/components/serializers/encoding/XMLEncoderTest.java
[jira] [Commented] (COCOON-2352) XMLEncoder doesn't support Unicode surrogate pairs
[ https://issues.apache.org/jira/browse/COCOON-2352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15582147#comment-15582147 ]

Ben Fortuna commented on COCOON-2352:

Hmm, do you have a link to the source? I checked on BRANCH_2_1_X and it still has the old code. I noticed the error is on line 42, but the test I submitted only has 33 lines.
Note it is important for the test to encode the surrogate pairs together, which is why I had the sequence like this:
{code}
char[] expectedValue = encoder.encode((char) 127808);
// surrogate 1/2
assertTrue(encoder.encode('\uD83C').length == 0);
// surrogate 2/2
assertTrue(Arrays.equals(expectedValue, encoder.encode('\uDF40')));
{code}
[jira] [Commented] (COCOON-2352) XMLEncoder doesn't support Unicode surrogate pairs
[ https://issues.apache.org/jira/browse/COCOON-2352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15581450#comment-15581450 ]

Francesco Chicchiriccò commented on COCOON-2352:

With the new test code, I receive

java.lang.IllegalArgumentException: Expected low surrogate char
    at org.apache.cocoon.components.serializers.encoding.XMLEncoder.encode(XMLEncoder.java:97)
    at org.apache.cocoon.components.serializers.encoding.XMLEncoderTestCase.testEncodingSurrogatePairs(XMLEncoderTestCase.java:42)

when running the test.
[jira] [Commented] (COCOON-2352) XMLEncoder doesn't support Unicode surrogate pairs
[ https://issues.apache.org/jira/browse/COCOON-2352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15580714#comment-15580714 ]

Ben Fortuna commented on COCOON-2352:

Yes sorry, I forgot to mention I had updated the unit test also. See the same PR for the changes.
[jira] [Commented] (COCOON-2352) XMLEncoder doesn't support Unicode surrogate pairs
[ https://issues.apache.org/jira/browse/COCOON-2352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15574611#comment-15574611 ]

Hudson commented on COCOON-2352:

SUCCESS: Integrated in Jenkins build Cocoon 2.1.X #115 (See [https://builds.apache.org/job/Cocoon%202.1.X/115/])
[COCOON-2352] This closes #2 (ilgrosso: [http://svn.apache.org/viewvc/?view=rev&rev=1764819])
* (edit) BRANCH_2_1_X/src/blocks/serializers/java/org/apache/cocoon/components/serializers/encoding/XMLEncoder.java
[jira] [Commented] (COCOON-2352) XMLEncoder doesn't support Unicode surrogate pairs
[ https://issues.apache.org/jira/browse/COCOON-2352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=1557#comment-1557 ]

Francesco Chicchiriccò commented on COCOON-2352:

Ben, I have applied your further PR in [1], but I noticed afterwards that the test fails on this assertion:

assertTrue(Arrays.equals(expectedValue, encoder.encode('\uDF40')));

Unfortunately, I noticed this *after* committing to COCOON_2_1_X, but I stopped myself right before deploying the updated SNAPSHOT artifact (thanks Maven and the surefire plugin!).
Does your test case need to be updated as well?

[1] http://svn.apache.org/viewvc?rev=1764819&view=rev
[jira] [Commented] (COCOON-2352) XMLEncoder doesn't support Unicode surrogate pairs
[ https://issues.apache.org/jira/browse/COCOON-2352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15574193#comment-15574193 ]

Ben Fortuna commented on COCOON-2352:

[~ilgrosso] Fantastic, thanks. I've used the snapshot dependency to test the fix in my project and I did notice one more thing: whilst it does create the Unicode character correctly from the surrogate pair, it doesn't actually HTML-encode the character.
To fix this I've created another pull request, which simply encodes the Unicode character created from the surrogate pair:
https://github.com/apache/cocoon/pull/2/files#diff-2b4ac8dab4cdcce4c7ffd948c2490b52R101
I hope it isn't too much trouble to apply this change also; I'm confident this is the last change required. Many thanks.
[jira] [Commented] (COCOON-2352) XMLEncoder doesn't support Unicode surrogate pairs
[ https://issues.apache.org/jira/browse/COCOON-2352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15571065#comment-15571065 ]

Francesco Chicchiriccò commented on COCOON-2352:

I have reworked your patch so that it is also applied to the org.apache.cocoon:cocoon-serializers-charsets Maven artifact (used by Cocoon 2.2 and Cocoon 3.0).
I don't know when we will be able to officially release your fix there; in the meantime, however, you could use the SNAPSHOT artifact by setting the following dependency:

    <dependency>
      <groupId>org.apache.cocoon</groupId>
      <artifactId>cocoon-serializers-charsets</artifactId>
      <version>1.0.3-SNAPSHOT</version>
    </dependency>

and adding the following repository to your pom:

    <repository>
      <id>apache.snapshots</id>
      <name>Apache Snapshot Repository</name>
      <url>http://repository.apache.org/snapshots</url>
      <releases>
        <enabled>false</enabled>
      </releases>
    </repository>

Alternatively, you can download the updated SNAPSHOT artifact from
https://repository.apache.org/content/groups/snapshots/org/apache/cocoon/cocoon-serializers-charsets/1.0.3-SNAPSHOT/cocoon-serializers-charsets-1.0.3-20161013.064604-1.jar
[jira] [Commented] (COCOON-2352) XMLEncoder doesn't support Unicode surrogate pairs
[ https://issues.apache.org/jira/browse/COCOON-2352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15570179#comment-15570179 ]

Ben Fortuna commented on COCOON-2352:

[~ilgrosso] I am happy to have this issue closed, however it would be good if there was a snapshot JAR available to verify the functionality. Specifically I am hoping this change will make it into this artefact:
http://search.maven.org/#search%7Cgav%7C1%7Cg%3A%22org.apache.cocoon%22%20AND%20a%3A%22cocoon-serializers-charsets%22
Will a new version be produced with the next release?
Many thanks for your efforts.
[jira] [Commented] (COCOON-2352) XMLEncoder doesn't support Unicode surrogate pairs
[ https://issues.apache.org/jira/browse/COCOON-2352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15564709#comment-15564709 ]

Francesco Chicchiriccò commented on COCOON-2352:

We have just decided to upgrade to 1.5 compatibility in COCOON-2356, so I am happy to keep your contribution.
A subsequent build [1] succeeded, in fact.
Can we close this issue, then?

[1] https://builds.apache.org/job/Cocoon%202.1.X/112
[jira] [Commented] (COCOON-2352) XMLEncoder doesn't support Unicode surrogate pairs
[ https://issues.apache.org/jira/browse/COCOON-2352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15563730#comment-15563730 ]

Ben Fortuna commented on COCOON-2352:

Hmm, I guess from that failed build that you are still maintaining compatibility with Java 1.4 (Character.isLowSurrogate() was introduced in 1.5). I guess we can work around that, although I'm not sure anyone is using Java 1.4 anymore. ;)
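One way the Java 1.4 workaround mentioned in this comment could look is a set of plain range checks that avoid the `Character` surrogate methods added in 1.5. This is only a sketch under that assumption; the class `SurrogateRanges` and its method names are hypothetical and do not come from the Cocoon sources or from any of the pull requests.

```java
// Hypothetical helper; names are illustrative and not taken from the Cocoon sources.
final class SurrogateRanges {

    private SurrogateRanges() {
    }

    /** True for U+D800..U+DBFF, the range Character.isHighSurrogate accepts (Java 1.5+). */
    static boolean isHighSurrogate(char c) {
        return c >= 0xD800 && c <= 0xDBFF;
    }

    /** True for U+DC00..U+DFFF, the range Character.isLowSurrogate accepts (Java 1.5+). */
    static boolean isLowSurrogate(char c) {
        return c >= 0xDC00 && c <= 0xDFFF;
    }

    /** Same arithmetic as Character.toCodePoint (Java 1.5+); callers must validate the pair first. */
    static int toCodePoint(char high, char low) {
        return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00);
    }
}
```

For the pair used in this thread, `toCodePoint('\uD83C', '\uDF40')` gives 0x10000 + (0x3C << 10) + 0x340 = 0x1F340, i.e. 127808.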
[jira] [Commented] (COCOON-2352) XMLEncoder doesn't support Unicode surrogate pairs
[ https://issues.apache.org/jira/browse/COCOON-2352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15561631#comment-15561631 ]

Hudson commented on COCOON-2352:

FAILURE: Integrated in Jenkins build Cocoon 2.1.X #111 (See [https://builds.apache.org/job/Cocoon%202.1.X/111/])
[COCOON-2352] Support for Unicode surrogate pairs - This closes #1 (ilgrosso: [http://svn.apache.org/viewvc/?view=rev&rev=1764023])
* (edit) BRANCH_2_1_X/src/blocks/serializers/java/org/apache/cocoon/components/serializers/EncodingSerializer.java
* (edit) BRANCH_2_1_X/src/blocks/serializers/java/org/apache/cocoon/components/serializers/encoding/XMLEncoder.java
* (add) BRANCH_2_1_X/src/blocks/serializers/test
* (add) BRANCH_2_1_X/src/blocks/serializers/test/org
* (add) BRANCH_2_1_X/src/blocks/serializers/test/org/apache
* (add) BRANCH_2_1_X/src/blocks/serializers/test/org/apache/cocoon
* (add) BRANCH_2_1_X/src/blocks/serializers/test/org/apache/cocoon/components
* (add) BRANCH_2_1_X/src/blocks/serializers/test/org/apache/cocoon/components/serializers
* (add) BRANCH_2_1_X/src/blocks/serializers/test/org/apache/cocoon/components/serializers/encoding
* (add) BRANCH_2_1_X/src/blocks/serializers/test/org/apache/cocoon/components/serializers/encoding/XMLEncoderTestCase.java
[jira] [Commented] (COCOON-2352) XMLEncoder doesn't support Unicode surrogate pairs
[ https://issues.apache.org/jira/browse/COCOON-2352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15561543#comment-15561543 ]

Francesco Chicchiriccò commented on COCOON-2352:

Hi [~fortuna],
thanks for your PR (which is also the very first one coming from GitHub, wow...)!

As you can see from [1], your changes are now incorporated (I had to download the PR as a diff, then rework it a bit to make it compatible with the Cocoon 2.1 JUnit tests [2]).
I have also added [3] to properly handle XMLEncoder#highSurrogate re-initialization.

Shall we close this issue, then?

[1] http://svn.apache.org/viewvc?view=revision&revision=1764023
[2] http://cocoon.apache.org/2.1/installing/tests.html
[3] http://svn.apache.org/viewvc/cocoon/branches/BRANCH_2_1_X/src/blocks/serializers/java/org/apache/cocoon/components/serializers/EncodingSerializer.java?r1=1764023=1764022=1764023
[jira] [Commented] (COCOON-2352) XMLEncoder doesn't support Unicode surrogate pairs
[ https://issues.apache.org/jira/browse/COCOON-2352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15561538#comment-15561538 ] ASF GitHub Bot commented on COCOON-2352: Github user asfgit closed the pull request at: https://github.com/apache/cocoon/pull/1 > XMLEncoder doesn't support Unicode surrogate pairs > -- > > Key: COCOON-2352 > URL: https://issues.apache.org/jira/browse/COCOON-2352 > Project: Cocoon > Issue Type: Bug > Components: * Cocoon Core, Blocks: Serializers >Reporter: Ben Fortuna > > Whilst investigating an issue with the Sling project and support for emoji > characters, I've come to notice that the XMLEncoder used by HTMLSerializer > doesn't support Unicode surrogate pairs to represent higher order unicode > characters. > A simple unit test that demonstrates this issue is here: > https://github.com/micronode/whistlepost/blob/master/whistlepost-rewrite-lib/src/test/groovy/org/apache/cocoon/components/serializers/encoding/XMLEncoderTest.groovy > More background info here also: SLING-5973 > This seems to have been identified/addressed in other Apache projects also: > https://issues.apache.org/jira/browse/THRIFT-3403?jql=text%20~%20%22surrogate%20pairs%22 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (COCOON-2352) XMLEncoder doesn't support Unicode surrogate pairs
[ https://issues.apache.org/jira/browse/COCOON-2352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15560988#comment-15560988 ] Ben Fortuna commented on COCOON-2352: - I've just created a pull request on GitHub to add support for surrogate pairs. https://github.com/apache/cocoon/pull/1 Summary of changes: * Added an instance variable to XMLEncoder to record the first surrogate of the pair - NOTE: this means the XMLEncoder is no longer thread safe. This may have implications I'm not aware of (e.g. usage in a multi-threaded way) * Added a unit test to demonstrate the behaviour - NOTE: I needed to add the serializers project to the test classpath, not sure if there is a better way to do this with the Ant config. I look forward to any feedback or comments. regards, ben > XMLEncoder doesn't support Unicode surrogate pairs > -- > > Key: COCOON-2352 > URL: https://issues.apache.org/jira/browse/COCOON-2352 > Project: Cocoon > Issue Type: Bug > Components: * Cocoon Core, Blocks: Serializers >Reporter: Ben Fortuna > > Whilst investigating an issue with the Sling project and support for emoji > characters, I've come to notice that the XMLEncoder used by HTMLSerializer > doesn't support Unicode surrogate pairs to represent higher order unicode > characters. > A simple unit test that demonstrates this issue is here: > https://github.com/micronode/whistlepost/blob/master/whistlepost-rewrite-lib/src/test/groovy/org/apache/cocoon/components/serializers/encoding/XMLEncoderTest.groovy > More background info here also: SLING-5973 > This seems to have been identified/addressed in other Apache projects also: > https://issues.apache.org/jira/browse/THRIFT-3403?jql=text%20~%20%22surrogate%20pairs%22 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (COCOON-2352) XMLEncoder doesn't support Unicode surrogate pairs
[ https://issues.apache.org/jira/browse/COCOON-2352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15560981#comment-15560981 ] ASF GitHub Bot commented on COCOON-2352: GitHub user benfortuna opened a pull request: https://github.com/apache/cocoon/pull/1 Support for Unicode surrogate pairs This PR adds support for encoding surrogate pairs as a single character in the XMLEncoder implementation. See [COCOON-2352](https://issues.apache.org/jira/browse/COCOON-2352) for further details. You can merge this pull request into a Git repository by running: $ git pull https://github.com/benfortuna/cocoon BRANCH_2_1_X Alternatively you can review and apply these changes as the patch at: https://github.com/apache/cocoon/pull/1.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #1 commit 4975a555b8330446089c81e17e8bfaaaee669600 Author: Ben Fortuna Date: 2016-10-10T00:11:32Z Added required folder for build commit cf2d9b65eb55b9d19a0b0c179e90fe7c7b70b6e6 Author: Ben Fortuna Date: 2016-10-10T00:11:58Z Added support for decoding surrogate pairs commit cc68b0040c5afc6286dc767810ea2ec7abd58340 Author: Ben Fortuna Date: 2016-10-10T01:26:20Z Added unit test for encoding unicode surrogate pairs > XMLEncoder doesn't support Unicode surrogate pairs > -- > > Key: COCOON-2352 > URL: https://issues.apache.org/jira/browse/COCOON-2352 > Project: Cocoon > Issue Type: Bug > Components: * Cocoon Core, Blocks: Serializers >Reporter: Ben Fortuna > > Whilst investigating an issue with the Sling project and support for emoji > characters, I've come to notice that the XMLEncoder used by HTMLSerializer > doesn't support Unicode surrogate pairs to represent higher order unicode > characters.
> A simple unit test that demonstrates this issue is here: > https://github.com/micronode/whistlepost/blob/master/whistlepost-rewrite-lib/src/test/groovy/org/apache/cocoon/components/serializers/encoding/XMLEncoderTest.groovy > More background info here also: SLING-5973 > This seems to have been identified/addressed in other Apache projects also: > https://issues.apache.org/jira/browse/THRIFT-3403?jql=text%20~%20%22surrogate%20pairs%22 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (COCOON-2352) XMLEncoder doesn't support Unicode surrogate pairs
[ https://issues.apache.org/jira/browse/COCOON-2352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15495631#comment-15495631 ] Francesco Chicchiriccò commented on COCOON-2352: Understood, thanks for working on this. > XMLEncoder doesn't support Unicode surrogate pairs > -- > > Key: COCOON-2352 > URL: https://issues.apache.org/jira/browse/COCOON-2352 > Project: Cocoon > Issue Type: Bug > Components: * Cocoon Core, Blocks: Serializers >Reporter: Ben Fortuna > > Whilst investigating an issue with the Sling project and support for emoji > characters, I've come to notice that the XMLEncoder used by HTMLSerializer > doesn't support Unicode surrogate pairs to represent higher order unicode > characters. > A simple unit test that demonstrates this issue is here: > https://github.com/micronode/whistlepost/blob/master/whistlepost-rewrite-lib/src/test/groovy/org/apache/cocoon/components/serializers/encoding/XMLEncoderTest.groovy > More background info here also: SLING-5973 > This seems to have been identified/addressed in other Apache projects also: > https://issues.apache.org/jira/browse/THRIFT-3403?jql=text%20~%20%22surrogate%20pairs%22 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (COCOON-2352) XMLEncoder doesn't support Unicode surrogate pairs
[ https://issues.apache.org/jira/browse/COCOON-2352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15495624#comment-15495624 ] Ben Fortuna commented on COCOON-2352: - Ok, I'll first create a unit test to demonstrate the issue. I'd prefer not to change the Encoder interface so I'll see if it's possible to just update XMLEncoder. I have looked at the EncodingSerializer, however I think a surrogate pair needs to be encoded "together", so the logic really needs to be in the delegate encoder (i.e. XMLEncoder). > XMLEncoder doesn't support Unicode surrogate pairs > -- > > Key: COCOON-2352 > URL: https://issues.apache.org/jira/browse/COCOON-2352 > Project: Cocoon > Issue Type: Bug > Components: * Cocoon Core, Blocks: Serializers >Reporter: Ben Fortuna > > Whilst investigating an issue with the Sling project and support for emoji > characters, I've come to notice that the XMLEncoder used by HTMLSerializer > doesn't support Unicode surrogate pairs to represent higher order unicode > characters. > A simple unit test that demonstrates this issue is here: > https://github.com/micronode/whistlepost/blob/master/whistlepost-rewrite-lib/src/test/groovy/org/apache/cocoon/components/serializers/encoding/XMLEncoderTest.groovy > More background info here also: SLING-5973 > This seems to have been identified/addressed in other Apache projects also: > https://issues.apache.org/jira/browse/THRIFT-3403?jql=text%20~%20%22surrogate%20pairs%22 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
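To illustrate why the two halves of a surrogate pair need to be encoded "together", here is a minimal sketch in plain Java. It is not the Cocoon code: the class SurrogatePairDemo and its methods are hypothetical names used only for illustration. Encoding each UTF-16 char on its own yields two references into the surrogate range (U+D800 to U+DFFF), which the XML 1.0 Char production excludes, whereas combining the pair first yields a single reference to the real code point.

```java
import java.util.Locale;

/** Hypothetical demo class; not part of Cocoon. */
final class SurrogatePairDemo {

    /** Naive approach: one numeric character reference per UTF-16 char. */
    static String perChar(String s) {
        StringBuilder out = new StringBuilder();
        for (char c : s.toCharArray()) {
            out.append(ref(c));
        }
        return out.toString();
    }

    /** Pair-aware approach: one numeric character reference per code point. */
    static String perCodePoint(String s) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);      // combines a valid high/low pair
            out.append(ref(cp));
            i += Character.charCount(cp);   // advances by 1 or 2 chars
        }
        return out.toString();
    }

    private static String ref(int value) {
        return "&#x" + Integer.toHexString(value).toUpperCase(Locale.ROOT) + ";";
    }

    public static void main(String[] args) {
        String emoji = "\uD83D\uDE00"; // U+1F600 as a surrogate pair
        System.out.println(perChar(emoji));      // prints &#xD83D;&#xDE00;
        System.out.println(perCodePoint(emoji)); // prints &#x1F600;
    }
}
```

The first output is what a char-at-a-time encoder tends to produce; XML parsers are expected to reject it because neither reference denotes a legal character.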
[jira] [Commented] (COCOON-2352) XMLEncoder doesn't support Unicode surrogate pairs
[ https://issues.apache.org/jira/browse/COCOON-2352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15495597#comment-15495597 ] Francesco Chicchiriccò commented on COCOON-2352: XMLEncoder (for Cocoon 2.1) is at http://svn.apache.org/repos/asf/cocoon/branches/BRANCH_2_1_X/src/blocks/serializers/java/org/apache/cocoon/components/serializers/encoding/XMLEncoder.java while the Encoder interface is at https://svn.apache.org/repos/asf/cocoon/subprojects/cocoon-serializers-charsets/trunk/src/main/java/org/apache/cocoon/components/serializers/encoding/Encoder.java As you say above, there are several implementations of this interface around. Also, have you already taken a look at https://svn.apache.org/repos/asf/cocoon/subprojects/cocoon-serializers-charsets/trunk/src/main/java/org/apache/cocoon/components/serializers/util/EncodingSerializer.java ? > XMLEncoder doesn't support Unicode surrogate pairs > -- > > Key: COCOON-2352 > URL: https://issues.apache.org/jira/browse/COCOON-2352 > Project: Cocoon > Issue Type: Bug > Components: * Cocoon Core, Blocks: Serializers >Reporter: Ben Fortuna > > Whilst investigating an issue with the Sling project and support for emoji > characters, I've come to notice that the XMLEncoder used by HTMLSerializer > doesn't support Unicode surrogate pairs to represent higher order unicode > characters. > A simple unit test that demonstrates this issue is here: > https://github.com/micronode/whistlepost/blob/master/whistlepost-rewrite-lib/src/test/groovy/org/apache/cocoon/components/serializers/encoding/XMLEncoderTest.groovy > More background info here also: SLING-5973 > This seems to have been identified/addressed in other Apache projects also: > https://issues.apache.org/jira/browse/THRIFT-3403?jql=text%20~%20%22surrogate%20pairs%22 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (COCOON-2352) XMLEncoder doesn't support Unicode surrogate pairs
[ https://issues.apache.org/jira/browse/COCOON-2352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15495081#comment-15495081 ] Ben Fortuna commented on COCOON-2352: - Hi Francesco, The JAR I am using is: org.apache.cocoon:cocoon-serializers-charsets:1.0.2 - which appears to have been built in 2012. It looks like it came from the BRANCH_2_1_X branch but I can't be certain. I will try to make a patch - the easiest for me would be a pull request on GitHub, but if you prefer a patch file I can do that also. I am looking at the unit tests in the project and it is a little difficult to get my head around them. Would you prefer that I write a unit test using HtmlUnit or JUnit, or do you have no preference? It appears the tests haven't been updated for a number of years. Many thanks. > XMLEncoder doesn't support Unicode surrogate pairs > -- > > Key: COCOON-2352 > URL: https://issues.apache.org/jira/browse/COCOON-2352 > Project: Cocoon > Issue Type: Bug > Components: * Cocoon Core >Reporter: Ben Fortuna > > Whilst investigating an issue with the Sling project and support for emoji > characters, I've come to notice that the XMLEncoder used by HTMLSerializer > doesn't support Unicode surrogate pairs to represent higher order unicode > characters. > A simple unit test that demonstrates this issue is here: > https://github.com/micronode/whistlepost/blob/master/whistlepost-rewrite-lib/src/test/groovy/org/apache/cocoon/components/serializers/encoding/XMLEncoderTest.groovy > More background info here also: SLING-5973 > This seems to have been identified/addressed in other Apache projects also: > https://issues.apache.org/jira/browse/THRIFT-3403?jql=text%20~%20%22surrogate%20pairs%22 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (COCOON-2352) XMLEncoder doesn't support Unicode surrogate pairs
[ https://issues.apache.org/jira/browse/COCOON-2352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15492633#comment-15492633 ] Francesco Chicchiriccò commented on COCOON-2352: Hi Ben, thanks for reporting. Just for confirmation: is this bug identified against Cocoon 2.1? Also with the latest development version available at [1]? (svn checkout from [2]). Are you willing to provide a patch (possibly including a unit test)? [1] http://svn.apache.org/repos/asf/cocoon/branches/BRANCH_2_1_X/src/blocks/serializers/java/org/apache/cocoon/components/serializers/encoding/XMLEncoder.java [2] http://svn.apache.org/repos/asf/cocoon/branches/BRANCH_2_1_X/ > XMLEncoder doesn't support Unicode surrogate pairs > -- > > Key: COCOON-2352 > URL: https://issues.apache.org/jira/browse/COCOON-2352 > Project: Cocoon > Issue Type: Bug > Components: * Cocoon Core >Reporter: Ben Fortuna > > Whilst investigating an issue with the Sling project and support for emoji > characters, I've come to notice that the XMLEncoder used by HTMLSerializer > doesn't support Unicode surrogate pairs to represent higher order unicode > characters. > A simple unit test that demonstrates this issue is here: > https://github.com/micronode/whistlepost/blob/master/whistlepost-rewrite-lib/src/test/groovy/org/apache/cocoon/components/serializers/encoding/XMLEncoderTest.groovy > More background info here also: SLING-5973 > This seems to have been identified/addressed in other Apache projects also: > https://issues.apache.org/jira/browse/THRIFT-3403?jql=text%20~%20%22surrogate%20pairs%22 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (COCOON-2352) XMLEncoder doesn't support Unicode surrogate pairs
[ https://issues.apache.org/jira/browse/COCOON-2352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15492310#comment-15492310 ] Ben Fortuna commented on COCOON-2352: - A possibly less intrusive approach would be to leave the method signatures as they are, but when a high (first) surrogate char is detected, record it and return an empty char array. Expect the second surrogate in the pair to be encoded next and return the correct char array result (if the second surrogate in the pair isn't encoded next, throw an encoding exception). > XMLEncoder doesn't support Unicode surrogate pairs > -- > > Key: COCOON-2352 > URL: https://issues.apache.org/jira/browse/COCOON-2352 > Project: Cocoon > Issue Type: Bug > Components: * Cocoon Core >Reporter: Ben Fortuna > > Whilst investigating an issue with the Sling project and support for emoji > characters, I've come to notice that the XMLEncoder used by HTMLSerializer > doesn't support Unicode surrogate pairs to represent higher order unicode > characters. > A simple unit test that demonstrates this issue is here: > https://github.com/micronode/whistlepost/blob/master/whistlepost-rewrite-lib/src/test/groovy/org/apache/cocoon/components/serializers/encoding/XMLEncoderTest.groovy > More background info here also: SLING-5973 > This seems to have been identified/addressed in other Apache projects also: > https://issues.apache.org/jira/browse/THRIFT-3403?jql=text%20~%20%22surrogate%20pairs%22 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
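The approach described in this comment (keep a char-at-a-time signature, remember the first surrogate, emit nothing until the second arrives) could look roughly like the sketch below. It is only an illustration under stated assumptions: StatefulSurrogateEncoder, pendingHigh and the choice of java.io.CharConversionException are hypothetical and are not taken from the committed Cocoon patch, and ordinary characters are passed through unchanged where a real encoder would also escape markup.

```java
import java.io.CharConversionException;
import java.util.Locale;

/** Hypothetical sketch of a stateful, char-at-a-time encoder; not the Cocoon XMLEncoder. */
final class StatefulSurrogateEncoder {

    private static final char[] EMPTY = new char[0];

    /** Pending high surrogate, or 0 when none is held (0 is never a surrogate). */
    private char pendingHigh = 0;

    char[] encode(char c) throws CharConversionException {
        if (pendingHigh != 0) {
            char high = pendingHigh;
            pendingHigh = 0; // always reset so one bad pair cannot poison later input
            if (!Character.isLowSurrogate(c)) {
                throw new CharConversionException("High surrogate not followed by a low surrogate");
            }
            int codePoint = Character.toCodePoint(high, c);
            String ref = "&#x" + Integer.toHexString(codePoint).toUpperCase(Locale.ROOT) + ";";
            return ref.toCharArray();
        }
        if (Character.isHighSurrogate(c)) {
            pendingHigh = c;  // record the first half and emit nothing yet
            return EMPTY;
        }
        if (Character.isLowSurrogate(c)) {
            throw new CharConversionException("Unpaired low surrogate");
        }
        return new char[] { c }; // simplification: no markup escaping here
    }

    public static void main(String[] args) throws Exception {
        StatefulSurrogateEncoder encoder = new StatefulSurrogateEncoder();
        System.out.println(encoder.encode('\uD83D').length);       // prints 0
        System.out.println(new String(encoder.encode('\uDE00')));  // prints &#x1F600;
    }
}
```

Because the pending surrogate lives in an instance field, an instance like this must not be shared between threads, and whoever owns it needs to clear the field when starting a new document.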
[jira] [Commented] (COCOON-2352) XMLEncoder doesn't support Unicode surrogate pairs
[ https://issues.apache.org/jira/browse/COCOON-2352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15492282#comment-15492282 ] Ben Fortuna commented on COCOON-2352: - So I've looked at XMLEncoder, and it seems that the fix will require a change to the method signature - specifically XMLEncoder.encode(char c): https://github.com/apache/cocoon/blob/3ce60f6ecb257b138fc68077bc562a871df045e5/src/blocks/serializers/java/org/apache/cocoon/components/serializers/encoding/XMLEncoder.java#L88 Unfortunately this also means the Encoder interface needs to change, so will need an exercise to identify what else implements this interface. The proposed change would be something like: public char[] Encoder.encode(char[] chars) https://github.com/apache/cocoon/blob/3ce60f6ecb257b138fc68077bc562a871df045e5/src/blocks/serializers/java/org/apache/cocoon/components/serializers/encoding/Encoder.java#L36 I'm happy to implement a fix and submit a pull request, just looking for some acknowledgement of the issue before proceeding. > XMLEncoder doesn't support Unicode surrogate pairs > -- > > Key: COCOON-2352 > URL: https://issues.apache.org/jira/browse/COCOON-2352 > Project: Cocoon > Issue Type: Bug > Components: * Cocoon Core >Reporter: Ben Fortuna > > Whilst investigating an issue with the Sling project and support for emoji > characters, I've come to notice that the XMLEncoder used by HTMLSerializer > doesn't support Unicode surrogate pairs to represent higher order unicode > characters. 
> A simple unit test that demonstrates this issue is here: > https://github.com/micronode/whistlepost/blob/master/whistlepost-rewrite-lib/src/test/groovy/org/apache/cocoon/components/serializers/encoding/XMLEncoderTest.groovy > More background info here also: SLING-5973 > This seems to have been identified/addressed in other Apache projects also: > https://issues.apache.org/jira/browse/THRIFT-3403?jql=text%20~%20%22surrogate%20pairs%22 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
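For comparison, the signature change floated in this comment (an encode method taking a char array) would let the encoder see both halves of a pair in one call. The following is a rough sketch of that idea only; ArrayEncoderSketch is a hypothetical class, not the Cocoon Encoder interface, and it simply turns every non-ASCII code point into a hexadecimal character reference without escaping markup.

```java
import java.util.Locale;

/** Hypothetical sketch of an array-based encode method; not the Cocoon Encoder interface. */
final class ArrayEncoderSketch {

    static char[] encode(char[] chars) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < chars.length) {
            // codePointAt joins a high/low pair; an unpaired surrogate comes back as-is
            int codePoint = Character.codePointAt(chars, i);
            i += Character.charCount(codePoint);
            if (codePoint < 0x80) {
                out.append((char) codePoint); // ASCII passes through (no markup escaping here)
            } else {
                out.append("&#x")
                   .append(Integer.toHexString(codePoint).toUpperCase(Locale.ROOT))
                   .append(';');
            }
        }
        return out.toString().toCharArray();
    }

    public static void main(String[] args) {
        char[] input = "a\uD83D\uDE00".toCharArray();
        System.out.println(new String(encode(input))); // prints a&#x1F600;
    }
}
```

This keeps the encoder stateless, at the cost of changing every implementation of the interface, and it still breaks if a caller happens to split a pair across two arrays.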