[jira] [Commented] (COCOON-2352) XMLEncoder doesn't support Unicode surrogate pairs

2016-10-20 Thread Hudson (JIRA)

[ https://issues.apache.org/jira/browse/COCOON-2352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15591691#comment-15591691 ]

Hudson commented on COCOON-2352:


SUCCESS: Integrated in Jenkins build Cocoon 2.1.X #117 (See [https://builds.apache.org/job/Cocoon%202.1.X/117/])
[COCOON-2352] Third PR applied - This closes #3 (ilgrosso: [http://svn.apache.org/viewvc/?view=rev&rev=1765804])
* (edit) BRANCH_2_1_X/src/blocks/serializers/java/org/apache/cocoon/components/serializers/encoding/XMLEncoder.java
* (edit) BRANCH_2_1_X/src/blocks/serializers/test/org/apache/cocoon/components/serializers/encoding/XMLEncoderTestCase.java


> XMLEncoder doesn't support Unicode surrogate pairs
> --
>
> Key: COCOON-2352
> URL: https://issues.apache.org/jira/browse/COCOON-2352
> Project: Cocoon
>  Issue Type: Bug
>  Components: * Cocoon Core, Blocks: Serializers
>Affects Versions: 2.1.12
>Reporter: Ben Fortuna
>Assignee: Francesco Chicchiriccò
> Fix For: 2.1.13
>
>
> Whilst investigating an issue with the Sling project and support for emoji 
> characters, I've come to notice that the XMLEncoder used by HTMLSerializer 
> doesn't support Unicode surrogate pairs to represent higher order unicode 
> characters.
> A simple unit test that demonstrates this issue is here:
> https://github.com/micronode/whistlepost/blob/master/whistlepost-rewrite-lib/src/test/groovy/org/apache/cocoon/components/serializers/encoding/XMLEncoderTest.groovy
> More background info here also: SLING-5973
> This seems to have been identified/addressed in other Apache projects also:
> https://issues.apache.org/jira/browse/THRIFT-3403?jql=text%20~%20%22surrogate%20pairs%22



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (COCOON-2352) XMLEncoder doesn't support Unicode surrogate pairs

2016-10-20 Thread Francesco Chicchiriccò (JIRA)

[ https://issues.apache.org/jira/browse/COCOON-2352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15591607#comment-15591607 ]

Francesco Chicchiriccò commented on COCOON-2352:


Further changes committed with [1] (Cocoon 2.1) and [2] (Cocoon XML 
Serializers); 1.0.3-SNAPSHOT redeployed to Maven repo.

Thanks again.

[1] http://svn.apache.org/viewvc?rev=1765804&view=rev
[2] http://svn.apache.org/viewvc?rev=1765807&view=rev



[jira] [Commented] (COCOON-2352) XMLEncoder doesn't support Unicode surrogate pairs

2016-10-19 Thread Ben Fortuna (JIRA)

[ https://issues.apache.org/jira/browse/COCOON-2352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15590441#comment-15590441 ]

Ben Fortuna commented on COCOON-2352:

My sincerest apologies, but I discovered a bug in the patch I submitted. 
I had assumed we could cast an int to a char to encode the higher-order 
Unicode characters, but of course this isn't possible, which is why Unicode 
surrogate pairs exist in the first place.

So I had to make a slight change to the code (again). I have updated two 
files, XMLEncoder and XMLEncoderTestCase, to ensure that after combining a 
surrogate pair into a code point we then correctly encode the int value as 
an HTML-compatible string.

https://github.com/apache/cocoon/pull/3/files
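
To illustrate the point about casting, here is a minimal Java sketch; it is not the code from the pull request. The class name `CodePointEntitySketch` and the `toEntity` helper are hypothetical, and the hexadecimal `&#x...;` form is only an assumption about what an HTML-compatible string could look like.

```java
// Hypothetical illustration only; not taken from Cocoon's XMLEncoder.
final class CodePointEntitySketch {

    /** Formats a code point as a hexadecimal numeric character reference (assumed format). */
    static String toEntity(int codePoint) {
        return "&#x" + Integer.toHexString(codePoint).toUpperCase() + ";";
    }

    public static void main(String[] args) {
        // Combine the surrogate pair into one code point: 0x1F340 (decimal 127808).
        int codePoint = Character.toCodePoint('\uD83C', '\uDF40');

        // Narrowing to char keeps only the low 16 bits, so the code point is lost.
        char truncated = (char) codePoint;
        System.out.println(Integer.toHexString(truncated)); // prints f340

        // Formatting the int value directly preserves the full code point.
        System.out.println(toEntity(codePoint)); // prints &#x1F340;
    }
}
```

The truncation is why the value has to stay an int all the way to the point where the output string is built.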

Thanks again, and fingers crossed there are no more changes required. :-)



[jira] [Commented] (COCOON-2352) XMLEncoder doesn't support Unicode surrogate pairs

2016-10-17 Thread Ben Fortuna (JIRA)

[ https://issues.apache.org/jira/browse/COCOON-2352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15582333#comment-15582333 ]

Ben Fortuna commented on COCOON-2352:

Great! I've tested the snapshot against my code and it looks good. Many thanks 
for your assistance. :-)



[jira] [Commented] (COCOON-2352) XMLEncoder doesn't support Unicode surrogate pairs

2016-10-17 Thread Francesco Chicchiriccò (JIRA)

[ https://issues.apache.org/jira/browse/COCOON-2352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15582339#comment-15582339 ]

Francesco Chicchiriccò commented on COCOON-2352:


You're welcome ;-)



[jira] [Commented] (COCOON-2352) XMLEncoder doesn't support Unicode surrogate pairs

2016-10-17 Thread Hudson (JIRA)

[ https://issues.apache.org/jira/browse/COCOON-2352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15582202#comment-15582202 ]

Hudson commented on COCOON-2352:


SUCCESS: Integrated in Jenkins build Cocoon 2.1.X #116 (See [https://builds.apache.org/job/Cocoon%202.1.X/116/])
[COCOON-2352] Applying further changes to better deal with HTML encoding (ilgrosso: [http://svn.apache.org/viewvc/?view=rev&rev=1765265])
* (edit) BRANCH_2_1_X/src/blocks/serializers/test/org/apache/cocoon/components/serializers/encoding/XMLEncoderTestCase.java




[jira] [Commented] (COCOON-2352) XMLEncoder doesn't support Unicode surrogate pairs

2016-10-17 Thread Francesco Chicchiriccò (JIRA)

[ https://issues.apache.org/jira/browse/COCOON-2352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15582183#comment-15582183 ]

Francesco Chicchiriccò commented on COCOON-2352:


My bad: problem solved, committed to (Cocoon 2.1)

* http://svn.apache.org/repos/asf/cocoon/branches/BRANCH_2_1_X/src/blocks/serializers/java/org/apache/cocoon/components/serializers/encoding/XMLEncoder.java
* http://svn.apache.org/repos/asf/cocoon/branches/BRANCH_2_1_X/src/blocks/serializers/test/org/apache/cocoon/components/serializers/encoding/XMLEncoderTestCase.java

and (Maven artifact, with SNAPSHOT already redeployed):

* http://svn.apache.org/repos/asf/cocoon/subprojects/cocoon-serializers-charsets/trunk/src/main/java/org/apache/cocoon/components/serializers/encoding/XMLEncoder.java
* http://svn.apache.org/repos/asf/cocoon/subprojects/cocoon-serializers-charsets/trunk/src/test/java/org/apache/cocoon/components/serializers/encoding/XMLEncoderTest.java



[jira] [Commented] (COCOON-2352) XMLEncoder doesn't support Unicode surrogate pairs

2016-10-17 Thread Ben Fortuna (JIRA)

[ https://issues.apache.org/jira/browse/COCOON-2352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15582147#comment-15582147 ]

Ben Fortuna commented on COCOON-2352:

Hmm, do you have a link to the source? I checked on BRANCH_2_1_X and it still 
has the old code. I noticed the error is on line 42, but the test I submitted 
only has 33 lines. 

Note it is important for the test to encode the surrogate pairs together, which 
is why I had the sequence like this:

{code}
char[] expectedValue = encoder.encode((char) 127808);
// surrogate 1/2
assertTrue(encoder.encode('\uD83C').length == 0);
// surrogate 2/2
assertTrue(Arrays.equals(expectedValue, encoder.encode('\uDF40')));
{code}
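
The call sequence above relies on the encoder remembering the first half of a pair between calls. A rough sketch of that idea follows, assuming a hypothetical `PairBufferingEncoder` class (not Cocoon's actual XMLEncoder) and an assumed hexadecimal entity format. It formats the combined code point as an int rather than casting it to a char.

```java
// Hypothetical stand-in for a stateful encoder; names and output format are assumptions.
final class PairBufferingEncoder {

    // 0 means "no high surrogate pending"; U+0000 is never a surrogate.
    private char pendingHigh = 0;

    char[] encode(char c) {
        if (pendingHigh != 0) {
            if (!Character.isLowSurrogate(c)) {
                pendingHigh = 0;
                throw new IllegalArgumentException("Expected low surrogate char");
            }
            // Second half of the pair: combine both halves and emit one entity.
            int codePoint = Character.toCodePoint(pendingHigh, c);
            pendingHigh = 0;
            return entity(codePoint);
        }
        if (Character.isHighSurrogate(c)) {
            // First half of the pair: remember it and emit nothing yet.
            pendingHigh = c;
            return new char[0];
        }
        return entity(c);
    }

    private static char[] entity(int codePoint) {
        return ("&#x" + Integer.toHexString(codePoint).toUpperCase() + ";").toCharArray();
    }

    public static void main(String[] args) {
        PairBufferingEncoder encoder = new PairBufferingEncoder();
        System.out.println(encoder.encode('\uD83C').length);      // prints 0
        System.out.println(new String(encoder.encode('\uDF40'))); // prints &#x1F340;
    }
}
```

Because the encoder carries state, encoding the two halves in separate, unrelated calls (or on a fresh instance each time) would not produce the combined result.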




[jira] [Commented] (COCOON-2352) XMLEncoder doesn't support Unicode surrogate pairs

2016-10-17 Thread Francesco Chicchiriccò (JIRA)

[ https://issues.apache.org/jira/browse/COCOON-2352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15581450#comment-15581450 ]

Francesco Chicchiriccò commented on COCOON-2352:


With the new test code, I receive

java.lang.IllegalArgumentException: Expected low surrogate char
    at org.apache.cocoon.components.serializers.encoding.XMLEncoder.encode(XMLEncoder.java:97)
    at org.apache.cocoon.components.serializers.encoding.XMLEncoderTestCase.testEncodingSurrogatePairs(XMLEncoderTestCase.java:42)

when running the test.



[jira] [Commented] (COCOON-2352) XMLEncoder doesn't support Unicode surrogate pairs

2016-10-16 Thread Ben Fortuna (JIRA)

[ https://issues.apache.org/jira/browse/COCOON-2352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15580714#comment-15580714 ]

Ben Fortuna commented on COCOON-2352:

Yes sorry, I forgot to mention I had updated the unit test also. See the same 
PR for the changes.



[jira] [Commented] (COCOON-2352) XMLEncoder doesn't support Unicode surrogate pairs

2016-10-14 Thread Hudson (JIRA)

[ https://issues.apache.org/jira/browse/COCOON-2352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15574611#comment-15574611 ]

Hudson commented on COCOON-2352:


SUCCESS: Integrated in Jenkins build Cocoon 2.1.X #115 (See [https://builds.apache.org/job/Cocoon%202.1.X/115/])
[COCOON-2352] This closes #2 (ilgrosso: [http://svn.apache.org/viewvc/?view=rev&rev=1764819])
* (edit) BRANCH_2_1_X/src/blocks/serializers/java/org/apache/cocoon/components/serializers/encoding/XMLEncoder.java




[jira] [Commented] (COCOON-2352) XMLEncoder doesn't support Unicode surrogate pairs

2016-10-14 Thread Francesco Chicchiriccò (JIRA)

[ 
https://issues.apache.org/jira/browse/COCOON-2352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=1557#comment-1557
 ] 

Francesco Chicchiriccò commented on COCOON-2352:


Ben, I have applied your further PR in [1] but I have unfortunately noticed 
later that the test is failing in this assertion:

assertTrue(Arrays.equals(expectedValue, encoder.encode('\uDF40')));

Unfortunately, I noticed this *after* committing to BRANCH_2_1_X, but I 
stopped myself right before deploying the updated SNAPSHOT artifact 
(thanks Maven and the surefire plugin!).

Does your test case need to be updated as well?

[1] http://svn.apache.org/viewvc?rev=1764819&view=rev



[jira] [Commented] (COCOON-2352) XMLEncoder doesn't support Unicode surrogate pairs

2016-10-13 Thread Ben Fortuna (JIRA)

[ https://issues.apache.org/jira/browse/COCOON-2352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15574193#comment-15574193 ]

Ben Fortuna commented on COCOON-2352:

[~ilgrosso] Fantastic, thanks. I've used the snapshot dependency to test the 
fix in my project and I did notice one more thing: whilst it does create the 
Unicode character correctly from the surrogate pair, it doesn't actually 
HTML-encode the character.

In order to fix this I've created another pull request, which simply encodes 
the unicode character created from the surrogate pair:

https://github.com/apache/cocoon/pull/2/files#diff-2b4ac8dab4cdcce4c7ffd948c2490b52R101

I hope it isn't too much trouble to apply this change also, I'm confident this 
is the last change required. Many thanks.




[jira] [Commented] (COCOON-2352) XMLEncoder doesn't support Unicode surrogate pairs

2016-10-13 Thread Francesco Chicchiriccò (JIRA)

[ https://issues.apache.org/jira/browse/COCOON-2352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15571065#comment-15571065 ]

Francesco Chicchiriccò commented on COCOON-2352:


I have reworked your patch to be also applied to the 
org.apache.cocoon:cocoon-serializers-charsets Maven artifact (used by Cocoon 
2.2 and Cocoon 3.0).

I don't know when we will be able to officially release your fix there; in the 
meantime, however, you could use the SNAPSHOT artifact by setting the 
following dependency:


<dependency>
  <groupId>org.apache.cocoon</groupId>
  <artifactId>cocoon-serializers-charsets</artifactId>
  <version>1.0.3-SNAPSHOT</version>
</dependency>


and adding the following repository to your pom:


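<repository>
  <id>apache.snapshots</id>
  <name>Apache Snapshot Repository</name>
  <url>http://repository.apache.org/snapshots</url>
  <releases>
    <enabled>false</enabled>
  </releases>
</repository>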


Alternatively, you can download the updated SNAPSHOT artifact from

https://repository.apache.org/content/groups/snapshots/org/apache/cocoon/cocoon-serializers-charsets/1.0.3-SNAPSHOT/cocoon-serializers-charsets-1.0.3-20161013.064604-1.jar



[jira] [Commented] (COCOON-2352) XMLEncoder doesn't support Unicode surrogate pairs

2016-10-12 Thread Ben Fortuna (JIRA)

[ https://issues.apache.org/jira/browse/COCOON-2352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15570179#comment-15570179 ]

Ben Fortuna commented on COCOON-2352:

[~ilgrosso] I am happy to have this issue closed, however it would be good if 
there was a snapshot JAR available to verify the functionality. Specifically I 
am hoping this change will make it into this artefact:

http://search.maven.org/#search%7Cgav%7C1%7Cg%3A%22org.apache.cocoon%22%20AND%20a%3A%22cocoon-serializers-charsets%22

Will a new version be produced with the next release? Many thanks for your 
efforts.



[jira] [Commented] (COCOON-2352) XMLEncoder doesn't support Unicode surrogate pairs

2016-10-11 Thread Francesco Chicchiriccò (JIRA)

[ https://issues.apache.org/jira/browse/COCOON-2352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15564709#comment-15564709 ]

Francesco Chicchiriccò commented on COCOON-2352:


We have just decided to upgrade to 1.5 compatibility in COCOON-2356, so I am 
happy to keep your contribution.
A subsequent build [1] succeeded, in fact.

Can we close this issue, then?

[1] https://builds.apache.org/job/Cocoon%202.1.X/112




[jira] [Commented] (COCOON-2352) XMLEncoder doesn't support Unicode surrogate pairs

2016-10-10 Thread Ben Fortuna (JIRA)

[ https://issues.apache.org/jira/browse/COCOON-2352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15563730#comment-15563730 ]

Ben Fortuna commented on COCOON-2352:

Hmm, I guess from that failed build that you are still maintaining 
compatibility with Java 1.4 (Character.isLowSurrogate() was introduced in 1.5). 
I guess we can work around that, although I'm not sure anyone is using Java 1.4 
anymore. ;)
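
One possible shape for such a workaround is sketched below, assuming plain range checks in place of the `java.lang.Character` helpers that arrived in 1.5. The class and method names are hypothetical and are not part of any Cocoon patch; the ranges and the combining arithmetic follow the UTF-16 definition of surrogates.

```java
// Hypothetical helper that avoids the Java 1.5 Character surrogate methods.
final class SurrogateRangeSketch {

    /** High (leading) surrogates occupy U+D800..U+DBFF. */
    static boolean isHighSurrogate(char c) {
        return c >= '\uD800' && c <= '\uDBFF';
    }

    /** Low (trailing) surrogates occupy U+DC00..U+DFFF. */
    static boolean isLowSurrogate(char c) {
        return c >= '\uDC00' && c <= '\uDFFF';
    }

    /** Combines a valid surrogate pair into a supplementary code point. */
    static int toCodePoint(char high, char low) {
        return ((high - 0xD800) << 10) + (low - 0xDC00) + 0x10000;
    }

    public static void main(String[] args) {
        System.out.println(isHighSurrogate('\uD83C'));                             // prints true
        System.out.println(isLowSurrogate('\uDF40'));                              // prints true
        System.out.println(Integer.toHexString(toCodePoint('\uD83C', '\uDF40'))); // prints 1f340
    }
}
```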




[jira] [Commented] (COCOON-2352) XMLEncoder doesn't support Unicode surrogate pairs

2016-10-10 Thread Hudson (JIRA)

[ https://issues.apache.org/jira/browse/COCOON-2352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15561631#comment-15561631 ]

Hudson commented on COCOON-2352:


FAILURE: Integrated in Jenkins build Cocoon 2.1.X #111 (See [https://builds.apache.org/job/Cocoon%202.1.X/111/])
[COCOON-2352] Support for Unicode surrogate pairs - This closes #1 (ilgrosso: [http://svn.apache.org/viewvc/?view=rev&rev=1764023])
* (edit) BRANCH_2_1_X/src/blocks/serializers/java/org/apache/cocoon/components/serializers/EncodingSerializer.java
* (edit) BRANCH_2_1_X/src/blocks/serializers/java/org/apache/cocoon/components/serializers/encoding/XMLEncoder.java
* (add) BRANCH_2_1_X/src/blocks/serializers/test
* (add) BRANCH_2_1_X/src/blocks/serializers/test/org
* (add) BRANCH_2_1_X/src/blocks/serializers/test/org/apache
* (add) BRANCH_2_1_X/src/blocks/serializers/test/org/apache/cocoon
* (add) BRANCH_2_1_X/src/blocks/serializers/test/org/apache/cocoon/components
* (add) BRANCH_2_1_X/src/blocks/serializers/test/org/apache/cocoon/components/serializers
* (add) BRANCH_2_1_X/src/blocks/serializers/test/org/apache/cocoon/components/serializers/encoding
* (add) BRANCH_2_1_X/src/blocks/serializers/test/org/apache/cocoon/components/serializers/encoding/XMLEncoderTestCase.java




[jira] [Commented] (COCOON-2352) XMLEncoder doesn't support Unicode surrogate pairs

2016-10-10 Thread JIRA

[ 
https://issues.apache.org/jira/browse/COCOON-2352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15561543#comment-15561543
 ] 

Francesco Chicchiriccò commented on COCOON-2352:


Hi [~fortuna], thanks for your PR (which is also the very first one coming from 
GitHub, wow...)!

As you can see from [1], your changes are now incorporated (I had to download 
the PR as a diff, then rework it a bit to make it compatible with the Cocoon 2.1 
JUnit tests [2]).
I have also added [3] to properly handle XMLEncoder#highSurrogate 
re-initialization.
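
Why the re-initialization matters can be sketched as follows. This is a hypothetical illustration, not the code from [3]: the class ResettableEncoder, its reset() method and the serialize() helper are invented names, and the numeric-reference output format is an assumption. It only shows that an encoder which remembers a high surrogate in a field can carry that state from one document into the next unless it is cleared first.

```java
/** Hypothetical stateful encoder; names are illustrative, not Cocoon's API. */
final class ResettableEncoder {
    // Pending first half of a surrogate pair; 0 means "nothing pending".
    private char highSurrogate;

    String encode(char c) {
        if (Character.isHighSurrogate(c)) {
            highSurrogate = c;          // remember it, emit nothing yet
            return "";
        }
        if (Character.isLowSurrogate(c) && highSurrogate != 0) {
            int codePoint = Character.toCodePoint(highSurrogate, c);
            highSurrogate = 0;
            return "&#x" + Integer.toHexString(codePoint).toUpperCase() + ";";
        }
        return String.valueOf(c);       // everything else passes through
    }

    /** Forget any half-seen pair, e.g. before a new document starts. */
    void reset() {
        highSurrogate = 0;
    }

    /** Encodes one document, optionally clearing leftover state first. */
    static String serialize(ResettableEncoder enc, String doc, boolean resetFirst) {
        if (resetFirst) {
            enc.reset();
        }
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < doc.length(); i++) {
            out.append(enc.encode(doc.charAt(i)));
        }
        return out.toString();
    }
}
```

If a first document ends with a dangling high surrogate and the same instance is reused without a reset, the leftover half pairs up with a stray low surrogate in the next document and a character appears that neither document contained. With the reset, the second document is encoded independently of the first.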

Shall we close this issue, then?

[1] http://svn.apache.org/viewvc?view=revision=1764023
[2] http://cocoon.apache.org/2.1/installing/tests.html
[3] 
http://svn.apache.org/viewvc/cocoon/branches/BRANCH_2_1_X/src/blocks/serializers/java/org/apache/cocoon/components/serializers/EncodingSerializer.java?r1=1764023=1764022=1764023



[jira] [Commented] (COCOON-2352) XMLEncoder doesn't support Unicode surrogate pairs

2016-10-10 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/COCOON-2352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15561538#comment-15561538
 ] 

ASF GitHub Bot commented on COCOON-2352:


Github user asfgit closed the pull request at:

https://github.com/apache/cocoon/pull/1




[jira] [Commented] (COCOON-2352) XMLEncoder doesn't support Unicode surrogate pairs

2016-10-09 Thread Ben Fortuna (JIRA)

[ 
https://issues.apache.org/jira/browse/COCOON-2352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15560988#comment-15560988
 ] 

Ben Fortuna commented on COCOON-2352:
-

I've just created a pull request on GitHub to add support for surrogate pairs.

https://github.com/apache/cocoon/pull/1

Summary of changes:

* Added an instance variable to XMLEncoder to record the first surrogate of the 
pair - NOTE: this means XMLEncoder is no longer thread safe. This may have 
implications I'm not aware of (e.g. if an instance is shared between threads).
* Added a unit test to demonstrate the behaviour - NOTE: I needed to add the 
serializers project to the test classpath; I'm not sure if there is a better way 
to do this with the ant config.

I look forward to any feedback or comments.

regards,
ben




[jira] [Commented] (COCOON-2352) XMLEncoder doesn't support Unicode surrogate pairs

2016-10-09 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/COCOON-2352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15560981#comment-15560981
 ] 

ASF GitHub Bot commented on COCOON-2352:


GitHub user benfortuna opened a pull request:

https://github.com/apache/cocoon/pull/1

Support for Unicode surrogate pairs

This PR adds support for encoding surrogate pairs as a single character in the 
XMLEncoder implementation. See 
[COCOON-2352](https://issues.apache.org/jira/browse/COCOON-2352) for further 
details.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/benfortuna/cocoon BRANCH_2_1_X

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/cocoon/pull/1.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1


commit 4975a555b8330446089c81e17e8bfaaaee669600
Author: Ben Fortuna 
Date:   2016-10-10T00:11:32Z

Added required folder for build

commit cf2d9b65eb55b9d19a0b0c179e90fe7c7b70b6e6
Author: Ben Fortuna 
Date:   2016-10-10T00:11:58Z

Added support for decoding surrogate pairs

commit cc68b0040c5afc6286dc767810ea2ec7abd58340
Author: Ben Fortuna 
Date:   2016-10-10T01:26:20Z

Added unit test for encoding unicode surrogate pairs






[jira] [Commented] (COCOON-2352) XMLEncoder doesn't support Unicode surrogate pairs

2016-09-16 Thread JIRA

[ 
https://issues.apache.org/jira/browse/COCOON-2352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15495631#comment-15495631
 ] 

Francesco Chicchiriccò commented on COCOON-2352:


Understood, thanks for working on this.



[jira] [Commented] (COCOON-2352) XMLEncoder doesn't support Unicode surrogate pairs

2016-09-16 Thread Ben Fortuna (JIRA)

[ 
https://issues.apache.org/jira/browse/COCOON-2352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15495624#comment-15495624
 ] 

Ben Fortuna commented on COCOON-2352:
-

Ok, I'll first create a unit test to demonstrate the issue. I'd prefer not to 
change the Encoder interface so I'll see if it's possible to just update 
XMLEncoder.

I have looked at the EncodingSerializer, but I think a surrogate pair needs 
to be encoded "together", so the logic really needs to be in the delegate 
encoder (i.e. XMLEncoder).




[jira] [Commented] (COCOON-2352) XMLEncoder doesn't support Unicode surrogate pairs

2016-09-16 Thread JIRA

[ 
https://issues.apache.org/jira/browse/COCOON-2352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15495597#comment-15495597
 ] 

Francesco Chicchiriccò commented on COCOON-2352:


XMLEncoder (for Cocoon 2.1) is at

http://svn.apache.org/repos/asf/cocoon/branches/BRANCH_2_1_X/src/blocks/serializers/java/org/apache/cocoon/components/serializers/encoding/XMLEncoder.java
 

while the Encoder interface is at

https://svn.apache.org/repos/asf/cocoon/subprojects/cocoon-serializers-charsets/trunk/src/main/java/org/apache/cocoon/components/serializers/encoding/Encoder.java

As you say above, there are several implementations of this interface around.

Also, have you already taken a look at

https://svn.apache.org/repos/asf/cocoon/subprojects/cocoon-serializers-charsets/trunk/src/main/java/org/apache/cocoon/components/serializers/util/EncodingSerializer.java

?




[jira] [Commented] (COCOON-2352) XMLEncoder doesn't support Unicode surrogate pairs

2016-09-15 Thread Ben Fortuna (JIRA)

[ 
https://issues.apache.org/jira/browse/COCOON-2352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15495081#comment-15495081
 ] 

Ben Fortuna commented on COCOON-2352:
-

Hi Francesco,

The JAR I am using is org.apache.cocoon:cocoon-serializers-charsets:1.0.2, 
which appears to have been built in 2012. It looks like it came from the 
BRANCH_2_1_X branch, but I can't be certain.

I will try to make a patch. The easiest for me would be a pull request on 
GitHub, but if you prefer a patch file I can do that also. 

I am looking at the unit tests in the project and they are a little difficult 
to get my head around. Would you prefer that I write a unit test using 
htmlunit or junit, or do you have no preference? It appears the tests haven't 
been updated for a number of years. Many thanks.




[jira] [Commented] (COCOON-2352) XMLEncoder doesn't support Unicode surrogate pairs

2016-09-15 Thread JIRA

[ 
https://issues.apache.org/jira/browse/COCOON-2352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15492633#comment-15492633
 ] 

Francesco Chicchiriccò commented on COCOON-2352:


Hi Ben, thanks for reporting.

Just for confirmation: was this bug identified against Cocoon 2.1? Does it also 
occur with the latest development version available at [1] (svn checkout from 
[2])?

Are you willing to provide a patch (possibly including a unit test)?

[1] 
http://svn.apache.org/repos/asf/cocoon/branches/BRANCH_2_1_X/src/blocks/serializers/java/org/apache/cocoon/components/serializers/encoding/XMLEncoder.java
[2] http://svn.apache.org/repos/asf/cocoon/branches/BRANCH_2_1_X/



[jira] [Commented] (COCOON-2352) XMLEncoder doesn't support Unicode surrogate pairs

2016-09-14 Thread Ben Fortuna (JIRA)

[ 
https://issues.apache.org/jira/browse/COCOON-2352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15492310#comment-15492310
 ] 

Ben Fortuna commented on COCOON-2352:
-

A possibly less intrusive approach would be to leave the method signatures as 
they are, but when a surrogate char is detected, record it and return an empty 
char array. Expect the second surrogate in the pair to be encoded next and 
return the correct char array result (if the second surrogate in the pair isn't 
the next char to be encoded, throw an encoding exception).
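
A minimal Java sketch of the approach described above, written under stated assumptions rather than taken from the Cocoon sources: PairBufferingEncoder is an invented class, IllegalStateException stands in for whatever encoding exception the real code would throw, rejecting a lone low surrogate is an extra assumption, and the numeric-reference output format is illustrative only. It uses the java.lang.Character surrogate helpers, which need Java 5 or later.

```java
/** Hypothetical sketch: keeps the encode(char) signature, buffers the first surrogate. */
final class PairBufferingEncoder {
    private static final char NONE = 0;

    // First surrogate of a pair seen on the previous call, or NONE.
    private char pendingHigh = NONE;

    /** Same shape as a per-char encoder: one char in, zero or more chars out. */
    char[] encode(char c) {
        if (pendingHigh != NONE) {
            char high = pendingHigh;
            pendingHigh = NONE;
            if (!Character.isLowSurrogate(c)) {
                // The second half of the pair did not arrive next.
                throw new IllegalStateException("high surrogate not followed by low surrogate");
            }
            return reference(Character.toCodePoint(high, c));
        }
        if (Character.isHighSurrogate(c)) {
            pendingHigh = c;        // record it and emit nothing for now
            return new char[0];
        }
        if (Character.isLowSurrogate(c)) {
            // Assumption: a low surrogate with no recorded high surrogate is an error.
            throw new IllegalStateException("unpaired low surrogate");
        }
        return c < 0x80 ? new char[] { c } : reference(c);
    }

    /** Builds a hexadecimal numeric character reference for a code point. */
    private static char[] reference(int codePoint) {
        return ("&#x" + Integer.toHexString(codePoint).toUpperCase() + ";").toCharArray();
    }
}
```

Feeding the two halves of U+1F600 one after the other yields an empty array for the first call and a single reference for the whole character on the second. The cost is that the encoder now holds state between calls, so an instance should not be shared by two streams that are encoded at the same time.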



[jira] [Commented] (COCOON-2352) XMLEncoder doesn't support Unicode surrogate pairs

2016-09-14 Thread Ben Fortuna (JIRA)

[ 
https://issues.apache.org/jira/browse/COCOON-2352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15492282#comment-15492282
 ] 

Ben Fortuna commented on COCOON-2352:
-

So I've looked at XMLEncoder, and it seems that the fix will require a change 
to the method signature - specifically XMLEncoder.encode(char c):

https://github.com/apache/cocoon/blob/3ce60f6ecb257b138fc68077bc562a871df045e5/src/blocks/serializers/java/org/apache/cocoon/components/serializers/encoding/XMLEncoder.java#L88

Unfortunately this also means the Encoder interface needs to change, so we will 
first need to identify what else implements this interface. The proposed 
change would be something like:

public char[] Encoder.encode(char[] chars)

https://github.com/apache/cocoon/blob/3ce60f6ecb257b138fc68077bc562a871df045e5/src/blocks/serializers/java/org/apache/cocoon/components/serializers/encoding/Encoder.java#L36

I'm happy to implement a fix and submit a pull request, just looking for some 
acknowledgement of the issue before proceeding.
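
To make the problem concrete, here is a small Java sketch (not Cocoon's code) of why a one-char-at-a-time signature struggles with characters outside the Basic Multilingual Plane. The class and method names are made up for illustration, the numeric-reference output format is an assumption, and the code-point methods used need Java 5 or later.

```java
/** Hypothetical demo: per-char versus per-code-point escaping of non-ASCII text. */
final class SurrogateSplitDemo {

    /** Treats every UTF-16 unit on its own, as an encode(char) method has to. */
    static String encodePerChar(String text) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c < 0x80) {
                out.append(c);
            } else {
                out.append("&#x").append(Integer.toHexString(c).toUpperCase()).append(';');
            }
        }
        return out.toString();
    }

    /** Looks at whole code points, so a surrogate pair becomes one reference. */
    static String encodePerCodePoint(String text) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < text.length(); ) {
            int codePoint = text.codePointAt(i);
            if (codePoint < 0x80) {
                out.append((char) codePoint);
            } else {
                out.append("&#x").append(Integer.toHexString(codePoint).toUpperCase()).append(';');
            }
            i += Character.charCount(codePoint); // 2 for a surrogate pair, else 1
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String grinningFace = "\uD83D\uDE00"; // U+1F600 as a UTF-16 surrogate pair
        System.out.println(encodePerChar(grinningFace));      // prints &#xD83D;&#xDE00;
        System.out.println(encodePerCodePoint(grinningFace)); // prints &#x1F600;
    }
}
```

The per-char output refers to two surrogate code points, which are not legal XML characters, while the per-code-point output refers to the single intended character. Passing more than one char to the encoder, as in the proposed signature, is one way to give it enough context to produce the second form.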

