Re: [2.1] Overzealous escaping of high Unicode code points

2017-06-20 Thread Christopher Schultz

Greg,

On 6/20/17 4:11 PM, Christopher Schultz wrote:
> Greg,
> 
> On 6/8/17 2:17 PM, gelo1234 wrote:
>> Chris,
> 
>> Even with C3 (Cocoon 3.0 beta), unless you specify the optional
>> encoding in your Serializer config, you fall back to the default,
>> UTF-8:
> 
>> org.apache.cocoon.optional.servlet.components.sax.serializers.util
>
>> public class ConfigurationUtils {
>
>>     private ConfigurationUtils() { }
>
>>     public static String getEncoding(Map configuration) {
>>         String encoding = (String) configuration.get("encoding");
>
>>         if (encoding == null || "".equals(encoding)) {
>>             encoding = "UTF-8";
>>         }
>
>>         return encoding;
>>     }
>>     ...
> 
> I would have expected the Unicode code point to be converted into a
> single 4-byte UTF-8 sequence without any &-encoding at all. It looks
> like what I got was a pair of 2-byte characters with &-encoding.
> 
> I'll try UTF-16 but my expectation is that it's going to get
> worse, not better.

Interestingly enough, my emojis are now showing (I don't totally
understand why!), but it looks like my CSS isn't being loaded. That's
a separate problem I'll have to figure out for myself.

In my own application, switching from commons-lang to commons-lang3
for HTML/XML escaping allowed me to use these 4-byte emojis and UTF-8
together. I'm surprised that Cocoon can't do the same thing. (I think
it comes down to exactly how the character-escaper makes its decisions.)
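
To illustrate what I mean by the escaper's decisions, here is a minimal
sketch. It is not the Cocoon or commons-lang code; the class and method
names are hypothetical, and it ignores markup characters such as < and &.
One escaper walks the string char by char and so emits one reference per
surrogate; the other walks by code point and only escapes what the
target charset cannot encode.

```java
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration only; not taken from Cocoon or commons-lang.
final class EscaperSketch {

    // Naive: iterates over UTF-16 chars, so a code point above U+FFFF
    // (stored as a surrogate pair) becomes two separate references.
    static String escapePerChar(String s) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c > 0x7F) {
                out.append("&#").append((int) c).append(';');
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }

    // Code-point aware: escapes a whole code point, and only when the
    // output charset cannot represent it.
    static String escapeUnencodable(String s, Charset charset) {
        CharsetEncoder encoder = charset.newEncoder();
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            String unit = new String(Character.toChars(cp));
            if (encoder.canEncode(unit)) {
                out.append(unit);
            } else {
                out.append("&#").append(cp).append(';');
            }
            i += Character.charCount(cp);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // U+1F600 is just an example of a code point above U+FFFF.
        String emoji = new String(Character.toChars(0x1F600));
        System.out.println(escapePerChar(emoji));                               // &#55357;&#56832;
        System.out.println(escapeUnencodable(emoji, StandardCharsets.US_ASCII)); // &#128512;
        System.out.println(escapeUnencodable(emoji, StandardCharsets.UTF_8).equals(emoji)); // true
    }
}
```

The first output is the "pair of 2-byte characters with &-encoding" shape
I was seeing; with a UTF-8 target the second escaper leaves the emoji
alone.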

Thanks,
-chris

-
To unsubscribe, e-mail: users-unsubscr...@cocoon.apache.org
For additional commands, e-mail: users-h...@cocoon.apache.org



Re: [2.1] Overzealous escaping of high Unicode code points

2017-06-20 Thread Christopher Schultz

Greg,

On 6/8/17 2:17 PM, gelo1234 wrote:
> Chris,
> 
> Even with C3 (Cocoon 3.0 beta), unless you specify the optional encoding
> in your Serializer config, you fall back to the default, UTF-8:
> 
> org.apache.cocoon.optional.servlet.components.sax.serializers.util
>
> public class ConfigurationUtils {
>
>     private ConfigurationUtils() { }
>
>     public static String getEncoding(Map configuration) {
>         String encoding = (String) configuration.get("encoding");
>
>         if (encoding == null || "".equals(encoding)) {
>             encoding = "UTF-8";
>         }
>
>         return encoding;
>     }
>     ...
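
For reference, the fallback in the quoted snippet can be exercised with
a small standalone sketch. This is not the Cocoon source: the class name
and the Map<String, ?> parameter type are my assumptions, since the
quoted signature only shows a raw Map.

```java
import java.util.HashMap;
import java.util.Map;

// Standalone sketch of the quoted fallback; names and generics are assumed.
final class EncodingFallbackSketch {

    // Returns the configured "encoding" value, or "UTF-8" when it is
    // missing, not a String, or empty.
    static String getEncoding(Map<String, ?> configuration) {
        Object value = configuration.get("encoding");
        String encoding = (value instanceof String) ? (String) value : null;
        if (encoding == null || encoding.isEmpty()) {
            encoding = "UTF-8";
        }
        return encoding;
    }

    public static void main(String[] args) {
        Map<String, Object> config = new HashMap<>();
        System.out.println(getEncoding(config));   // UTF-8 (nothing configured)

        config.put("encoding", "");
        System.out.println(getEncoding(config));   // UTF-8 (empty value)

        config.put("encoding", "ISO-8859-1");
        System.out.println(getEncoding(config));   // ISO-8859-1
    }
}
```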

I would have expected the Unicode code point to be converted into a
single 4-byte UTF-8 sequence without any &-encoding at all. It looks like
what I got was a pair of 2-byte characters with &-encoding.

I'll try UTF-16 but my expectation is that it's going to get worse,
not better.
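
To make the byte counts concrete, here is a small sketch using only the
JDK. The class name is hypothetical, and U+1F600 is just an example of a
code point above U+FFFF, not necessarily the one in my document. It
shows that such a code point is one 4-byte sequence in UTF-8, while in
Java's in-memory UTF-16 form it is two chars (a surrogate pair).

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; uses only standard JDK calls.
final class SupplementaryCodePointDemo {

    // Example code point above U+FFFF.
    static final String EMOJI = new String(Character.toChars(0x1F600));

    // Encodes the string in the given charset and renders the bytes as hex.
    static String hex(String s, Charset charset) {
        StringBuilder out = new StringBuilder();
        for (byte b : s.getBytes(charset)) {
            if (out.length() > 0) {
                out.append(' ');
            }
            out.append(String.format("%02X", b & 0xFF));
        }
        return out.toString();
    }

    static String utf8Hex(String s) {
        return hex(s, StandardCharsets.UTF_8);
    }

    static String utf16Hex(String s) {
        return hex(s, StandardCharsets.UTF_16BE);
    }

    public static void main(String[] args) {
        System.out.println(EMOJI.length());                          // 2 (UTF-16 chars)
        System.out.println(EMOJI.codePointCount(0, EMOJI.length())); // 1 (code point)
        System.out.println(utf8Hex(EMOJI));                          // F0 9F 98 80
        System.out.println(utf16Hex(EMOJI));                         // D8 3D DE 00
    }
}
```

The two 2-byte units in the UTF-16 output are the surrogate pair; an
escaper that works on individual chars sees those two values regardless
of the output encoding, which is why I don't expect UTF-16 to help.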

Thanks,
-chris
