Perhaps this is so that invalid character streams (e.g. mismatched or 
orphaned surrogates) can survive encoding and decoding? (I haven't tested 
this.) Strictly speaking, not every CharSequence can be validly encoded as 
UTF-8; Java just kind of hides this. For example, this is a reversed 
surrogate pair (or two orphaned surrogates, take your pick):

(mapv #(Integer/toHexString (int %))
      (String. (.getBytes "\uDC00\uD800" "UTF-8") "UTF-8"))
=> ["3f" "3f"]

Note that Java's UTF-8 encoder translates each of these to "?" (0x3f), 
losing information about the original char value.
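
For comparison, here is a small sketch that looks at the encoded bytes 
directly instead of the decoded string. utf8-hex is just a helper name I 
made up for this example; it puts a correctly ordered pair next to the 
reversed one:

```clojure
;; Hypothetical helper: hex-dump the bytes Java's UTF-8 encoder produces.
(defn utf8-hex [^String s]
  (mapv #(format "%02x" (bit-and % 0xff)) (.getBytes s "UTF-8")))

;; A well-formed pair (U+10000) becomes a single 4-byte sequence.
(utf8-hex "\uD800\uDC00")
;; => ["f0" "90" "80" "80"]

;; The reversed pair becomes two replacement bytes, one per surrogate.
(utf8-hex "\uDC00\uD800")
;; => ["3f" "3f"]
```

So the loss already happens at encoding time, before any decoding.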

That said, if this is the case, it would make more sense for Fressian to 
say "we have a custom encoding that is mostly UTF-8 except that it preserves 
invalid UTF-16" than to say "this is UTF-8". I wonder whether other Fressian 
implementations handle this the same way? JavaScript also shares Java's 
UTF-16 string type, but not every platform does.
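
To illustrate what a "mostly UTF-8, but preserves invalid UTF-16" encoding 
can look like, here is a sketch using the JDK's own "modified UTF-8" from 
DataOutputStream.writeUTF, which encodes each surrogate separately as 
3 bytes. This is only an analogy: I am not claiming that Fressian uses this 
code path, and the two function names below are made up for the example.

```clojure
(import '(java.io ByteArrayInputStream ByteArrayOutputStream
                  DataInputStream DataOutputStream))

;; Hypothetical helper: encode with writeUTF, which emits a 2-byte length
;; prefix followed by modified UTF-8 (each UTF-16 surrogate as 3 bytes).
(defn modified-utf8-bytes [^String s]
  (let [baos (ByteArrayOutputStream.)]
    (with-open [out (DataOutputStream. baos)]
      (.writeUTF out s))
    (.toByteArray baos)))

;; Hypothetical helper: decode bytes produced by modified-utf8-bytes.
(defn modified-utf8-string [^bytes bs]
  (with-open [in (DataInputStream. (ByteArrayInputStream. bs))]
    (.readUTF in)))

(def reversed-pair "\uDC00\uD800")

;; Unlike the plain UTF-8 round trip, the invalid sequence survives.
(= reversed-pair (modified-utf8-string (modified-utf8-bytes reversed-pair)))
;; => true
```

The price is that the output is not valid UTF-8, so a strict UTF-8 decoder 
on another platform may reject or mangle it.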


On Thursday, November 7, 2019 at 6:51:40 AM UTC-6, Kyle Wilt wrote:
>
> I posted an issue about this to the datomic/fressian github page but I 
> don't know if anyone is monitoring it anymore.
>
> https://github.com/Datomic/fressian/issues/7
>
> I'm trying to find out if this is intentional for some reason or a bug. 
> Right now it encodes UTF16 surrogate pairs as two 3 byte values for 10FFFF 
> rather than one 4 byte value as expected.
>
