[
https://issues.apache.org/jira/browse/DAFFODIL-1386?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Michael Beckerle closed DAFFODIL-1386.
--------------------------------------
Resolution: Won't Fix
Pointless wish list item.
Nobody is asking for this, and it's not really feasible to fix. Closing.
> single utf-8 4-byte character becomes surrogate character pairs in scala/java
> string
> ------------------------------------------------------------------------------------
>
> Key: DAFFODIL-1386
> URL: https://issues.apache.org/jira/browse/DAFFODIL-1386
> Project: Daffodil
> Issue Type: Wish
> Components: Back End
> Reporter: Michael Beckerle
> Priority: Major
>
> Recent changes in 1.2.0 to the data input layers removed a feature: the
> ability to treat a surrogate pair as a single character.
> See test_encodingNoError.
> This test has a TDML representation where a single character with a 4-byte
> utf-8 encoding has to become a surrogate pair (one code point, but two UTF-16
> code units, i.e. two chars) in a java/scala string, but the data input
> stream's char iterator returns only one char per call to next(). There is no
> accommodation in the data input stream layers for the possibility of a single
> character needing two chars.
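
For readers unfamiliar with the behavior described in the issue, the following is a minimal sketch in plain Java that does not use any Daffodil code. It assumes U+1F600 as an arbitrary example of a character with a 4-byte utf-8 encoding; the class name `SurrogatePairDemo` and its members are hypothetical and exist only for illustration. It shows that decoding those four bytes yields a string of length 2 holding a single code point.

```java
import java.nio.charset.StandardCharsets;

// Illustration only: not part of Daffodil. Names here are hypothetical.
class SurrogatePairDemo {
    // U+1F600 (chosen arbitrarily) encodes as four bytes in utf-8: F0 9F 98 80.
    static final byte[] UTF8_BYTES = {
        (byte) 0xF0, (byte) 0x9F, (byte) 0x98, (byte) 0x80
    };

    // Decode utf-8 bytes into a java.lang.String (UTF-16 internally).
    static String decode(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String s = decode(UTF8_BYTES);

        // Number of UTF-16 code units (chars) in the string.
        System.out.println("chars:       " + s.length());
        // Number of Unicode code points in the string.
        System.out.println("code points: " + s.codePointCount(0, s.length()));

        // The two chars are the high and low halves of a surrogate pair.
        System.out.println("high surrogate: " + Character.isHighSurrogate(s.charAt(0)));
        System.out.println("low surrogate:  " + Character.isLowSurrogate(s.charAt(1)));

        // Reading by code point recovers the original character value.
        System.out.println("code point: U+" + Integer.toHexString(s.codePointAt(0)).toUpperCase());
    }
}
```

An iterator that hands back one `char` per call to `next()` would therefore need two calls to deliver this one character, which is the gap the issue describes.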