[ 
https://issues.apache.org/jira/browse/DAFFODIL-1979?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Beckerle reassigned DAFFODIL-1979:
------------------------------------------

    Assignee: Michael Beckerle

> UTF8 decoder doesn't handle 3-byte and 4-byte correctly
> -------------------------------------------------------
>
>                 Key: DAFFODIL-1979
>                 URL: https://issues.apache.org/jira/browse/DAFFODIL-1979
>             Project: Daffodil
>          Issue Type: Bug
>          Components: Back End
>    Affects Versions: 2.2.0
>            Reporter: Michael Beckerle
>            Assignee: Michael Beckerle
>            Priority: Major
>             Fix For: 2.2.0
>
>
> The decoder classifies some valid characters as "overlong" and errors out.
> One test in the PNG schema on the DFDLSchemas GitHub hits this bug on 3-byte
> Devanagari script characters.
> These bytes encode 6 Devanagari characters: e0 a4 b6 e0 a5 80 e0 a4 b0 e0 a5
> 8d e0 a4 b7 e0 a4 95
> Expected output: शीर्षक
> Actual output: all substitution characters.
> In 3-byte UTF-8, at least one of the bits marked M below must be non-zero.
> Note that one of them is in the second byte. This second byte wasn't being
> tested.
> 1110MMMM 10Mxxxxx 10xxxxxx
> In 4-byte UTF-8, the bits of which at least one must be non-zero are:
> 11110MMM 10MMxxxx 10xxxxxx 10xxxxxx
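
The bit patterns above can be sketched as a small check. This is an illustrative Python sketch only, not Daffodil's decoder code; the function name is_overlong is hypothetical. It assumes the lead byte has already been identified as starting a 3-byte or 4-byte sequence and that the second byte is a valid continuation byte.

```python
def is_overlong(lead: int, second: int) -> bool:
    """Return True if a 3- or 4-byte UTF-8 sequence beginning with
    (lead, second) is overlong, i.e. all of its M bits are zero.

    Hypothetical helper for illustration; not taken from Daffodil.
    """
    if (lead & 0xF0) == 0xE0:
        # 3-byte form 1110MMMM 10Mxxxxx 10xxxxxx:
        # M bits are the low 4 bits of the lead byte plus bit 0x20
        # of the second byte.
        return (lead & 0x0F) == 0 and (second & 0x20) == 0
    if (lead & 0xF8) == 0xF0:
        # 4-byte form 11110MMM 10MMxxxx 10xxxxxx 10xxxxxx:
        # M bits are the low 3 bits of the lead byte plus bits 0x30
        # of the second byte.
        return (lead & 0x07) == 0 and (second & 0x30) == 0
    # 1-byte and 2-byte forms are outside the scope of this sketch.
    return False


# The six Devanagari characters from the report, as 3-byte groups.
data = bytes.fromhex("e0a4b6e0a580e0a4b0e0a58de0a4b7e0a495")
groups = [data[i:i + 3] for i in range(0, len(data), 3)]
flags = [is_overlong(g[0], g[1]) for g in groups]
```

Every group here starts with e0, so the low 4 bits of the lead byte are all zero. Only the M bit in the second byte (a4 or a5, both with bit 0x20 set) shows that the sequence is valid, which is why a check that looks at the first byte alone would reject them.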



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
