[ 
https://issues.apache.org/jira/browse/DAFFODIL-2600?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mike Beckerle updated DAFFODIL-2600:
------------------------------------
    Description: 
DFDL schemas and the behavior of parsers/unparsers are NOT supposed to be 
dependent on environment variables like LANG.

Our diagnostic messages might be affected, but infoset contents and data 
contents should not be. So only negative tests which are checking error/warning 
messages should be sensitive to environmental things like LANG. 

However, positive tests fail if UTF-8 is not properly specified 
in the environment. This is a bug because it means that somewhere we're picking up a 
default (environmentally specified) character set encoding, when we should be 
specifying the encoding explicitly. 
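
To illustrate the kind of call that causes this, here is a minimal Java sketch. It is not taken from the Daffodil code base; the class and method names are hypothetical. It contrasts decoding with the JVM default charset (which, on JDKs before 18, follows the platform locale) with decoding with an explicitly named charset.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration only; these names do not exist in Daffodil.
public class ExplicitCharsetSketch {

    // Environment-dependent: uses whatever charset the JVM picked up at startup.
    static String decodeWithDefault(byte[] bytes) {
        return new String(bytes, Charset.defaultCharset());
    }

    // Environment-independent: the charset is stated at the call site.
    static String decodeExplicit(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // UTF-8 encoding of U+20AC (euro sign)
        byte[] euro = {(byte) 0xE2, (byte) 0x82, (byte) 0xAC};

        System.out.println("default charset:    " + Charset.defaultCharset());
        System.out.println("explicit decode ok: " + decodeExplicit(euro).equals("\u20AC"));
        // This line prints false when the default charset is not UTF-8.
        System.out.println("default decode ok:  " + decodeWithDefault(euro).equals("\u20AC"));
    }
}
```

Any place that behaves like decodeWithDefault (including constructors and readers that take no charset argument) is a candidate for the environment dependence described above.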

In addition, Daffodil does require that systems are set up to enable Unicode.  A 
clear diagnostic is needed if, when building Daffodil, the UTF-8 capabilities 
are not properly set up. Otherwise the build produces a long list of errors that 
are not easily interpreted.

Note that LANG=en_US isn't sufficient. On some systems Unicode/UTF-8 is the 
default for en_US; on others it is some other charset.  A portable check here may be 
somewhat challenging, given that different systems have different defaults 
(e.g., Linux Mint vs. Red Hat Linux, and that's just considering Linux). 
We know MS-Windows also requires specific UTF-8 configuration. So likely we 
need a test that

(1) runs very early or first, so that the error message isn't lost in the mix

(2) checks that UTF-8 behaviors are working properly for Daffodil, regardless 
of how that particular operating system variant must be configured to get those 
settings. 
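
One possible shape for such a check, as a rough Java sketch rather than a proposed implementation: the class name, method name, and sample string below are all assumptions. It inspects the JVM default charset directly instead of LANG, so it does not depend on how a given operating system is configured to reach that state.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical early-running check; not part of Daffodil.
public class Utf8EnvironmentCheck {

    // Returns human-readable problems; an empty list means the check passed.
    static List<String> findProblems(Charset cs) {
        List<String> problems = new ArrayList<>();

        if (!StandardCharsets.UTF_8.equals(cs)) {
            problems.add("default charset is " + cs.name() + ", expected UTF-8");
        }

        // Non-ASCII sample: e-acute, euro sign, and a CJK character.
        String sample = "\u00E9\u20AC\u65E5";
        String roundTripped = new String(sample.getBytes(cs), cs);
        if (!sample.equals(roundTripped)) {
            problems.add("non-ASCII text does not survive a round trip through " + cs.name());
        }
        return problems;
    }

    public static void main(String[] args) {
        List<String> problems = findProblems(Charset.defaultCharset());
        if (!problems.isEmpty()) {
            // One short, clearly labeled diagnostic instead of many unrelated failures.
            System.err.println("UTF-8 environment check failed:");
            for (String p : problems) {
                System.err.println("  - " + p);
            }
            System.exit(1);
        }
        System.out.println("UTF-8 environment check passed");
    }
}
```

Run as the first step of a build or test suite, a check like this would fail fast with one message. How to order it first within Daffodil's actual build is left open here.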

 

  was:
A clear diagnostic is needed if, when building daffodil, the UTF-8 capabilities 
are not properly setup. This otherwise leads to a long list of errors that are 
not easily interpreted.

Note that LANG=en_US isn't sufficient. On some systems unicode/UTF-8 is the 
default, on others some other charset for en_US.  A portable check here may be 
somewhat challenging, given that different systems have different defaults 
(e.g, Linux MINT, vs. Linux Red-Hat, .... and that's just considering Linux.) 
We know MS-Windows also requires specific UTF-8 configuration. So likely we 
need a test that

(1) runs very early or first, so that the error message isn't lost in the mix

(2) checks that UTF-8 behaviors are working properly for Daffodil, regardless 
of how that particular operating system variant must be configured to get those 
settings. 

 


> encoding varies with environment - UTF-8 not properly set somewhere
> -------------------------------------------------------------------
>
>                 Key: DAFFODIL-2600
>                 URL: https://issues.apache.org/jira/browse/DAFFODIL-2600
>             Project: Daffodil
>          Issue Type: Bug
>          Components: Infrastructure
>    Affects Versions: 3.1.0, 3.2.0
>            Reporter: Mike Beckerle
>            Priority: Major



--
This message was sent by Atlassian Jira
(v8.20.1#820001)