[
https://issues.apache.org/jira/browse/DAFFODIL-2600?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Mike Beckerle updated DAFFODIL-2600:
------------------------------------
Description:
DFDL schemas and the behavior of parsers/unparsers are NOT supposed to be
dependent on environment variables like LANG.
Our diagnostic messages might be affected, but infoset contents and data
contents should not be. So only negative tests which are checking error/warning
messages should be sensitive to environmental things like LANG.
However, positive tests fail if UTF-8 is not properly specified
environmentally. This is a bug because it means somewhere we're getting a
default (environmentally specified) character set encoding, when we should be
specifying the encoding.
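As an illustration of the kind of code path that could cause this, here is a
minimal Java sketch contrasting an encoding call that uses the JVM default
charset with one that names UTF-8 explicitly. The class and method names are
hypothetical and are not taken from the Daffodil code base; on many JVMs the
default charset is derived from environment settings such as LANG or
-Dfile.encoding, which is what makes the first form environment-sensitive.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of Daffodil.
final class ExplicitCharsetDemo {

    // Environment-dependent: the result can change with the JVM default
    // charset, which on many JVMs follows settings such as LANG.
    static byte[] encodeWithDefault(String s) {
        return s.getBytes(Charset.defaultCharset());
    }

    // Environment-independent: the charset is always named explicitly.
    static byte[] encodeExplicit(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    // Environment-independent decoding, the counterpart of encodeExplicit.
    static String decodeExplicit(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String sample = "caf\u00e9"; // "cafe" with an acute accent on the e
        System.out.println("default charset: " + Charset.defaultCharset());
        System.out.println("bytes via default:  " + encodeWithDefault(sample).length);
        System.out.println("bytes via explicit: " + encodeExplicit(sample).length);
    }
}
```

If the two byte counts differ on a given machine, any code using the first form
would produce environment-dependent data, which is the symptom described above.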
In addition, Daffodil does require that systems are set up to enable Unicode. A
clear diagnostic is needed if, when building Daffodil, the UTF-8 capabilities
are not properly set up. Otherwise, the result is a long list of errors that
are not easily interpreted.
Note that LANG=en_US isn't sufficient. On some systems Unicode/UTF-8 is the
default charset for en_US; on others it is some other charset. A portable check
here may be somewhat challenging, given that different systems have different
defaults (e.g., Linux Mint vs. Red Hat Linux, and that's just considering
Linux). We know MS-Windows also requires specific UTF-8 configuration. So
likely we need a test that
(1) runs very early or first, so that the error message isn't lost in the mix
(2) checks that UTF-8 behaviors are working properly for Daffodil, regardless
of how that particular operating system variant must be configured to get those
settings.
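One possible shape for such a check, sketched in Java under stated assumptions:
it inspects what the JVM actually reports rather than LANG, so it does not need
to know how each operating system is configured. The class name, method name,
and messages are hypothetical, and how to make it run first in the build is
left open.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical early sanity check, not part of Daffodil.
final class Utf8EnvironmentCheck {

    // Returns a list of human-readable problems; empty means the check passed.
    static List<String> findProblems(Charset defaultCharset) {
        List<String> problems = new ArrayList<>();

        // Check 1: the effective default charset should be UTF-8.
        if (!StandardCharsets.UTF_8.equals(defaultCharset)) {
            problems.add("Default charset is " + defaultCharset.name()
                + " but UTF-8 is required; check LANG/LC_ALL or -Dfile.encoding.");
        }

        // Check 2: a string with Latin-1, CJK, and non-BMP characters should
        // survive an encode/decode round trip through the default charset.
        String sample = "\u00e9\u65e5\uD83D\uDE00";
        String roundTripped = new String(sample.getBytes(defaultCharset), defaultCharset);
        if (!sample.equals(roundTripped)) {
            problems.add("Non-ASCII text does not round-trip through "
                + defaultCharset.name() + ".");
        }
        return problems;
    }

    public static void main(String[] args) {
        List<String> problems = findProblems(Charset.defaultCharset());
        if (!problems.isEmpty()) {
            System.err.println("UTF-8 environment check FAILED:");
            problems.forEach(p -> System.err.println("  " + p));
            System.exit(1);
        }
        System.out.println("UTF-8 environment check passed.");
    }
}
```

Taking the charset as a parameter keeps the check testable on a correctly
configured machine, since a non-UTF-8 charset can be passed in directly.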
was:
A clear diagnostic is needed if, when building Daffodil, the UTF-8 capabilities
are not properly set up. Otherwise, the result is a long list of errors that
are not easily interpreted.
Note that LANG=en_US isn't sufficient. On some systems Unicode/UTF-8 is the
default charset for en_US; on others it is some other charset. A portable check
here may be somewhat challenging, given that different systems have different
defaults (e.g., Linux Mint vs. Red Hat Linux, and that's just considering
Linux). We know MS-Windows also requires specific UTF-8 configuration. So
likely we need a test that
(1) runs very early or first, so that the error message isn't lost in the mix
(2) checks that UTF-8 behaviors are working properly for Daffodil, regardless
of how that particular operating system variant must be configured to get those
settings.
> encoding varies with environment - UTF-8 not properly set somewhere
> -------------------------------------------------------------------
>
> Key: DAFFODIL-2600
> URL: https://issues.apache.org/jira/browse/DAFFODIL-2600
> Project: Daffodil
> Issue Type: Bug
> Components: Infrastructure
> Affects Versions: 3.1.0, 3.2.0
> Reporter: Mike Beckerle
> Priority: Major
>
> DFDL schemas and the behavior of parsers/unparsers are NOT supposed to be
> dependent on environment variables like LANG.
> Our diagnostic messages might be affected, but infoset contents and data
> contents should not be. So only negative tests which are checking
> error/warning messages should be sensitive to environmental things like LANG.
> However, positive tests fail if UTF-8 is not properly specified
> environmentally. This is a bug because it means somewhere we're getting a
> default (environmentally specified) character set encoding, when we should be
> specifying the encoding.
> In addition, Daffodil does require that systems are set up to enable Unicode.
> A clear diagnostic is needed if, when building Daffodil, the UTF-8
> capabilities are not properly set up. Otherwise, the result is a long list of
> errors that are not easily interpreted.
> Note that LANG=en_US isn't sufficient. On some systems Unicode/UTF-8 is the
> default charset for en_US; on others it is some other charset. A portable
> check here may be somewhat challenging, given that different systems have
> different defaults (e.g., Linux Mint vs. Red Hat Linux, and that's just
> considering Linux). We know MS-Windows also requires specific UTF-8
> configuration. So likely we need a test that
> (1) runs very early or first, so that the error message isn't lost in the mix
> (2) checks that UTF-8 behaviors are working properly for Daffodil, regardless
> of how that particular operating system variant must be configured to get
> those settings.
>