[
https://issues.apache.org/jira/browse/DAFFODIL-2600?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Dave Thompson closed DAFFODIL-2600.
-----------------------------------
Verified the specified commit (commit b4f9cec512783920a78cdfbfc4610e9e10a11e5f)
is included in the latest pull from the daffodil repository.
Verified changes identified in commit comment were implemented.
Verified affected daffodil subproject sbt test suites execute successfully
including the modified tests.
Rolled the commit back to before the fix and verified that the daffodil-test
sbt tests pass with $LANG set to us-en.UTF-8. Reset $LANG to US-ASCII and
verified that some daffodil-test sbt tests fail comparison. Pulled the fix
commit and verified that the daffodil-test sbt tests now pass with $LANG set
to US-ASCII.
Verified the nightly test schemas compile and save successfully.
Verified the nightly test suite executes successfully.
> encoding varies with environment - UTF-8 not properly set somewhere
> -------------------------------------------------------------------
>
> Key: DAFFODIL-2600
> URL: https://issues.apache.org/jira/browse/DAFFODIL-2600
> Project: Daffodil
> Issue Type: Bug
> Components: Infrastructure
> Affects Versions: 3.1.0, 3.2.0
> Reporter: Mike Beckerle
> Assignee: Steve Lawrence
> Priority: Major
> Fix For: 3.3.0
>
>
> DFDL schemas and the behavior of parsers/unparsers are NOT supposed to be
> dependent on environment variables like LANG.
> Our diagnostic messages might be affected, but infoset contents and data
> contents should not be. So only negative tests which are checking
> error/warning messages should be sensitive to environmental things like LANG.
> However, positive tests fail if UTF-8 is not properly specified
> environmentally. This is a bug because it means somewhere we're getting a
> default (environmentally specified) character set encoding, when we should be
> specifying the encoding.
> In addition, Daffodil does require that systems are set up to enable Unicode.
> A clear diagnostic is needed if, when building daffodil, the UTF-8
> capabilities are not properly set up. This otherwise leads to a long list of
> errors that are not easily interpreted.
> Note that LANG=en_US isn't sufficient. On some systems Unicode/UTF-8 is the
> default for en_US; on others it is some other charset. A portable check here
> may be somewhat challenging, given that different systems have different
> defaults (e.g., Linux Mint vs. Red Hat Linux, and that's just considering
> Linux).
> We know MS-Windows also requires specific UTF-8 configuration. So likely we
> need a test that
> (1) runs very early or first, so that the error message isn't lost in the mix
> (2) checks that UTF-8 behaviors are working properly for Daffodil, regardless
> of how that particular operating system variant must be configured to get
> those settings.
>
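The two points in the description (encode with an explicitly specified
charset, and check early that UTF-8 behavior works) can be illustrated with
a small sketch. This is not the Daffodil fix from the commit above; the class
name, method names, and sample string are hypothetical, and it only assumes
the standard java.nio.charset API. Note that on Java 18 and later the JVM
default charset is UTF-8 unless overridden, so the failure branch is mainly
reachable on older JVMs or when file.encoding is set to something else.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical early check; not part of Daffodil.
public class Utf8EnvironmentCheck {

    // Sample text with non-ASCII characters (e-acute and the euro sign),
    // chosen so that a non-UTF-8 default charset encodes it differently.
    static final String SAMPLE = "caf\u00e9 \u20ac";

    // Explicit charset: the result does not depend on LANG or file.encoding.
    static byte[] encodeExplicit(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    // JVM default charset: may be derived from the environment.
    static byte[] encodeDefault(String s) {
        return s.getBytes(Charset.defaultCharset());
    }

    // True when the default charset produces the same bytes as UTF-8
    // for the sample text.
    static boolean defaultBehavesLikeUtf8() {
        return Arrays.equals(encodeExplicit(SAMPLE), encodeDefault(SAMPLE));
    }

    public static void main(String[] args) {
        if (!defaultBehavesLikeUtf8()) {
            // Fail fast with one clear message instead of many
            // hard-to-interpret comparison failures later on.
            System.err.println("Default charset " + Charset.defaultCharset()
                + " does not encode like UTF-8; configure the environment"
                + " for UTF-8 before building or testing.");
            System.exit(1);
        }
        System.out.println("Default charset encodes like UTF-8: "
            + Charset.defaultCharset());
    }
}
```

Running a check of this kind first in a build or test run would surface a
misconfigured environment as one message. Code that always goes through
something like encodeExplicit does not need the check for its own
correctness, which is the behavior the description asks for in positive
tests.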