[
https://issues.apache.org/jira/browse/DAFFODIL-3084?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Guichard Desrosiers updated DAFFODIL-3084:
------------------------------------------
Description:
Daffodil's calendar conversion and comparison code reads the ICU
{{EXTENDED_YEAR}} field for the year value. {{EXTENDED_YEAR}} uses astronomical
(proleptic) numbering, where 1 BCE = 0, 2 BCE = -1, and so on. XSD 1.0 year
numbering instead matches ICU's {{YEAR}} scheme, where 1 BCE = -0001, 2 BCE =
-0002, and there is no year zero. Because the two conventions are offset by one
for all non-positive years, every BCE date Daffodil produces is off by one
year, and astronomical year 0 is rendered as the lexically illegal {{0000}}.
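The one-year offset can be illustrated with a small sketch. The function names below are hypothetical and are not Daffodil or ICU APIs; the code only models the two numbering schemes as described above.

```python
def astronomical_to_xsd10(extended_year: int) -> int:
    """Map an astronomical year (1 BCE = 0) to XSD 1.0 numbering (1 BCE = -1)."""
    # Positive (CE) years are identical in both schemes.
    if extended_year > 0:
        return extended_year
    # Non-positive years shift down by one: 0 -> -1, -1 -> -2, ...
    return extended_year - 1


def xsd10_to_astronomical(xsd_year: int) -> int:
    """Inverse mapping; XSD 1.0 has no year zero, so 0 is rejected."""
    if xsd_year == 0:
        raise ValueError("XSD 1.0 has no year zero")
    return xsd_year if xsd_year > 0 else xsd_year + 1
```

Under these assumptions, treating an astronomical value as if it were an XSD 1.0 year leaves CE years untouched and misstates every BCE year by exactly one.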
*Proposed Fix:*
Change all uses of {{Calendar.EXTENDED_YEAR}} to {{Calendar.YEAR}} across the
calendar conversion and comparison code, so the lexical year matches XSD 1.0
numbering. As a consequence, year 0 is unrepresentable ({{YEAR}} minimum is
1), which matches XSD 1.0 (no year zero) and structurally guarantees Daffodil
never emits {{0000}}.
Under the lax calendar check policy, ICU does not reject a {{0000}} / year-0
input; it normalizes it to {{0001}}. This is acceptable: lax is intentionally
permissive, and the key guarantee — that Daffodil never emits {{0000}} in the
infoset — still holds. Such input simply cannot round-trip, since the original
{{0000}} lexical form is not reproduced.
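As a rough illustration of the guarantee and of the lost round-trip, the sketch below models the behavior described above. Both function names are hypothetical, and the lax normalization of year 0 to year 1 is taken from this description rather than verified against ICU.

```python
def lax_normalize_year(year: int) -> int:
    """Model of the lax behavior described above: year 0 becomes year 1."""
    return 1 if year == 0 else year


def format_xsd10_year(year: int) -> str:
    """Render a year in XSD 1.0 lexical form: at least four digits, optional '-'."""
    if year == 0:
        # XSD 1.0 has no year zero, so "0000" must never be produced.
        raise ValueError("year 0 has no XSD 1.0 lexical form")
    sign = "-" if year < 0 else ""
    return f"{sign}{abs(year):04d}"
```

With this model, an input year of {{0000}} is emitted as {{0001}}, so the output is always lexically legal but differs from the input.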
was:
Daffodil's calendar conversion and comparison code reads the ICU
{{EXTENDED_YEAR}} field for the year value. {{EXTENDED_YEAR}} uses astronomical
(proleptic) numbering, where 1 BCE = 0, 2 BCE = -1, and so on. XSD 1.0 year
numbering instead matches ICU's {{YEAR}} scheme, where 1 BCE = -0001, 2 BCE =
-0002, and there is no year zero. Because the two conventions are offset by one
for all non-positive years, every BCE date Daffodil produces is off by one
year, and astronomical year 0 is rendered as the lexically illegal {{0000}}.
*Proposed Fix:*
Change all uses of {{Calendar.EXTENDED_YEAR}} to {{Calendar.YEAR}} across the
calendar conversion _and_ comparison code, so the lexical year matches XSD 1.0
numbering. As a consequence, year 0 is unrepresentable ({{YEAR}} minimum is
1), which matches XSD 1.0 (no year zero) and structurally guarantees Daffodil
never emits {{0000}}. ICU under lax rejects a {{0000}} / year-0 lexical
value outright once on {{YEAR}}. Daffodil should surface this as a *parse
error* (a clean processing-error diagnostic), not an uncaught exception — i.e.
catch ICU's rejection in the calendar parse path and convert it to a normal
Daffodil parse diagnostic.
> Calendar code uses ICU EXTENDED_YEAR instead of YEAR, producing year values
> that don't conform to the XSD 1.0 spec
> -------------------------------------------------------------------------------------------------------------------
>
> Key: DAFFODIL-3084
> URL: https://issues.apache.org/jira/browse/DAFFODIL-3084
> Project: Daffodil
> Issue Type: Bug
> Components: Back End
> Reporter: Guichard Desrosiers
> Assignee: Guichard Desrosiers
> Priority: Major
>
> Daffodil's calendar conversion and comparison code reads the ICU
> {{EXTENDED_YEAR}} field for the year value. {{EXTENDED_YEAR}} uses
> astronomical (proleptic) numbering, where 1 BCE = 0, 2 BCE = -1, and so on.
> XSD 1.0 year numbering instead matches ICU's {{YEAR}} scheme, where 1 BCE =
> -0001, 2 BCE = -0002, and there is no year zero. Because the two conventions
> are offset by one for all non-positive years, every BCE date Daffodil
> produces is off by one year, and astronomical year 0 is rendered as the
> lexically illegal {{0000}}.
>
> *Proposed Fix:*
> Change all uses of {{Calendar.EXTENDED_YEAR}} to {{Calendar.YEAR}} across the
> calendar conversion and comparison code, so the lexical year matches XSD 1.0
> numbering. As a consequence, year 0 is unrepresentable ({{YEAR}} minimum
> is 1), which matches XSD 1.0 (no year zero) and structurally guarantees
> Daffodil never emits {{0000}}.
> Under the lax calendar check policy, ICU does not reject a {{0000}} / year-0
> input; it normalizes it to {{0001}}. This is acceptable: lax is
> intentionally permissive, and the key guarantee — that Daffodil never emits
> {{0000}} in the infoset — still holds. Such input simply cannot round-trip,
> since the original {{0000}} lexical form is not reproduced.