This email is FYI only. You can skip it unless you care about this specific
development topic.
I have made quite a lot of progress this past week, so I thought it worth
reporting, given that the fixes needed for EDIFACT and TLOG to work have
been delayed for so long.
There are numerous JIRA tickets associated with this problem area:
DAFFODIL-1080, DAFFODIL-1976, DAFFODIL-1886, DAFFODIL-1919, DAFFODIL-110
On my branch, separated sequences have been substantially revised, and I hope
to get this into code review as a PR in a few days.
I added a new property daf:emptyElementPolicy intended to control whether
Daffodil implements the DFDL spec, or bends the rules in order to be compatible
with IBM DFDL so that we can run their published DFDL schemas on GitHub.
The status of my daffodil-1080-sep branch is that all tests in daffodil-test
pass.
There are exactly 3 errors in daffodil-test-ibm1:
test_AX000
test_ptLax1rt
The above fail because daffodil doesn't implement the behavior where
empty strings are only created for optional string elements when there
is some non-zero-length syntax defined by dfdl:emptyValueDelimiterPolicy
and initiator/terminator.
Daffodil is creating an empty string value here based on just the
presence of a separator, which is incorrect.
When dfdl:separatorSuppressionPolicy is trailingEmpty (or
trailingEmptyStrict), this should NOT create an empty string value. It
should just tolerate the separator (or not, for trailingEmptyStrict).
test_ptg3_1p_ibm_daf
The above fails because in the new daf:emptyElementPolicy
noEmptyElements mode, Daffodil does not raise a processing error for a
required string element (a scalar, or an array occurrence below
minOccurs) that has the empty string as its value. IBM DFDL gives a parse
error here, and the daf:emptyElementPolicy of noEmptyElements is supposed
to be compatible with this.
(In addition, if a default value is specified, then we need to produce a
runtime SDE, so that this will not backtrack. This is also consistent with
IBM DFDL behavior.)
Right now Daffodil is creating empty-string elements here, which it
shouldn't be doing in this compatibility noEmptyElements mode. In
regular emptyElementPolicy="emptyElements" mode this would be correct
behavior.
I believe fixing the above will fix several of the regressions on published
DFDL schemas also.
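
As a rough illustration of the intended behavior described above, the two
rules can be written out as a small decision sketch. This is not Daffodil's
code: the function names, the Outcome enum, and the boolean parameters are
all made up for this example, and it only covers the cases this email
describes.

```python
from enum import Enum


class Outcome(Enum):
    EMPTY_STRING = "create an empty-string element"
    NO_ELEMENT = "create no element"
    PROCESSING_ERROR = "processing error (parser may backtrack)"
    RUNTIME_SDE = "runtime SDE (no backtracking)"


def optional_string_outcome(has_empty_value_syntax: bool) -> Outcome:
    """Optional string element whose content is zero-length.

    has_empty_value_syntax (hypothetical flag): True when
    dfdl:emptyValueDelimiterPolicy together with the initiator/terminator
    yields some non-zero-length syntax.
    """
    # A separator alone must not produce an empty string value.
    if has_empty_value_syntax:
        return Outcome.EMPTY_STRING
    return Outcome.NO_ELEMENT


def required_string_outcome(policy: str, has_default: bool) -> Outcome:
    """Required string element (scalar, or array occurrence below
    minOccurs) whose value would be the empty string."""
    if policy == "emptyElements":
        # Regular mode. Only the no-default case is described in the email,
        # so the default-value case is deliberately left out of this sketch.
        if has_default:
            raise NotImplementedError("not covered by this sketch")
        return Outcome.EMPTY_STRING
    if policy == "noEmptyElements":
        # IBM-compatible mode: a required empty string is an error, and
        # with a default value present it must not backtrack.
        if has_default:
            return Outcome.RUNTIME_SDE
        return Outcome.PROCESSING_ERROR
    raise ValueError(f"unknown daf:emptyElementPolicy: {policy!r}")
```

With these definitions, test_ptg3_1p_ibm_daf corresponds to the case
required_string_outcome("noEmptyElements", False), where the branch
currently produces an empty-string element instead of a processing error.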
This change set is extensive enough that I also ran all the published DFDL
schemas from the DFDLSchemas site on GitHub (and iCalendar as well).
Published Schema Regressions:
iCalendar - now gets an SDE - implicit with unbounded maxOccurs only
allowed on last declared element of sequence. This is not due to my
changes, but a check that has been added recently.
mil-std-2045 - 2 tests fail. One is Terminator 7F not found, the other is
empty children related: expected 5 children got 3. Probably same issue
as identified above for one of the daffodil-test-ibm1 tests.
png - many tests fail. All for same reason: expected 1 child got 0.
Probably same issue as identified above for one of the daffodil-test-ibm1 tests.
(Also bmp - fails with java out of heap space, but that was true of
2.3.0 released version of Daffodil - see DAFFODIL-2118)
Now of course the objective of these separated sequence changes is to get more
published DFDL schemas to run. Specifically, EDIFACT and ibm4690-TLog (aka TLOG).
Progress on EDIFACT
* The one test fails for the same reason as test_ptg3_1p_ibm_daf, or at least
that is what it is currently clearly failing on. It runs and produces an infoset.
Note: EDIFACT takes like a minute+ to compile the schema. Ugh.
Progress on TLOG
* 2 of 5 tests pass
* 3 others fail - reasons as yet unanalyzed. They run and produce infosets,
but those infosets aren't the same as what is expected.
Final point: performance - the unparser for separated sequences with separator
suppression uses some pretty heavy-weight techniques - it creates suspendable
unparsers for the separators that might be suppressed. The performance
implications of this are as yet unexamined. I've been focused on just getting
the behavior to be right first.
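
To show why the unparser cannot simply emit each separator as it goes, here
is a deliberately simplified sketch of trailing-empty suppression for an
infix separator. It uses plain buffering of pending separators, which is a
stand-in technique and not Daffodil's suspendable-unparser mechanism. The
function name and the use of plain strings for element values are
assumptions made for the example.

```python
def unparse_separated(values, sep=","):
    """Join string values with an infix separator, dropping the separators
    that would only precede trailing empty values."""
    out = []
    pending = 0  # separators held back until a later non-empty value appears
    for i, value in enumerate(values):
        if i > 0:
            # This separator may turn out to be trailing, so defer it.
            pending += 1
        if value != "":
            # A non-empty value proves the held-back separators are needed.
            out.append(sep * pending)
            pending = 0
            out.append(value)
    # Anything still pending here was trailing and is suppressed.
    return "".join(out)
```

The point of the sketch is only that the decision for each separator
depends on values that come later, which is what makes some form of
deferral necessary.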