EmptyGram is created when a grammar is optimized out. Looking through
our code, here's a fairly small example:

  private lazy val complexContentSpecifiedLength =
    prod("complexContentSpecifiedLength", isComplexType) {
      initiatorRegion ~ sharedComplexContentRegion
    }

So we have an outer "prod" gram which has a guard of "isComplexType"
defined. If isComplexType evaluates to false, then that prod just
becomes an EmptyGram, and we essentially ignore the sub-grams that make
up this prod. But if it is a complex type, then we evaluate those
sub-grams which likely also have guards on them (e.g. initiatorRegion
probably has a guard for hasInitiator), which could also end up as
EmptyGram or could evaluate to more sub-grams.
In most of these cases, these EmptyGrams should completely disappear.
For example, a grammar of "fooGram ~ EmptyGram" just becomes "fooGram",
and the C code generator should never see the EmptyGram.
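
To make the guard and the "~" simplification concrete, here is a rough toy model. It is only a sketch: the class names mirror the ones mentioned above, but Grammar.prod, Terminal, and SeqGram as written here are made up for illustration and are not Daffodil's actual implementation.

```scala
// Toy model only; these are NOT Daffodil's actual classes.
sealed trait Gram {
  def isEmpty: Boolean = false

  // Sequential composition that drops empty operands, so that
  // "fooGram ~ EmptyGram" collapses to just "fooGram".
  def ~(that: Gram): Gram =
    if (this.isEmpty) that
    else if (that.isEmpty) this
    else SeqGram(this, that)
}

case object EmptyGram extends Gram {
  override def isEmpty: Boolean = true
}

final case class Terminal(name: String) extends Gram
final case class SeqGram(left: Gram, right: Gram) extends Gram

object Grammar {
  // Hypothetical guarded production: the body is passed by name, so the
  // sub-grams are never evaluated when the guard is false.
  def prod(name: String, guard: Boolean)(body: => Gram): Gram =
    if (guard) body else EmptyGram
}
```

With this model, Terminal("fooGram") ~ EmptyGram evaluates to Terminal("fooGram"), and a prod whose guard is false yields EmptyGram without touching its body.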
However, there are some cases where it might not completely disappear.
One such example is when we have a choice branch where one branch is the
empty sequence, e.g.

  <element name="foo">
    <complexType>
      <choice>
        <element name="bar" ... />
        <sequence/>
      </choice>
    </complexType>
  </element>

In this case, the ChoiceCombinator gram has two gram alternatives, where
the second is an EmptyGram. In runtime1, we detect this EmptyGram when
building the parsers and convert it to a ChoiceBranchEmptyParser, which
is essentially just a no-op. But the C codegen would likely just see the
EmptyGram in this case.
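
As a rough illustration of what "skip it as a no-op" could look like in a gram walker, here is a toy sketch. Everything in it is an assumption for illustration: ElementGram, generate, and the emitted strings are invented, and this is not how codegen-c or runtime1 are actually written.

```scala
object ChoiceSketch {
  // Toy grammar nodes; hypothetical stand-ins, not Daffodil's real classes.
  sealed trait Gram
  case object EmptyGram extends Gram
  final case class ElementGram(name: String) extends Gram
  final case class ChoiceCombinator(alternatives: List[Gram]) extends Gram

  // Hypothetical generator: returns the lines of C-like pseudo-output
  // emitted for each node it visits.
  def generate(gram: Gram): List[String] = gram match {
    // No-op: an empty branch contributes nothing to parse or unparse.
    case EmptyGram => Nil
    case ElementGram(name) => List(s"parse_$name();")
    case ChoiceCombinator(alts) =>
      // Each alternative becomes one case; an empty alternative still
      // gets a case label, just with an empty body.
      alts.zipWithIndex.flatMap { case (alt, i) =>
        s"case $i:" :: generate(alt) ::: List("break;")
      }
  }
}
```

For the choice above, the second alternative would produce a case with nothing in it, which is the same idea as the ChoiceBranchEmptyParser no-op.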
If codegen-c supports choices with empty branches, the schema above might
be a simple test to reproduce the issue. Unfortunately, there are likely
other places besides the ChoiceCombinator where EmptyGrams cannot be
optimized out, and I don't think there is currently a good way to find
them. I can't think of any offhand.
If you want to figure out the issue with this particular schema to make
sure it's handled correctly (no-op is probably right, but can't say for
sure), you might need to put some debug breakpoints or print statements
in your C gram walker, and see what it walks into shortly before it hits
EmptyGram. That might give you an idea of what in the schema is causing
an EmptyGram. I'm happy to help debug if you can share any of the schema
(I understand if that's not the case).
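
If it helps, the print-statement idea could look something like the toy sketch below: carry the path of enclosing grams down the walk and report it whenever an EmptyGram is reached. The Leaf and Node classes and the findEmpty function are invented for illustration and do not correspond to anything in the real gram walker.

```scala
object WalkerTraceSketch {
  // Toy grammar tree; hypothetical, not Daffodil's real gram classes.
  sealed trait Gram { def name: String }
  case object EmptyGram extends Gram { val name = "EmptyGram" }
  final case class Leaf(name: String) extends Gram
  final case class Node(name: String, children: List[Gram]) extends Gram

  // Returns, for every EmptyGram reached, the chain of enclosing gram
  // names that the walk passed through on the way to it.
  def findEmpty(gram: Gram, path: List[String] = Nil): List[String] =
    gram match {
      case EmptyGram => List((path :+ gram.name).mkString(" > "))
      case Leaf(_) => Nil
      case Node(n, kids) => kids.flatMap(findEmpty(_, path :+ n))
    }
}
```

Printing each returned path should point at the part of the schema that produced the EmptyGram.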
On 2023-11-20 03:25 PM, Interrante, John A (GE Aerospace, US) wrote:
I just fixed the Daffodil C code generator to stop crashing on someone's
private DFDL schema. Their schema is private and far too big to show here, but
somehow their schema was creating an EmptyGram object. I've now made sure the
C code generator knows how to skip an EmptyGram object instead of crashing when
it sees the EmptyGram object.
I would like to add a TDML test case to my PR along with the fix, but how do
you write a DFDL schema which creates an EmptyGram object anyway? I haven't
been able to figure out how the user's own schema creates the EmptyGram object
because the schema has too many things in it already.
John