EmptyGram is created when a grammar is optimized out. Looking through our code, here's a fairly small example:

  private lazy val complexContentSpecifiedLength =
    prod("complexContentSpecifiedLength", isComplexType) {
      initiatorRegion ~ sharedComplexContentRegion
    }

So we have an outer "prod" gram with a guard of "isComplexType". If isComplexType evaluates to false, that prod just becomes an EmptyGram, and we essentially ignore the sub-grams that make up the prod. But if it is a complex type, we evaluate those sub-grams, which likely have guards of their own (e.g. initiatorRegion probably has a guard for hasInitiator). Each of those could in turn end up as an EmptyGram or evaluate to more sub-grams.

In most of these cases, the EmptyGrams should completely disappear. For example, a grammar of "fooGram ~ EmptyGram" just becomes "fooGram", and the C code generator should never see the EmptyGram.
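
To make the guard and sequencing behavior concrete, here is a minimal toy sketch. It is not Daffodil's actual implementation: the Gram, Terminal, and SeqComp types and this version of prod are all made up for illustration, and only mimic the idea that a false guard yields EmptyGram and that "~" drops EmptyGram operands.

```scala
// Toy model only: these types mimic the idea described above and are
// NOT Daffodil's real Gram classes.
object GuardSketch {
  sealed trait Gram {
    // Sequencing drops EmptyGram operands, so "foo ~ EmptyGram" is just "foo".
    def ~(that: Gram): Gram = (this, that) match {
      case (EmptyGram, g) => g
      case (g, EmptyGram) => g
      case (a, b)         => SeqComp(a, b)
    }
  }
  case object EmptyGram extends Gram
  final case class Terminal(name: String) extends Gram
  final case class SeqComp(left: Gram, right: Gram) extends Gram

  // Hypothetical guarded production: the body is evaluated only if the
  // guard holds, otherwise the whole production collapses to EmptyGram.
  def prod(name: String, guard: Boolean)(body: => Gram): Gram =
    if (guard) body else EmptyGram

  def main(args: Array[String]): Unit = {
    val foo = Terminal("fooGram")
    val bar = Terminal("barGram")

    // The EmptyGram on either side of "~" disappears.
    assert((foo ~ EmptyGram) == foo)
    assert((EmptyGram ~ foo) == foo)

    // A false guard turns the whole production into EmptyGram.
    assert(prod("p", guard = false)(foo ~ bar) == EmptyGram)

    // A true guard evaluates the body as usual.
    assert(prod("p", guard = true)(foo ~ bar) == SeqComp(foo, bar))
    println("all checks passed")
  }
}
```

With this kind of simplification, a walker over the resulting tree only meets an EmptyGram in places where nothing collapses it, such as a standalone alternative.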

However, there are some cases where it might not completely disappear.

One such example is a choice where one branch is an empty sequence, e.g.

  <element name="foo">
    <complexType>
      <choice>
        <element name="bar" ... />
        <sequence/>
      </choice>
    </complexType>
  </element>

In this case, the ChoiceCombinator gram has two gram alternatives, where the second is an EmptyGram. In runtime1, we detect this EmptyGram when building the parsers and convert it to a ChoiceBranchEmptyParser, which is essentially just a no-op. But the C codegen would likely just see the EmptyGram in this case.
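
As a rough illustration of what "skip it as a no-op" could look like in a gram walker, here is a toy sketch. Everything in it is an assumption: ElementGram, ChoiceGram, the walk function, and the parse_... lines it returns are invented for this example and are not what runtime1 or codegen-c actually use or emit.

```scala
// Toy model only: names echo the discussion above but are NOT Daffodil's
// real classes, and the "generated" lines are made-up pseudo-C.
object ChoiceWalkerSketch {
  sealed trait Gram
  case object EmptyGram extends Gram
  final case class ElementGram(name: String) extends Gram
  final case class ChoiceGram(alternatives: List[Gram]) extends Gram

  // Returns the pseudo-C lines a hypothetical generator would emit.
  def walk(gram: Gram): List[String] = gram match {
    // An empty branch generates nothing, instead of crashing the walker.
    case EmptyGram => Nil
    case ElementGram(name) => List(s"parse_$name();")
    case ChoiceGram(alts) =>
      alts.zipWithIndex.flatMap { case (alt, i) =>
        s"/* branch $i */" :: walk(alt)
      }
  }

  def main(args: Array[String]): Unit = {
    // Mirrors the schema above: a choice of element "bar" or an empty sequence.
    val foo = ChoiceGram(List(ElementGram("bar"), EmptyGram))
    walk(foo).foreach(println)
  }
}
```

The second branch contributes only its branch marker, which is the walker-side analogue of a no-op parser for the empty alternative.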

If codegen-c supports choices with empty branches, the above might be a simple test to reproduce the issue. Unfortunately, there are likely other places besides the ChoiceCombinator where EmptyGrams cannot be optimized out, and I don't think there is currently a good way to know where those places are. I can't think of any others offhand.

If you want to figure out the issue with this particular schema to make sure it's handled correctly (a no-op is probably right, but I can't say for sure), you might need to put some debug breakpoints or print statements in your C gram walker and see what it walks into shortly before it hits the EmptyGram. That might give you an idea of what in the schema is causing it. I'm happy to help debug if you can share any of the schema (I understand if that's not possible).
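
One way to get that kind of trace without a debugger is to have the walker carry the chain of enclosing grams and report it whenever it reaches an EmptyGram. The sketch below only shows the idea on a toy tree; Named and emptyTrails are hypothetical names, and the real walker's gram types will look different.

```scala
// Toy model only: Named and emptyTrails are hypothetical, not Daffodil API.
object TrailSketch {
  sealed trait Gram
  case object EmptyGram extends Gram
  final case class Named(name: String, children: List[Gram]) extends Gram

  // For every EmptyGram reached, returns the chain of enclosing gram names
  // leading to it, outermost first.
  def emptyTrails(gram: Gram, trail: List[String] = Nil): List[String] =
    gram match {
      case EmptyGram =>
        List((trail.reverse :+ "EmptyGram").mkString(" > "))
      case Named(name, children) =>
        children.flatMap(child => emptyTrails(child, name :: trail))
    }

  def main(args: Array[String]): Unit = {
    val tree =
      Named("foo", List(Named("choice", List(Named("bar", Nil), EmptyGram))))
    // Prints one line per EmptyGram, here: foo > choice > EmptyGram
    emptyTrails(tree).foreach(println)
  }
}
```

Printing the trail this way points at the schema construct just outside the EmptyGram, which is usually the part worth reducing into a small test schema.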



On 2023-11-20 03:25 PM, Interrante, John A (GE Aerospace, US) wrote:
I just fixed the Daffodil C code generator to stop crashing on someone's 
private DFDL schema.  Their schema is private and far too big to show here, but 
somehow their schema was creating an EmptyGram object.  I've now made sure the 
C code generator knows how to skip an EmptyGram object instead of crashing when 
it sees the EmptyGram object.

I would like to add a TDML test case to my PR along with the fix, but how do 
you write a DFDL schema which creates an EmptyGram object anyway?  I haven't 
been able to figure out how the user's own schema creates the EmptyGram object 
because the schema has too many things in it already.

John

