On Monday, February 12, 2018 13:51:56 H. S. Teoh via Digitalmars-d-announce wrote:
> For example, entity support pretty much means plain slices are no longer
> an option, because you have to perform substitution of entity
> definitions, so you'll have to either wrap it in some kind of lazy range
> that chains the entity definition to the surrounding text, or you'll
> have to use strings or something else. Which means you'll need to have
> memory allocation / slower parsing / whatever, but that's the price of
> DTD support.
Which was my point. The API as-is doesn't work with DTD support for those
very reasons.

> But again, the point is, basic XML parsing (without DTD support) doesn't
> *need* to pay this price. What's currently in dxml doesn't need to
> change. DTD support can be implemented in a submodule / separate module
> that wraps around dxml and builds DTD support on top of it.
>
> Put another way, we can implement DTD support *on top of* dxml this way:
> - Parse the XML using dxml as an initial step (this can be done lazily,
>   or semi-lazily, as needed).
> - As an intermediate step, parse the DTD section, construct whatever
>   internal state is needed to handle DTD rules, a dictionary of entity
>   references, etc.
> - Filter the output of dxml to insert whatever extra behaviour is needed
>   to implement DTD support before handing it to the calling code, e.g.,
>   expand entity references, or implement validation and throw an
>   exception if validation fails, etc.
>
> *We don't need to change dxml's current API at all.*

I don't think that this works, because the entity references insert new
XML and thus affect the parsing. As such, you can't simply pass the entity
references through to be processed by another parser. They need to be
handled by the core parser; otherwise, it's going to give incorrect
results, not just results that need further parsing. I'm sure that dxml's
internals could be refactored so that they could be shared with another
parser that did that, but unless I'm misunderstanding how entity
references work, you can't use what's there now as-is and build another
parser on top of it. The entity reference replacement needs to happen in
the core parser.

> The DTD wrapper doesn't guarantee (and doesn't need to!) to return
> slices of the input like dxml does. I don't see that as a problem, since
> I can't see how anyone would be able to implement full DTD support with
> only slices, even independently from the way dxml is implemented right
> now.
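The choice between slices, allocation, and a lazy range can be made concrete with a minimal Python sketch. It uses Python rather than D purely for illustration and does not use dxml's API. The entity table is a hypothetical stand-in for what a DTD would declare, and a `memoryview` over the input bytes stands in for a D slice. Without entities, a text token can point straight into the input buffer. Once `&greet;` is replaced, the resulting text appears nowhere in the input, so it has to be either copied into a new buffer or chained lazily from its pieces.

```python
from itertools import chain

doc = b"<p>plain text</p><p>&greet; there</p>"
entities = {b"greet": b"hello"}  # hypothetical table, as if built from a DTD

view = memoryview(doc)

# 1. No entities: the text token is a zero-copy slice of the input buffer.
start = doc.index(b"plain text")
plain = view[start:start + len(b"plain text")]

# 2. With an entity reference, the expanded text ("hello there") does not
#    occur anywhere in the input, so it cannot be a slice of it.
ref = b"&greet;"
after_ref = doc.index(ref) + len(ref)
tail = view[after_ref:doc.index(b"</p>", after_ref)]  # still a slice: b" there"

# 2a. Either allocate a new buffer holding the expanded text...
allocated = entities[b"greet"] + bytes(tail)

# 2b. ...or chain the replacement text and the slice lazily; nothing is
#     copied until the chain is iterated (here, by bytes() in the print).
lazy = chain(entities[b"greet"], tail)

print(bytes(plain), allocated, bytes(lazy))
```

Note that `lazy` is a different type from `plain`, so code consuming such tokens can no longer count on getting slices of the original input back.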
Yeah, if I were writing a parser that handled the DTD section, I wouldn't
make it deal with slices of the input like dxml does unless I decided to
make it always return strings, in which case you could get slices of the
original input when the input is a string, but not for other range types.
The alternative is to use a lazy range, which would be worse if you passed
strings but better for other range types. And that's the main reason that
I gave up on having dxml handle the DTD section. I consider either option
unacceptable. One of the key goals for dxml was that it would provide
slices of the input rather than lazy ranges or newly allocated strings.

In any case, unless I misunderstand how entity references work, that would
have to be its own parser and not simply a wrapper around dxml, because of
how the entity references affect the parsing. If I'm wrong, then great:
someone else can come along later and add some sort of DTD parser on top
of dxml. If I'm right, then anyone who wants to do anything like that is
going to need to write a new parser, but that parser can coexist alongside
dxml's parser just fine. Either way, I like dxml's approach and don't want
to compromise what it's doing in an attempt to fully deal with DTDs.

- Jonathan M Davis
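The objection that entity references affect the parsing, and can't simply be expanded by a filter layered on top of the parser, can be illustrated with a small Python sketch. The regex tokenizer and `filter_expand` are hypothetical toys, not dxml's API. The entity `greet` is assumed to come from a declaration such as `<!ENTITY greet "<b>hi</b>">`. Because that replacement text contains markup, expanding it after tokenization leaves `<b>` and `</b>` inside a text token. Expanding it before tokenization, as a core parser would, produces real element tags.

```python
import re

# Hypothetical toy tokenizer (not dxml's API): splits a fragment into
# ("tag", ...) and ("text", ...) tokens. No comments, CDATA, or attributes.
TOKEN_RE = re.compile(r"<[^>]*>|[^<]+")

def tokenize(xml):
    tokens = []
    for m in TOKEN_RE.finditer(xml):
        piece = m.group(0)
        kind = "tag" if piece.startswith("<") else "text"
        tokens.append((kind, piece))
    return tokens

def filter_expand(tokens, entities):
    """Hypothetical post-filter: expand &name; inside already-lexed text tokens."""
    out = []
    for kind, piece in tokens:
        if kind == "text":
            for name, value in entities.items():
                piece = piece.replace("&" + name + ";", value)
        out.append((kind, piece))
    return out

# Assumed to come from <!ENTITY greet "<b>hi</b>"> in the DTD.
entities = {"greet": "<b>hi</b>"}
doc = "<p>&greet; there</p>"

# Layered approach: the replacement markup ends up inside one text token.
layered = filter_expand(tokenize(doc), entities)
print(layered)

# Expanding before tokenizing (naively, by string replacement) yields real tags.
core = tokenize(doc.replace("&greet;", entities["greet"]))
print(core)
```

A filter layer could re-tokenize the expanded text, but at that point it is effectively acting as a second parser, which is the situation described in the reply.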
