On Mon, Feb 12, 2018 at 09:50:16AM -0700, Jonathan M Davis via Digitalmars-d-announce wrote:
[...]
> The core problem is that entity references get replaced with more XML
> that needs to be parsed. So, they can't simply be passed on for
> post-processing. As I understand it, they have to be replaced while
> the parsing is going on. And that means that you can't do something
> like return slices of the original input that don't bother with the
> entity references and then have a separate parser take that and
> process it further to deal with the entity references. The first
> parser has to deal with them, and that means not returning slices of
> the original input unless you're dealing purely with strings and are
> willing to allocate new strings in the cases where the data needs to
> be mutated because of an entity reference.
[...]

I think you missed my point. What I'm trying to say is, given the
current functionality of dxml, one *can* build an XML interface that
implements DTD support. Of course, some concessions obviously have to
be made, such as needing to allocate memory (I don't see how else one
could keep a dictionary of DTD rules / entity declarations, for
example), or not being able to return only slices of the input anymore.

For example, entity support pretty much means plain slices are no
longer an option, because you have to perform substitution of entity
definitions, so you'll have to either wrap it in some kind of lazy
range that chains the entity definition to the surrounding text, or
you'll have to use strings or something else. Which means you'll need
to have memory allocation / slower parsing / whatever, but that's the
price of DTD support.

But again, the point is, basic XML parsing (without DTD support)
doesn't *need* to pay this price. What's currently in dxml doesn't need
to change. DTD support can be implemented in a submodule / separate
module that wraps around dxml and builds DTD support on top of it.

Put another way, we can implement DTD support *on top of* dxml this
way:

- Parse the XML using dxml as an initial step (this can be done lazily,
  or semi-lazily, as needed).

- As an intermediate step, parse the DTD section, construct whatever
  internal state is needed to handle DTD rules, a dictionary of entity
  references, etc.

- Filter the output of dxml to insert whatever extra behaviour is
  needed to implement DTD support before handing it to the calling
  code, e.g., expand entity references, or implement validation and
  throw an exception if validation fails, etc.

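The three steps above can be sketched roughly as follows. This is only
a toy illustration in Python, not dxml's API: `base_tokens` is a
hypothetical stand-in for the underlying slice-returning parser,
`with_entities` is a hypothetical wrapper layered on top of it, and the
regexes assume only simple internal `<!ENTITY name "value">`
declarations.

```python
import re
from typing import Dict, Iterator, Tuple

Token = Tuple[str, str]  # (kind, text); kind is "doctype", "tag" or "text"

# Toy stand-in for the base parser: it splits the input into markup and
# text tokens and knows nothing about entity declarations or references.
_TOKEN_RE = re.compile(r"<!DOCTYPE[^\[>]*(?:\[.*?\])?\s*>|<[^>]+>|[^<]+", re.S)
_ENTITY_DECL_RE = re.compile(r'<!ENTITY\s+([\w.-]+)\s+"([^"]*)"\s*>')
_ENTITY_REF_RE = re.compile(r"&([\w.-]+);")


def base_tokens(src: str) -> Iterator[Token]:
    """Step 1: lazily tokenize the input without touching entities."""
    for m in _TOKEN_RE.finditer(src):
        s = m.group(0)
        if s.startswith("<!DOCTYPE"):
            yield ("doctype", s)
        elif s.startswith("<"):
            yield ("tag", s)
        else:
            yield ("text", s)


def _expand(text: str, entities: Dict[str, str], depth: int) -> Iterator[Token]:
    """Replace declared entity references in one text token."""
    if depth > 16:  # arbitrary guard against self-referencing entities
        raise ValueError("entity expansion nested too deeply")
    pos = 0
    for m in _ENTITY_REF_RE.finditer(text):
        name = m.group(1)
        if name not in entities:
            continue  # undeclared here (e.g. &amp;): leave it in the text
        if m.start() > pos:
            yield ("text", text[pos:m.start()])
        # The replacement may itself contain markup, so it is re-tokenized
        # as an XML fragment with the same base tokenizer.
        for kind, frag in base_tokens(entities[name]):
            if kind == "text":
                yield from _expand(frag, entities, depth + 1)
            else:
                yield (kind, frag)
        pos = m.end()
    if pos < len(text):
        yield ("text", text[pos:])


def with_entities(tokens: Iterator[Token]) -> Iterator[Token]:
    """Steps 2 and 3: collect declarations, then filter the token stream."""
    entities: Dict[str, str] = {}
    for kind, text in tokens:
        if kind == "doctype":
            # Step 2: build the entity dictionary (this allocates).
            entities.update(_ENTITY_DECL_RE.findall(text))
        elif kind == "text" and entities:
            # Step 3: expand references before handing text to the caller.
            yield from _expand(text, entities, depth=0)
        else:
            yield (kind, text)  # untouched token from the base parser


doc = '<!DOCTYPE doc [<!ENTITY greet "Hello <b>world</b>">]><doc>Say: &greet;!</doc>'
for token in with_entities(base_tokens(doc)):
    print(token)
```

Note that in this sketch the base tokenizer is never modified, and
input without a DTD section flows through the wrapper unchanged; only
documents that actually declare entities pay for the dictionary and the
extra re-tokenizing.
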
*We don't need to change dxml's current API at all.*

At the most, I anticipate that the only potential change needed is to
expose an interface to parse XML fragments (i.e., not a complete XML
document with a single root element, but just some PCDATA that may
contain entities or tags) so that the DTD support wrapper can use it to
expand entities and insert any tags that may appear inside the entity
definition.

The DTD wrapper doesn't (and doesn't need to!) guarantee that it
returns slices of the input like dxml does. I don't see that as a
problem, since I can't see how anyone would be able to implement full
DTD support with only slices, even independently from the way dxml is
implemented right now.

We can even design the DTD support wrapper to start out as just a thin
wrapper around dxml, and lazily switch to full DTD mode only if a DTD
section is encountered. Then user code that doesn't care to use dxml's
raw API won't even need to care about the difference.


T

-- 
Curiosity kills the cat. Moral: don't be the cat.
