On Tuesday, February 13, 2018 15:22:32 Kagamin via Digitalmars-d-announce 
wrote:
> On Monday, 12 February 2018 at 16:50:16 UTC, Jonathan M Davis wrote:
> > The core problem is that entity references get replaced with
> > more XML that needs to be parsed. So, they can't simply be
> > passed on for post-processing. As I understand it, they have to
> > be replaced while the parsing is going on. And that means that
> > you can't do something like return slices of the original input
> > that don't bother with the entity references and then have a
> > separate parser take that and process it further to deal with
> > the entity references. The first parser has to deal with them,
> > and that means not returning slices of the original input
> > unless you're dealing purely with strings and are willing to
> > allocate new strings in the cases where the data needs to be
> > mutated because of an entity reference.
>
> Standard entities like &amp; have the same problem, so the same
> solution should work too.

That depends on what exactly an entity reference can contain. If it can do
something like put a start tag in there, which then has to be terminated
either by the document itself putting an end tag in there or by another
entity reference containing an end tag, then it can't be handled after the
fact like &amp; can be, since &amp; is just replaced by text. If an entity
reference can't contain a start tag without a matching end tag, then sure.
But I find the XML spec to be surprisingly hard to understand with regard to
entity references. It's not clear to me where it's even legal to put them,
let alone what exactly you're allowed to put in them. And I can't even
really trust the XML grammar as long as entity references are involved,
because the grammar in the spec is the grammar _after_ all entity references
have been replaced, which I was quite dismayed to figure out.
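
To make the distinction concrete, here is a minimal sketch in Python (chosen
only for illustration; the thread is about a D parser), assuming the
standard library's expat-backed xml.etree.ElementTree. The document and the
entity name `bold` are hypothetical. &amp; expands to a single character of
text, whereas an entity declared in the internal DTD subset can expand to
markup that becomes a child element, so no slice of the original input
corresponds to the parsed result.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: the internal DTD subset declares an entity "bold"
# whose replacement text is markup rather than plain text.
DOC = """<!DOCTYPE note [
  <!ENTITY bold "<b>important</b>">
]>
<note>fish &amp; chips, &bold;</note>"""

root = ET.fromstring(DOC)

# &amp; is replaced by one character of ordinary text ...
print(repr(root.text))
# ... but &bold; is replaced by markup that the parser turns into a child
# element, which does not exist anywhere in the raw input as a slice.
print([child.tag for child in root])
print(root[0].text)
```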

If it's 100% certain that entity references can be treated as just text, and
that you can't end up with start tags or end tags being inserted and messing
with the parsing such that they all have to be replaced for the XML to be
parsed correctly, then I have no problem passing entity references along so
that a higher level parser can try to do something with them. But it's not
at all clear to me that an XML document with entity references is correct
enough to be parsed without replacing the entity references with whatever
XML markup they contain. I had originally passed them along with the idea
that a higher level parser could do something with them, but I decided that
I couldn't do that if you could drop a start tag in there and change the
meaning of the parts of the document that aren't directly in the entity
reference.
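
As one data point on the unbalanced case, and assuming Python 3 with its
bundled expat, the sketch below feeds the stdlib parser a hypothetical
entity `open` whose replacement text opens <b> while the main document
closes it. Expat appears to reject this, which is consistent with XML 1.0
requiring an internal parsed entity's replacement text to match the
`content` production, while a balanced entity is accepted. This only shows
what one parser does; it is not a statement about every parser.

```python
import xml.etree.ElementTree as ET

# Hypothetical entity that opens <b> but leaves the </b> to the main document.
UNBALANCED = """<!DOCTYPE note [
  <!ENTITY open "<b>">
]>
<note>&open;text</b></note>"""

# Hypothetical entity whose start and end tags are both inside the entity.
BALANCED = """<!DOCTYPE note [
  <!ENTITY whole "<b>text</b>">
]>
<note>&whole;</note>"""


def parses(doc: str) -> bool:
    """Return True if the expat-backed stdlib parser accepts doc."""
    try:
        ET.fromstring(doc)
        return True
    except ET.ParseError:
        return False


# The unbalanced entity's markup does not nest with the surrounding
# document, so expat is expected to raise ParseError for it.
print(parses(UNBALANCED))
# The balanced entity expands to a complete <b> element and parses normally.
print(parses(BALANCED))
```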

- Jonathan M Davis
