On Tuesday, February 13, 2018 21:18:12 Patrick Schluter via Digitalmars-d-announce wrote:
> There's also the issue that entity references open a whole can of
> worms concerning security. It's quite possible to have an
> exponentially growing entity replacement that can take down any
> parser.
Well, if dxml just passes entity references along unparsed, beyond validating that the reference itself contains valid characters (e.g. that it's not something like &.; or a bare &), then dxml would still not be replacing entity references with anything. Any security or performance problems associated with entity references would be left to whatever parser handled the DTD section, used dxml to parse the rest of the XML, and then replaced the entity references in dxml's results with their replacement text.

The big problem is how entity references affect the parsing. If an entity reference can expand to a start tag and thereby affect the parsing (and it's still not clear to me from the spec whether that's legal; there is a section about proper nesting which might indicate that it's not, but it's neither specific nor clear), and if it's legal to use an entity reference for something like a tag name, e.g. <&foo;>, then that's a serious problem. Problems like that are the main reason why I completely dropped any attempt to do anything with the DTD section.

If entity references are only legal in the text between start and end tags and between the quotes of attribute values, and whatever they're replaced with cannot affect anything else in the XML document (i.e. the replacement can't just be a start or end tag or anything like that; it has to be fully parseable on its own and not affect the parsing of the document itself), then passing them along should be fine. Basically, suppose I change dxml so that, wherever it currently allows one of the standard entity references, it also allows other entity references and passes them along unreplaced instead of throwing an XMLParsingException. If that works without documents getting screwed up due to missing start tags or the like, then the approach is sound. But if entity references allow arbitrary enough chunks of XML, it doesn't work.
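To make that division of labor concrete, here is a minimal Python sketch. It is not dxml's API (dxml is a D library), and every name in it is hypothetical. check_text plays the parser's role: it checks that each & in a text or attribute-value span begins a syntactically valid reference, rejects things like &.; or a bare &, and passes non-predefined entity references through unreplaced. expand_entities plays the role of a separate, hypothetical DTD-aware layer that has already built a name-to-replacement mapping. It does the substitution and enforces a size cap and a nesting limit, so exponential ("billion laughs") expansion is rejected instead of exhausting memory. The name pattern is a simplified ASCII-only approximation of the XML Name production.

```python
import re

# Simplified, ASCII-only approximation of the XML Name production
# (the real production also allows many non-ASCII characters).
NAME = r"[A-Za-z_:][A-Za-z0-9_:.\-]*"
# Character references (&#65; / &#x41;) or entity references (&name;).
REF = re.compile(r"&(?:#[0-9]+|#x[0-9A-Fa-f]+|" + NAME + r");")
ENTITY_REF = re.compile(r"&(" + NAME + r");")
PREDEFINED = {"lt", "gt", "amp", "apos", "quot"}


def check_text(text):
    """Parser side (hypothetical): validate references, replace nothing.

    Returns the names of non-predefined entity references found in the
    text; they stay in the text unchanged. Raises ValueError on a
    malformed reference such as '&.;' or a bare '&'.
    """
    passed_along = []
    pos = 0
    while (amp := text.find("&", pos)) != -1:
        m = REF.match(text, amp)
        if m is None:
            raise ValueError(f"malformed reference at offset {amp}")
        body = m.group(0)[1:-1]  # strip the leading '&' and trailing ';'
        if not body.startswith("#") and body not in PREDEFINED:
            passed_along.append(body)
        pos = m.end()
    return passed_along


def expand_entities(text, entities, max_len=1_000_000, max_depth=16):
    """DTD-layer side (hypothetical): substitute declared entities.

    'entities' maps declared, non-predefined names to replacement text.
    Replacement text may contain further references, so substitution is
    repeated until nothing changes. Unknown names are left as-is.
    """
    for _ in range(max_depth + 1):
        expanded = ENTITY_REF.sub(
            lambda m: entities.get(m.group(1), m.group(0)), text)
        # Checked after every pass, so exponential growth stops early.
        if len(expanded) > max_len:
            raise ValueError("entity expansion exceeds size limit")
        if expanded == text:
            # A known name that survived a pass must expand to itself.
            if any(m.group(1) in entities for m in ENTITY_REF.finditer(text)):
                raise ValueError("recursive entity reference")
            return text
        text = expanded
    raise ValueError("entity references nested too deeply")
```

For example, check_text("a &lt; b &foo;") accepts the text and reports foo as passed along, while expand_entities("Hello &name;!", {"name": "dxml"}) yields "Hello dxml!". The predefined entities are deliberately kept out of the mapping, because expanding &amp; in the same loop would risk double unescaping (&amp;lt; turning into <).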
It also doesn't work if entity references are allowed in places other than the text between start and end tags or within attribute values. And it's not at all clear to me what is legal in an entity reference or exactly where references are legal. The spec describes the grammar as applying _after_ all of the references have been replaced, which makes the grammar rather untrustworthy, and I find the spec very hard to understand in general.

Regardless, there's no risk of dxml's parser ever being changed to actually replace entity references. That doesn't work with returning slices of the original input, and it really doesn't work with a parser that's just supposed to take a range of characters and parse it. Fully handling the DTD means actually reading files from disk or from the internet, which is of course where the security problems come in, but it also means that you're no longer dealing with just a parser. In principle, dxml's parser should be pure (though some implementation details currently prevent that), whereas an XML parser that fully handles the DTD section could never be pure.

- Jonathan M Davis