On Mon, Jan 17, 2011 at 9:53 AM, porneL <[email protected]> wrote:
> On Mon, 17 Jan 2011 01:45:43 -0000, Miller Medeiros
> <[email protected]> wrote:
>
>> I still believe that this analogy fits well.. XML is stricter than HTML
>> and have simpler rules (all tags open and close on a sane order) and
>> because of that is easier to parse..
>
> A little off-topic: I've been implementing my own HTML and XML parsers,
> and I don't agree that XML is easier to parse.
>
> The seemingly magic rules for optional tags in HTML are actually very
> simple to implement, and you can hardcode them instead of using real DTD.
>
> Handling of empty elements is a matter of looking up tagname in a fixed
> list vs two extra states in an XML parser — it's not very different in
> complexity. Optionally closed tags are piece of cake to implement too
> (basically you implement part of XML error handling, except the line
> that stops the parser).
>
> XML has huge additional complexity. Before you even start, you need to
> write an SGML DTD parser and fetch half dozen files in order to be able
> to parse a typical XHTML file. The syntax is additionally complicated by
> allowing infinitely nested entities containing markup and namespace
> indirection. Even XML's strict error handling is not helpful, because
> these are extra code paths and strict behaviors you have to add to the
> parser.

I totally disagree. I was just talking about retrieving the content of a node and its attributes, nothing about DTDs, schemas, or error handling. The parsing process is easier: you can go char by char (or use a RegExp), match an opening tag, and read until you find the matching closing tag; everything in between is the content of that node, with nothing hard-coded. In HTML, certain tags auto-close the parent node, so you need to know them beforehand and hard-code those values (into a hash table, array, or something like that).
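To make that concrete, here is a minimal sketch of the kind of hard-coded lookup table an HTML parser needs (the names and the tag list are illustrative, not taken from any real parser or from the HTML spec's full set of rules): before opening a new element, the parser consults a table of which start tags implicitly close the element currently open.

```javascript
// Illustrative (incomplete) table of HTML auto-closing rules:
// for an open element, which start tags implicitly close it.
var autoClosers = {
  li: { li: true },            // a new <li> closes the previous <li>
  p:  { p: true, ul: true },   // <p> or <ul> closes an open <p>
  td: { td: true, tr: true },  // a new cell or row closes an open <td>
  tr: { tr: true }             // a new <tr> closes the previous <tr>
};

// Given the tag on top of the open-element stack and the tag about to
// be opened, decide whether to close the current element first.
function shouldAutoClose(openTag, nextTag) {
  var closers = autoClosers[openTag];
  return !!(closers && closers[nextTag]);
}

console.log(shouldAutoClose('li', 'li')); // true
console.log(shouldAutoClose('li', 'em')); // false
```

An XML parser needs no such table, which is the point being argued either way: the table is extra hard-coded knowledge, but it is also small and simple to implement.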
//simple example of retrieving node content

var xmlString = '<xml>dolor sit amet <tag>lorem ipsum</tag> ' +
                '<anotherTag>maecennas</anotherTag></xml>';

function getNodeContent(nodeName, xmlString){
    // non-greedy capture so the first matching closing tag ends the content
    var regexp = new RegExp('<'+ nodeName +'>(.+?)<\\/'+ nodeName +'>');
    var match = regexp.exec(xmlString);
    return match ? match[1] : null; // null when the node isn't found
}

console.log( getNodeContent('tag', xmlString) ); //will output "lorem ipsum"

It was just to explain that stricter rules can reduce complexity in some cases, since you can "ignore" edge cases. I thought XML parsing being easier than HTML parsing was common knowledge. I'm not going to keep discussing XML/HTML complexity on a JS list.

PS: one of the reasons why JSON is so strict is to avoid ambiguity and make it easier to parse...

cheers.

--
To view archived discussions from the original JSMentors Mailman list:
http://www.mail-archive.com/[email protected]/

To search via a non-Google archive, visit here:
http://www.mail-archive.com/[email protected]/

To unsubscribe from this group, send email to
[email protected]
