Re: [ADVANCED-DOTNET] Regex for parsing XML - Foolish?

Browning, Don Mon, 17 Feb 2003 12:51:17 -0800

I have to agree.  Between the sheer speed of XPath queries and the XPathNavigator 
class, I would stick to XPath.  I've queried some big documents using the 
XPathNavigator and the results come back with no noticeable delay.
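
For readers who have not used it, here is a minimal sketch of the kind of XPathNavigator query described above. The class and method names (XPathSketch, SelectValues) are made up for illustration, and List<string> assumes .NET 2.0 or later; it is one way to do it, not necessarily what was run against those big documents.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Xml.XPath;

// Hypothetical helper, not part of the original thread.
public static class XPathSketch
{
    // Returns the string value of every node selected by the XPath expression.
    public static List<string> SelectValues(string xml, string xpath)
    {
        // XPathDocument is a read-only store intended for XPath queries.
        XPathDocument doc = new XPathDocument(new StringReader(xml));
        XPathNavigator nav = doc.CreateNavigator();

        List<string> values = new List<string>();
        XPathNodeIterator it = nav.Select(xpath);
        while (it.MoveNext())
        {
            values.Add(it.Current.Value);
        }
        return values;
    }
}
```

Against the RESPONSETYPE1 sample quoted further down, calling XPathSketch.SelectValues(xml, "/RESPONSETYPE1/firstName") should give back the two first names in document order.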

Not to mention, in a couple of years it will be easier for future programmers working 
with the code to understand what you were doing with XPath than with a regex.

My 0.02...

Don


-----Original Message-----
From: Eric Gunnerson [mailto:[EMAIL PROTECTED]]
Sent: Monday, February 17, 2003 1:53 PM
To: [EMAIL PROTECTED]
Subject: Re: [ADVANCED-DOTNET] Regex for parsing XML - Foolish?


My vote would be "Yes".

My experience with using Regex on XML is that it's a pain to get things parsed apart. 
I think it's much much easier to use XmlDocument and then traverse the nodes to get 
what you want.
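
A rough sketch of that XmlDocument approach, applied to the RESPONSETYPE1 sample from the original question below. The names Customer, XmlDocumentSketch and ParseType1 are assumptions for illustration, as is the rule that the i-th firstName pairs with the i-th lastName (taken from the question); List<T> and auto-properties assume a newer compiler than the one current for this thread.

```csharp
using System.Collections.Generic;
using System.Xml;

// Hypothetical target type, following the question's "Customer" example.
public class Customer
{
    public string FirstName { get; set; }
    public string LastName { get; set; }
}

public static class XmlDocumentSketch
{
    // Pairs the i-th <firstName> element with the i-th <lastName> element.
    public static List<Customer> ParseType1(string xml)
    {
        XmlDocument doc = new XmlDocument();
        doc.LoadXml(xml);

        XmlNodeList firsts = doc.DocumentElement.GetElementsByTagName("firstName");
        XmlNodeList lasts = doc.DocumentElement.GetElementsByTagName("lastName");
        if (firsts.Count != lasts.Count)
        {
            // Positional pairing only makes sense when the counts agree.
            throw new XmlException("firstName/lastName counts differ");
        }

        List<Customer> customers = new List<Customer>();
        for (int i = 0; i < firsts.Count; i++)
        {
            Customer c = new Customer();
            c.FirstName = firsts[i].InnerText;
            c.LastName = lasts[i].InnerText;
            customers.Add(c);
        }
        return customers;
    }
}
```

The count check is there because the flat layout gives no other guarantee that the two lists line up.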

As for performance, I'm not sure. Regex may be faster, but the XML parser is pretty 
quick.

-----Original Message-----
From: Larry O'Brien [mailto:[EMAIL PROTECTED]]
Sent: Monday, February 17, 2003 9:37 AM
To: [EMAIL PROTECTED]
Subject: [ADVANCED-DOTNET] Regex for parsing XML - Foolish?


I am working with well-formed but non-validated (no DTD, no schema) computer-generated 
XML. My task is to make consistent objects from a large number of these 
mainframe-generated strings; for instance, one call might generate the XML 
"<RESPONSETYPE1><firstName>Aaron</firstName><firstName>Bob</firstName><lastName>Amundsen</lastName><lastName>Bickerson</lastName></RESPONSETYPE1>" 
and another call might generate 
"<RESPONSETYPE2><First>Aaron</First><Last>Amundsen</Last><First>Bob</First><Last>Bickerson</Last></RESPONSETYPE2>". 
I need to turn these into .NET objects (say, Customer objects with FirstName and 
LastName properties).

Beneath the root element, the data is absolutely flat: repeating data types are never 
aggregated into sub-elements. For instance, in <RESPONSETYPE1> above, I have to match 
the first <firstName> element with the first <lastName> element, etc. My question is: 
am I foolish for using regular expressions to work with the XML rather than an XML 
parser?

My "spike" solution is to use the following method. Can this be done more efficiently 
(in either lines of code or significant runtime performance) with an XML parser? 
Please note that the mainframe data is absolutely consistent in generating whitespace, 
character set, and so forth. XML's advantages in these areas are not relevant.

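// Requires: using System; using System.Collections; using System.Text.RegularExpressions;
public static string[] Elements(string elementName, string xml) {
   ArrayList responses = new ArrayList();
   // Escape the name so any regex metacharacters in it are matched literally.
   string name = Regex.Escape(elementName);
   // Lazily capture the text between an attribute-free start tag and its end tag.
   Regex re = new Regex(String.Format("<{0}>(?<value>.*?)</{0}>", name));
   foreach (Match m in re.Matches(xml))
   {
      responses.Add(m.Groups["value"].Value);
   }
   return (string[]) responses.ToArray(typeof(string));
}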
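
For comparison, a sketch of how a method with the same signature as Elements might look on top of a forward-only XmlReader. This is an illustration, not a measured replacement: the class name XmlReaderSketch is made up, XmlReader.Create and List<T> assume .NET 2.0 or later, and it relies on the stated assumption that the data is flat (the named elements contain only text).

```csharp
using System.Collections.Generic;
using System.IO;
using System.Xml;

// Hypothetical parser-based counterpart to the regex method.
public static class XmlReaderSketch
{
    public static string[] Elements(string elementName, string xml)
    {
        List<string> responses = new List<string>();
        using (XmlReader reader = XmlReader.Create(new StringReader(xml)))
        {
            reader.Read(); // move to the first node
            while (!reader.EOF)
            {
                if (reader.NodeType == XmlNodeType.Element && reader.Name == elementName)
                {
                    // Reads the text content and leaves the reader on the node
                    // after the end tag, so no extra Read() is needed here.
                    // Throws if the element has child elements (not flat data).
                    responses.Add(reader.ReadElementContentAsString());
                }
                else
                {
                    reader.Read();
                }
            }
        }
        return responses.ToArray();
    }
}
```

The loop calls Read() only when the current node is not a match, because ReadElementContentAsString has already advanced the reader; advancing again would skip an element that immediately follows, as the adjacent firstName elements do in the sample.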

Thanks in advance,
Larry
