Re: [ADVANCED-DOTNET] Regex for parsing XML - Foolish?

casey chesnut Mon, 17 Feb 2003 13:52:32 -0800

Here is a possibly legit scenario ...

The .NET Compact Framework (CF .NET) does not support XPath,
and DataSets are slow on small devices.
One possibility might be to run a regex first,
to make up for the lack of XPath,
and then run an XmlReader over the result set.
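
A minimal sketch of that two-step idea, written as an illustration rather
than as tested Compact Framework code. The class and method names
(RegexThenReader, TextOf) are made up for this example, and it assumes the
target elements contain only text and are never nested inside themselves.
The regex narrows the document down to the matching fragments, and an
XmlTextReader then parses each fragment on its own.

```csharp
using System;
using System.Collections;
using System.IO;
using System.Text.RegularExpressions;
using System.Xml;

public class RegexThenReader
{
    // Hypothetical helper: returns the text content of every
    // <elementName>...</elementName> fragment found in xml.
    public static string[] TextOf(string elementName, string xml)
    {
        ArrayList values = new ArrayList();

        // Step 1: the regex stands in for the missing XPath query and
        // selects whole fragments, opening and closing tag included.
        string name = Regex.Escape(elementName);
        Regex re = new Regex("<" + name + ">.*?</" + name + ">",
                             RegexOptions.Singleline);

        foreach (Match m in re.Matches(xml))
        {
            // Step 2: each fragment is a small well-formed document,
            // so a forward-only reader can parse it.
            XmlTextReader reader = new XmlTextReader(new StringReader(m.Value));
            try
            {
                while (reader.Read())
                {
                    if (reader.NodeType == XmlNodeType.Text)
                        values.Add(reader.Value);
                }
            }
            finally
            {
                reader.Close();
            }
        }
        return (string[]) values.ToArray(typeof(string));
    }
}
```

The reader only ever sees the small fragments the regex selected, which is
the point of the scenario: no full document tree is built on the device.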

_____________________________
casey
http://www.brains-N-brawn.com

-----Original Message-----
From: Moderated discussion of advanced .NET topics.
[mailto:[EMAIL PROTECTED]] On Behalf Of Browning, Don
Sent: Monday, February 17, 2003 2:22 PM
To: [EMAIL PROTECTED]
Subject: Re: [ADVANCED-DOTNET] Regex for parsing XML - Foolish?

I have to agree.  Between the sheer speed of XPath queries and the
XPathNavigator class, I would stick to XPath.  I've queried some big
documents using the XPathNavigator and the results come back
instantaneously.

Not to mention, in a couple of years it will be easier for future
programmers working with the code to understand what you were doing with
XPath than with a regex.
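
For comparison, a rough sketch of what the XPath route could look like for
the flat documents described in the original question. The class and method
names (XPathElements, Select) are invented for this illustration; the
XPathDocument, XPathNavigator and XPathNodeIterator types are the standard
System.Xml.XPath ones.

```csharp
using System;
using System.Collections;
using System.IO;
using System.Xml.XPath;

public class XPathElements
{
    // Hypothetical helper: returns the string value of every node
    // selected by the given XPath expression.
    public static string[] Select(string xpath, string xml)
    {
        ArrayList values = new ArrayList();

        // XPathDocument is a read-only store intended for XPath queries.
        XPathDocument doc = new XPathDocument(new StringReader(xml));
        XPathNavigator nav = doc.CreateNavigator();

        // The iterator walks the selected nodes in document order.
        XPathNodeIterator it = nav.Select(xpath);
        while (it.MoveNext())
        {
            values.Add(it.Current.Value);
        }
        return (string[]) values.ToArray(typeof(string));
    }
}
```

A call such as Select("/RESPONSETYPE1/firstName", xml) states the intent in
the query itself, which is the readability argument made above.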

My 0.02...

Don

-----Original Message-----
From: Eric Gunnerson [mailto:[EMAIL PROTECTED]]
Sent: Monday, February 17, 2003 1:53 PM
To: [EMAIL PROTECTED]
Subject: Re: [ADVANCED-DOTNET] Regex for parsing XML - Foolish?

My vote would be "Yes".

My experience with using Regex on XML is that it's a pain to get things
parsed apart. I think it's much much easier to use XmlDocument and then
traverse the nodes to get what you want.
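
A small sketch of the XmlDocument approach, assuming the flat layout from
the original question (all data elements sit directly under the root). The
class and method names (DomElements, ChildValues) are hypothetical.

```csharp
using System;
using System.Collections;
using System.Xml;

public class DomElements
{
    // Hypothetical helper: returns the inner text of each direct child
    // of the root element whose name equals elementName.
    public static string[] ChildValues(string elementName, string xml)
    {
        ArrayList values = new ArrayList();

        XmlDocument doc = new XmlDocument();
        doc.LoadXml(xml);

        // The data is flat, so one pass over the root's children suffices.
        foreach (XmlNode node in doc.DocumentElement.ChildNodes)
        {
            if (node.NodeType == XmlNodeType.Element && node.Name == elementName)
                values.Add(node.InnerText);
        }
        return (string[]) values.ToArray(typeof(string));
    }
}
```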

As for performance, I'm not sure. Regex may be faster, but the XML
parser is pretty quick.

-----Original Message-----
From: Larry O'Brien [mailto:[EMAIL PROTECTED]]
Sent: Monday, February 17, 2003 9:37 AM
To: [EMAIL PROTECTED]
Subject: [ADVANCED-DOTNET] Regex for parsing XML - Foolish?

I am working with well-formed but non-validated (no DTD, no schema)
computer-generated XML. My task is to make consistent objects from a
large number of these mainframe-generated strings; for instance, one
call might generate the XML

"<RESPONSETYPE1><firstName>Aaron</firstName><firstName>Bob</firstName><lastName>Amundsen</lastName><lastName>Bickerson</lastName></RESPONSETYPE1>"

and another call might generate

"<RESPONSETYPE2><First>Aaron</First><Last>Amundsen</Last><First>Bob</First><Last>Bickerson</Last></RESPONSETYPE2>".

My task is to turn these into .NET objects (say, Customer objects with
FirstName and LastName properties).

Beneath the root element, the data is absolutely flat: repeating data
types are never aggregated into sub-elements. For instance, in
<RESPONSETYPE1> above, I have to match the first <firstName> element
with the first <lastName> element, etc. My question is: am I foolish for
using regular expressions to work with the XML rather than an XML
parser?

My "spike" solution is to use the following method. Can this be done
more efficiently (in either lines of code or significant runtime
performance) with an XML parser? Please note that the mainframe data is
absolutely consistent in generating whitespace, character set, and so
forth. XML's advantages in these areas are not relevant.

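// Needs: using System; using System.Collections;
//        using System.Text.RegularExpressions;
public static string[] Elements(string elementName, string xml) {
   ArrayList responses = new ArrayList();
   // Escape the name so any regex metacharacters in it match literally.
   string name = Regex.Escape(elementName);
   // Lazy match: capture the text between each opening and closing tag.
   Regex re = new Regex(String.Format("<{0}>(?<value>.*?)</{0}>", name));
   MatchCollection matches = re.Matches(xml);
   foreach(Match m in matches)
   {
      string value = m.Groups["value"].Value;
      responses.Add(value);
   }
   return (string[]) responses.ToArray(typeof(string));
}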
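
One possible way to get from this method to Customer objects, sketched under
assumptions: the Customer class and the CustomerBuilder.Build helper are
hypothetical, and Elements is repeated (with the element name escaped) only
so the block compiles on its own. Because matches come back in document
order, pairing by index should work for both the grouped layout of
RESPONSETYPE1 and the interleaved layout of RESPONSETYPE2.

```csharp
using System;
using System.Collections;
using System.Text.RegularExpressions;

// Hypothetical target type with the two properties named in the post.
public class Customer
{
    private string firstName;
    private string lastName;

    public Customer(string firstName, string lastName)
    {
        this.firstName = firstName;
        this.lastName = lastName;
    }

    public string FirstName { get { return firstName; } }
    public string LastName { get { return lastName; } }
}

public class CustomerBuilder
{
    // Same technique as the Elements method above.
    public static string[] Elements(string elementName, string xml)
    {
        ArrayList responses = new ArrayList();
        string name = Regex.Escape(elementName);
        Regex re = new Regex(String.Format("<{0}>(?<value>.*?)</{0}>", name));
        foreach (Match m in re.Matches(xml))
        {
            responses.Add(m.Groups["value"].Value);
        }
        return (string[]) responses.ToArray(typeof(string));
    }

    // Pairs the i-th first-name element with the i-th last-name element.
    public static Customer[] Build(string firstTag, string lastTag, string xml)
    {
        string[] firsts = Elements(firstTag, xml);
        string[] lasts = Elements(lastTag, xml);
        if (firsts.Length != lasts.Length)
            throw new ArgumentException("Element counts do not match.");

        Customer[] customers = new Customer[firsts.Length];
        for (int i = 0; i < firsts.Length; i++)
        {
            customers[i] = new Customer(firsts[i], lasts[i]);
        }
        return customers;
    }
}
```

For the second sample string, the call would be
CustomerBuilder.Build("First", "Last", xml); only the tag names change
between response types.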

Thanks in advance,
Larry
