Hi Larry,

Like most others who responded before me, I'd go the XML route.  With one
caveat, however: if you are dealing with large data sets, loading each
document into a DOM will be a memory hog.

If only we had a decent implementation of SAX, things would be very
different.  Flat XML structures like this tend to parse lightning-fast with
streaming parsers and not so well with DOM implementations...
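
For what it's worth, the closest thing the framework ships to a streaming
parser is XmlTextReader, a forward-only pull reader rather than a push-style
SAX parser.  Below is a minimal sketch of the same Elements helper built on
it, assuming the flat layout from Larry's message.  The StreamingElements
class name and the Main driver are made up for illustration, and I have not
measured it against the regex version.

```csharp
using System;
using System.Collections;
using System.IO;
using System.Xml;

public class StreamingElements
{
    // Collects the text of every element named elementName in a single
    // forward pass, without building a DOM tree in memory.
    public static string[] Elements(string elementName, string xml)
    {
        ArrayList values = new ArrayList();
        XmlTextReader reader = new XmlTextReader(new StringReader(xml));
        try
        {
            while (reader.Read())
            {
                if (reader.NodeType == XmlNodeType.Element
                    && reader.Name == elementName)
                {
                    // ReadString stops at the element's end tag, so the
                    // next Read() moves on to the following sibling.
                    values.Add(reader.ReadString());
                }
            }
        }
        finally
        {
            reader.Close();
        }
        return (string[]) values.ToArray(typeof(string));
    }

    public static void Main()
    {
        // Sample data taken from the RESPONSETYPE1 example in the question.
        string xml =
            "<RESPONSETYPE1><firstName>Aaron</firstName>"
            + "<firstName>Bob</firstName><lastName>Amundsen</lastName>"
            + "<lastName>Bickerson</lastName></RESPONSETYPE1>";

        string[] first = Elements("firstName", xml);
        string[] last = Elements("lastName", xml);

        // Pair the n-th first name with the n-th last name.
        for (int i = 0; i < first.Length; i++)
        {
            Console.WriteLine(first[i] + " " + last[i]);
        }
    }
}
```

Run against the sample above, this prints "Aaron Amundsen" and then
"Bob Bickerson".  It reads the string once per element name; a single pass
that fills both lists at once would be the next step if the inputs are large.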

K. Lilov
Str Library at http://www.utilitycode.com/str

-----Original Message-----
From: Moderated discussion of advanced .NET topics.
[mailto:[EMAIL PROTECTED]] On Behalf Of Larry O'Brien
Sent: Monday, February 17, 2003 7:37 PM
To: [EMAIL PROTECTED]
Subject: [ADVANCED-DOTNET] Regex for parsing XML - Foolish?


I am working with well-formed but non-validated (no DTD, no schema)
computer-generated XML. My task is to make consistent objects from a large
number of these mainframe-generated strings; for instance, one call might
generate the XML

"<RESPONSETYPE1><firstName>Aaron</firstName><firstName>Bob</firstName><lastName>Amundsen</lastName><lastName>Bickerson</lastName></RESPONSETYPE1>"

and another call might generate

"<RESPONSETYPE2><First>Aaron</First><Last>Amundsen</Last><First>Bob</First><Last>Bickerson</Last></RESPONSETYPE2>"

My task is to turn these into .NET objects (say, Customer objects with
FirstName and LastName properties).

Beneath the root element, the data is absolutely flat: repeating data types
are never aggregated into sub-elements. For instance, in <RESPONSETYPE1>
above, I have to match the first <firstName> element with the first
<lastName> element, etc. My question is: am I foolish for using regular
expressions to work with the XML rather than an XML parser?

My "spike" solution is to use the following method. Can this be done more
efficiently (in either lines of code or significant runtime
performance) with an XML parser? Please note that the mainframe data is
absolutely consistent in generating whitespace, character set, and so forth.
XML's advantages in these areas are not relevant.

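// Requires: using System; using System.Collections;
//           using System.Text.RegularExpressions;
public static string[] Elements(string elementName, string xml) {
   ArrayList responses = new ArrayList();
   // Escape the name so characters such as '.' (legal in XML names)
   // are matched literally rather than as regex metacharacters.
   Regex re = new Regex(String.Format("<{0}>(?<value>.*?)</{0}>",
      Regex.Escape(elementName)));
   MatchCollection matches = re.Matches(xml);
   foreach(Match m in matches)
   {
      // The named group holds the text between the open and close tags.
      string value = m.Groups["value"].Value;
      responses.Add(value);
   }
   return (string[]) responses.ToArray(typeof(string));
}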

Thanks in advance,
Larry
