-----------------------------------------------------------
New Message on BDOTNET
-----------------------------------------------------------
From: LovedJohnySmith
Message 3 in Discussion
Hey Folks!
Please don't use RegEx for your search within search results... Many people use
the Regex class to find data inside of HTML. I'm not a fan of this approach,
though. There are so many edge cases to contend with in HTML that the regular
expressions grow hideously complex, and the regular expression language is
notorious for being a "write-only" language.
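To make the edge-case problem concrete, here is a small sketch using only the
standard Regex class. The markup, the pattern, and the names RegexPitfall and
CountLinks are made up for illustration; the point is just that a pattern that
looks reasonable misses ordinary variations in how links are written.

```csharp
using System;
using System.Text.RegularExpressions;

class RegexPitfall
{
    // Hypothetical helper: counts anchors using a naive pattern that expects
    // lowercase tags, double quotes, and href as the first attribute.
    public static int CountLinks(string html)
    {
        return Regex.Matches(html, @"<a href=""([^""]*)"">").Count;
    }

    static void Main()
    {
        // Three valid links, written three different ways.
        string html =
            "<a href=\"/one\">One</a>" +
            "<A HREF='/two'>Two</A>" +
            "<a class=\"x\" href=\"/three\">Three</a>";

        // Only the first link matches the naive pattern.
        Console.WriteLine("Regex found " + CountLinks(html) + " of 3 links");
    }
}
```

Each missed case can be patched into the pattern, but every patch makes the
expression harder to read, which is exactly the "write-only" problem.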
My usual approach is to transform the web page into an object model. This
sounds complicated, but not if someone else does all the heavy lifting. Two
pieces of software that can help are the SgmlReader on GotDotNet, and Simon
Mourier's Html Agility Pack. The agility pack is still a .NET 1.1 project, but
I have it running under 2.0 with only minor changes (I just needed to remove
some conditional debugging attributes). With these libraries, it is easy to
walk through the page like an XML document, perform XSL transformations, or
find data using XPath expressions (which to me are a lot more readable than
regular expressions).
Here is a little snippet of code that uses the agility pack to dump the text
inside all of the links (anchor tags) on OdeToCode's front page:
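using System;
using System.Net;
using HtmlAgilityPack;

WebRequest request = WebRequest.Create("http://www.OdeToCode.com");
using (WebResponse response = request.GetResponse())
{
    HtmlDocument document = new HtmlDocument();
    document.Load(response.GetResponseStream());
    // SelectNodes returns null (not an empty collection) when nothing matches.
    HtmlNodeCollection links = document.DocumentNode.SelectNodes("//a");
    if (links != null)
    {
        foreach (HtmlNode node in links)
        {
            Console.WriteLine(node.InnerText);
        }
    }
}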
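As a further sketch of the XPath side, the same idea can be narrowed to anchors
that actually carry an href, and run against an in-memory string instead of a
live page. The markup and the names XPathLinks and ExtractHrefs are made up for
illustration. I am also assuming the agility pack lower-cases tag and attribute
names while parsing, so the lowercase XPath should match the uppercase tag too.

```csharp
using System;
using System.Collections.Generic;
using HtmlAgilityPack;

class XPathLinks
{
    // Hypothetical helper: returns the href of every anchor that has one.
    public static List<string> ExtractHrefs(string html)
    {
        List<string> hrefs = new List<string>();

        HtmlDocument document = new HtmlDocument();
        document.LoadHtml(html);

        // "//a[@href]" skips named anchors such as <a name="top">.
        HtmlNodeCollection links = document.DocumentNode.SelectNodes("//a[@href]");
        if (links == null)
        {
            return hrefs; // no matches: SelectNodes gives null, not an empty list
        }

        foreach (HtmlNode link in links)
        {
            hrefs.Add(link.GetAttributeValue("href", ""));
        }
        return hrefs;
    }

    static void Main()
    {
        string html =
            "<p><a href=\"/one\">One</a> <A HREF='/two'>Two</A> " +
            "<a name=\"top\">Top</a></p>";

        foreach (string href in ExtractHrefs(html))
        {
            Console.WriteLine(href);
        }
    }
}
```

The quoting style and attribute order that trip up a hand-written regular
expression are handled by the parser, so the query itself stays short.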
Thanks,
Smith
http://spaces.msn.com/johnysmith