newsclipperdevlist  

Re: On the robustness of handlers

David Coppit
Tue, 01 Aug 2000 06:53:54 -0700

On Mon, 31 Jul 2000, Chris Pimlott wrote:

>       No matter what, there has to be some reference for where to start
> grabbing.  But what's the most stable thing to use as an anchor?  An
> image?  A comment?  An ad banner?  The second <h1> tag from the top?  I
> definitely do try to put some thought into this when I make handlers but I
> don't know what's the best way;  not that there's necessarily a single
> answer, but I thought it might be a good thing to discuss amongst ourselves.

As I mentioned before, it's best to use "^" and "$" for the start and end
patterns, so you grab the whole page, then filter the result down to what
you want. This works great for links and images. But if you're grabbing
straight HTML, then you're out of luck.
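
To illustrate the grab-everything-then-filter idea, here is a rough sketch in
Python rather than in handler code. It assumes the whole page is already in a
string; LinkCollector and filter_links are made-up names for this example,
not News Clipper functions.

```python
import re
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag in the page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # Tag names arrive lowercased; attrs is a list of (name, value) pairs.
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)


def filter_links(html, keep_pattern):
    """Return the links in html whose href matches keep_pattern."""
    collector = LinkCollector()
    collector.feed(html)
    collector.close()
    keep = re.compile(keep_pattern)
    return [href for href in collector.links if keep.search(href)]
```

Because nothing here depends on where the links sit in the page, a layout
change only breaks it if the URLs themselves change.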

I try to avoid anchoring on HTML tags, figuring they change a lot, but
sometimes you can't. Ideally, we would have some pattern that knows about
HTML tags -- "start at the third <h1> on the page that follows a <center>".
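
A tag-aware anchor like that could look roughly like the Python sketch below.
It is only an illustration: AnchorFinder and find_anchor are hypothetical
names, and I am reading the example as "the third <h1> that appears after
the first <center>".

```python
from html.parser import HTMLParser


class AnchorFinder(HTMLParser):
    """Remember where the nth <target> tag after the first <after> tag starts."""

    def __init__(self, after, target, nth):
        super().__init__()
        self.after, self.target, self.nth = after, target, nth
        self.seen_after = False
        self.count = 0
        self.position = None  # (line, column) of the anchor tag, once found

    def handle_starttag(self, tag, attrs):
        if self.position is not None:
            return
        if not self.seen_after:
            # Still looking for the tag that the anchor must follow.
            self.seen_after = (tag == self.after)
            return
        if tag == self.target:
            self.count += 1
            if self.count == self.nth:
                # getpos() reports the start of the tag being handled.
                self.position = self.getpos()


def find_anchor(html, after="center", target="h1", nth=3):
    """Return the character offset of the anchor tag, or None if absent."""
    finder = AnchorFinder(after, target, nth)
    finder.feed(html)
    finder.close()
    if finder.position is None:
        return None
    # Convert the parser's (line, column) into an offset into the string.
    line_starts = [0] + [i + 1 for i, ch in enumerate(html) if ch == "\n"]
    line, col = finder.position
    return line_starts[line - 1] + col
```

The returned offset can then serve as the start point for grabbing, in place
of a literal start pattern.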

>       Also, about the yahooweather handler.  I wrote some code that will
> grab, starting at the initial <table> opening tag, the entire table to the
> </table>.  This includes dealing with nested tables; it keeps track of new
> tables being opened and can tell when the initial table ends.  I'm pasting
> the code for two reasons.  One, it may be useful to others.  Two, to make
> sure I haven't missed anything... :)

Cool. Mind if I add a GetTable API function based on this?
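
For anyone following along, the depth-counting idea Chris describes can be
sketched as below. This is not his code and not the eventual API; it is a
Python illustration with a hypothetical get_table function, and it assumes
no <table> tags are hidden inside comments or scripts.

```python
import re

# Matches opening and closing table tags; group 1 is "/" for a closing tag.
_TABLE_TAG = re.compile(r"<(/?)table\b[^>]*>", re.IGNORECASE)


def get_table(html, start=0):
    """Return the first complete table at or after start, nested tables included.

    Returns None if no table is found or the table is never closed.
    """
    depth = 0
    begin = None
    for match in _TABLE_TAG.finditer(html, start):
        if match.group(1) == "":
            # Opening tag: remember where the outermost table begins.
            if depth == 0:
                begin = match.start()
            depth += 1
        elif depth > 0:
            # Closing tag: the initial table ends when depth returns to zero.
            depth -= 1
            if depth == 0:
                return html[begin:match.end()]
        # A stray </table> before any opening tag is ignored.
    return None
```

Passing a start offset lets a handler skip ahead to a known anchor before
looking for the table.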

David

____________________________________________________________________________
David Coppit <[EMAIL PROTECTED]>        President, Spinnaker Software
http://www.newsclipper.com/ -- Snip and ship dynamic content to your website

