I still can't help with the unreliable gateways, but a couple of
things to note.  First, the Last-Modified header will not help you
retrieve a partial feed.  You still get the entire feed, but only if
it has changed since your last request (better than nothing).
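
A rough sketch of what that check could look like, assuming a CF
version that supports the result attribute on cfhttp.  The names
feedResponse, feedChanged and feedXml are made up for illustration,
and storedLastModifiedValue is whatever you saved on the previous
poll.  I have left throwonerror off here because I am not certain
how it treats a 304:

```cfml
<!--- Hypothetical names; storedLastModifiedValue was saved on the previous poll. --->
<cfhttp url="#variables.feedURL#" method="GET" result="feedResponse">
    <cfhttpparam type="header"
                 name="If-Modified-Since"
                 value="#variables.storedLastModifiedValue#" />
</cfhttp>

<!--- statusCode looks like "304 Not Modified"; val() keeps the leading number. --->
<cfif val(feedResponse.statusCode) EQ 304>
    <!--- Unchanged since last poll: there is no feed body to parse. --->
    <cfset feedChanged = false />
<cfelse>
    <!--- Changed, or the server ignores conditional requests: full feed is in fileContent. --->
    <cfset feedChanged = true />
    <cfset feedXml = feedResponse.fileContent />
</cfif>
```

Either way it is all or nothing: a 304 with no body, or the whole
feed.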

You're right on your second point.  The feeds that are .cfm URLs
generate the feed on the fly (I assume) rather than generate a static
file behind the scenes.  That's why I use both the ETag and
Last-Modified headers.  Only static files appear to carry the ETag
header, from what I've been able to tell.
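
Since a given feed may send one header, both, or neither, it is
worth checking before you store them.  A minimal sketch, assuming
feedResponse is the result struct from an earlier cfhttp call (that
name is hypothetical):

```cfml
<!--- feedResponse is assumed to be the result struct of an earlier cfhttp call. --->
<cfset headers = feedResponse.responseHeader />

<!--- Dynamically generated feeds often omit ETag, so only overwrite what is present. --->
<cfif structKeyExists(headers, "ETag")>
    <cfset variables.storedEtagValue = headers["ETag"] />
</cfif>
<cfif structKeyExists(headers, "Last-Modified")>
    <cfset variables.storedLastModifiedValue = headers["Last-Modified"] />
</cfif>
```

On the next poll, send If-None-Match and If-Modified-Since only for
the values you actually have stored.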

On 4/19/06, Neil Middleton <[EMAIL PROTECTED]> wrote:
> Thanks guys for this...it's been plugging some of the gaps I hadn't really 
> considered.  The site is very much in development still, even if it doesn't 
> appear to be changing on the surface.
>
> I will definitely consider using the last modified headers.  I didn't realise 
> you could retrieve them without pulling down the whole file.  That in itself 
> would alleviate things massively.
>
> One thing I have noticed with the CF community is that the RSS feeds that are 
> published seem to be all over the place, some doing things one way, some 
> doing it another.  This doesn't help when trying to write a spider ;-)
>
> Does anyone have any insight into why the async gateways might be 
> unreliable?
>
> >Neil -
> >
> >To prevent over-polling and, as Roger pointed out, potentially getting
> >your IP blocked, consider ETag/If-None-Match headers as well as the
> >Last-Modified/If-Modified-Since headers:
> >
> >1.  When you retrieve a feed, store the ETag and Last-Modified response 
> >headers
> >2.  When you next poll the feed, only retrieve those feeds that have
> >been updated
> >
> ><cfhttp        url="#variables.feedURL#"
> >       method="GET"
> >       useragent="feedsquirrel.com (or whatever)"
> >       throwonerror="yes"
> >>
> >       <cfhttpparam    type="header"
> >                               name="If-None-Match"
> >                               value="#variables.storedEtagValue#"
> >       />
> >       <cfhttpparam    type="header"
> >                               name="If-Modified-Since"
> >                               value="#variables.storedLastModifiedValue#"
> >       />
> ></cfhttp>
> >
> >A nice way to reduce bandwidth consumption and be respectful of the
> >host server/feed author.  A couple of additional suggestions:
> >
> >1.  Provide a user agent that allows a host server to know where the
> >request is coming from and, if they feel it necessary, block that
> >request.
> >2.  Respect the feed author's TTL value (in the case of an RSS 2.0
> >feed).  Don't update the feed any more often than requested in this
> >value (if there is one).
> >3.  Again, in the case of RSS 2.0 feeds, respect any skipDays and
> >skipHours values.  Don't poll on Sundays if the author has told you
> >that the feed won't be updated on Sundays.
> >
> >I know there is a TTL equivalent in Atom 1.0/RSS 1.0, but honestly
> >can't remember what it is.  If you look at the specs, it should jump
> >out.  It's been a while since I wrote the feed aggregator that is
> >embedded in the product I build.  I don't recall there being a decent
> >equivalent for RSS 1.0 or Atom 1.0 for skipDays and skipHours.
> >
> >On 4/19/06, Roger Benningfield <[EMAIL PROTECTED]> wrote:
> >>
>
> 