On 31/3/06 3:08 PM, "Antone Roundy" <[EMAIL PROTECTED]> wrote:
>> The escaped HTML content contained within the content element that
>> David was originally concerned with is more than likely a copy of
>> all or part of the elements and content contained inside the body
>> tag of the external document referenced by an associated link
>> element, and therefore no guarantee that the xml:base of the atom
>> feed is going to be anywhere even close to accurate.

I'm doing something similar right now, scraping a website that doesn't provide feeds for what I want. I check the HTML of the page I scraped: if it has a <base>, I use that; otherwise I use the URL I fetched the page from. The tag soup I extract for each entry contains relative references. I really don't want to go fixing that tag soup, so I just stick that base URL into xml:base on each entry (and not just at the top of the feed, because I'm scraping paginated results and each page can end up with a different base).

e.
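For anyone wanting to try the same approach, here is a minimal Python sketch using only the standard library. The names `effective_base`, `make_entry` and `_BaseFinder` are hypothetical. Resolving a relative `<base href>` against the fetch URL is my own assumption and not something described above. The sketch takes the page's `<base href>` if there is one, otherwise the fetch URL, and puts it in `xml:base` on each Atom entry. The relative links in the escaped HTML content are left untouched.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin
import xml.etree.ElementTree as ET

ATOM_NS = "http://www.w3.org/2005/Atom"
# ElementTree serializes this namespace with the reserved "xml:" prefix.
XML_BASE = "{http://www.w3.org/XML/1998/namespace}base"


class _BaseFinder(HTMLParser):
    """Hypothetical helper: records the href of the first <base> element, if any."""

    def __init__(self):
        super().__init__()
        self.base_href = None

    def handle_starttag(self, tag, attrs):
        if tag == "base" and self.base_href is None:
            href = dict(attrs).get("href")
            if href:
                self.base_href = href


def effective_base(page_html: str, fetch_url: str) -> str:
    """Use the page's <base href> if present, else the URL the page was fetched from."""
    finder = _BaseFinder()
    finder.feed(page_html)
    finder.close()
    if finder.base_href:
        # Assumption: a relative <base href> is resolved against the fetch URL.
        return urljoin(fetch_url, finder.base_href)
    return fetch_url


def make_entry(title: str, soup_html: str, base: str) -> ET.Element:
    """Build an Atom entry with xml:base on the entry itself, not only on the feed,
    so the relative links inside the escaped HTML keep working."""
    entry = ET.Element(f"{{{ATOM_NS}}}entry", {XML_BASE: base})
    ET.SubElement(entry, f"{{{ATOM_NS}}}title").text = title
    content = ET.SubElement(entry, f"{{{ATOM_NS}}}content", {"type": "html"})
    content.text = soup_html  # ElementTree escapes the markup when serializing
    return entry


if __name__ == "__main__":
    page = '<html><head><base href="/archive/"></head><body></body></html>'
    base = effective_base(page, "http://example.com/list?page=2")
    entry = make_entry("Item", '<a href="item1.html">Item</a>', base)
    print(ET.tostring(entry, encoding="unicode"))
```

Because each page of paginated results is fetched from a different URL, calling `effective_base` once per page and passing the result to every entry from that page gives each entry its own base. A single feed-level `xml:base` could not do that.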