Re: Downloading HTML frameset pages via HTTPClient

Ken Krugler Mon, 24 Aug 2009 13:01:04 -0700

Hi Melroy,

On Aug 24, 2009, at 12:20pm, melroyr wrote:

I have written a program to download html pages from harristeeter. However,
when I run my program, I get the following:

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Frameset//EN"
"http://www.w3.org/TR/html4/frameset.dtd">
<html>
<head>
<title>Your Personal Shopping List</title>
<meta http-equiv="Content-Type" content="text/html;charset=iso-8859-1">


[snip]

</frameset>
<frame src="actions.jsp" name="bottomFrame" scrolling="YES" noresize>
</frameset>

<noframes><body>
This application requires the use of frames, which your browser does not
support.
</body></noframes>

</html>

The URL I am using to download the pages is
http://flyer.harristeeter.com/HT_eVIC/ThisWeek/ReviewAllSpecials.jsp
Please advise if there is some setting that I need to set in HttpClient? I have read about HtmlCleaner and stuff but I do not think they will help.

Well, first it would help to know what you think is the problem. The above page seems OK to me.

If I had to guess, the issue is that you want the content of the frames themselves (i.e. the pages that the <frame src="xxx"> links point to), not the frameset page that wraps them.

If so, then HttpClient can't automagically help you here. The easiest approach would be to use a regex to extract the src="xxx" links, convert them from relative to absolute, and fetch again... similar to what a real web crawler might do.
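
As a rough illustration of the extract-and-resolve step, here is a minimal sketch. The class name FrameLinks and the method extractFrameUrls are made up for this example, and the regex is only an assumption about what typical frame tags look like (quoted src values); it is not a general HTML parser. It uses only java.util.regex and java.net.URI, so it does not depend on any particular HttpClient version.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of HttpClient.
class FrameLinks {
    // Matches <frame ... src="..."> and <iframe ... src='...'>, ignoring case.
    // The \b after "frame" keeps <frameset> tags from matching.
    // Assumption: the src value is quoted and contains no quote characters.
    private static final Pattern FRAME_SRC = Pattern.compile(
        "<i?frame\\b[^>]*?\\bsrc\\s*=\\s*[\"']([^\"']+)[\"']",
        Pattern.CASE_INSENSITIVE);

    // Returns the absolute URL of every frame found in html,
    // resolving relative src values against baseUrl (the frameset page's URL).
    static List<String> extractFrameUrls(String html, String baseUrl) {
        URI base = URI.create(baseUrl);
        List<String> urls = new ArrayList<>();
        Matcher m = FRAME_SRC.matcher(html);
        while (m.find()) {
            // URI.resolve throws IllegalArgumentException if the src value
            // contains characters that are illegal in a URI (e.g. spaces).
            urls.add(base.resolve(m.group(1).trim()).toString());
        }
        return urls;
    }
}
```

You would pass in the frameset HTML you already downloaded together with the URL you fetched it from, then issue a normal GET with HttpClient for each URL in the returned list. If one of those pages turns out to be another frameset, repeat the same step on it.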


-- Ken


