That's really interesting.  JMeter currently supports everything you've
outlined except #6.  Note also that, with respect to #1 and #7, setting up
spidering might take a fair amount of trial and error.  You may have an
easier time creating a script specific to your site, so that you record
exactly what you want recorded.
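
To make the traversal order in #1 and #7 concrete, here is a minimal sketch
of a depth-first crawl. It is not JMeter code: the class name
DepthFirstSpider and the links map (a stand-in for "fetch a page and extract
its URLs") are assumptions invented for this illustration. It only shows the
visit order, with a visited set so that pages linking back to each other are
fetched once.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical helper, not part of JMeter.
final class DepthFirstSpider {

    // Returns the pages in the order a depth-first crawl from `start` visits them.
    // `links` maps each page to the URLs found on it, in document order.
    static List<String> crawlOrder(String start, Map<String, List<String>> links) {
        Set<String> visited = new LinkedHashSet<>();
        Deque<String> stack = new ArrayDeque<>();
        stack.push(start);
        while (!stack.isEmpty()) {
            String page = stack.pop();
            if (!visited.add(page)) {
                continue; // already fetched via another path
            }
            List<String> out = links.getOrDefault(page, List.of());
            // Push in reverse so the first link on the page is followed first.
            for (int i = out.size() - 1; i >= 0; i--) {
                if (!visited.contains(out.get(i))) {
                    stack.push(out.get(i));
                }
            }
        }
        return new ArrayList<>(visited);
    }
}
```

For a site where / links to /a and /b, /a links to /a1 and back to /, and
/b links to /a1, crawlOrder("/", links) visits /, /a, /a1, /b: each branch
is followed to its end before the next sibling is tried.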

The one thing you didn't mention is that JMeter doesn't actually record the
pages and images it retrieves, so you would need to write a new Listener to
do just that.  This should be a relatively trivial matter.
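
The storage half of such a Listener could look roughly like the sketch
below. It deliberately does not touch the JMeter API: SnapshotStore, pathFor
and save are hypothetical names, and the directory layout
(root/date/host/path) is only one possible choice. The sketch assumes
absolute http URLs and ignores query strings, which a mostly dynamic site
would need to fold into the file name somehow.

```java
import java.io.IOException;
import java.net.URI;
import java.nio.file.Files;
import java.nio.file.Path;
import java.time.LocalDate;

// Hypothetical helper, not part of JMeter.
final class SnapshotStore {

    // Maps a fetched URL to a file under root/<yyyy-MM-dd>/<host>/<path>.
    // URLs that name a directory (or have no path) are stored as index.html.
    static Path pathFor(Path root, LocalDate day, URI url) {
        String path = url.getPath();
        if (path == null || path.isEmpty()) {
            path = "/";
        }
        if (path.endsWith("/")) {
            path = path + "index.html";
        }
        String relative = path.startsWith("/") ? path.substring(1) : path;
        return root.resolve(day.toString()).resolve(url.getHost()).resolve(relative);
    }

    // Writes the response body to its snapshot location, creating directories as needed.
    static Path save(Path root, LocalDate day, URI url, byte[] body) throws IOException {
        Path target = pathFor(root, day, url);
        Files.createDirectories(target.getParent());
        return Files.write(target, body);
    }
}
```

Keeping one directory per day means any day's snapshot can be opened and
browsed on its own, which is the property the original question asks for.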

-Mike

> -----Original Message-----
> From: Bruce Atherton [mailto:[EMAIL PROTECTED]]
> Sent: Monday, December 10, 2001 12:53 PM
> To: [EMAIL PROTECTED]
> Subject: Using JMeter for Archiving a Website?
> 
> 
> I am trying to archive the contents of an extranet website which is mostly
> dynamic content. I'd like to record its contents every day, and store them
> in a format where you can open up the website and browse it as it existed
> on any given day.
> 
> I posted a message on Usenet and one respondent suggested I look at JMeter
> for a solution. I was wondering whether anyone on this list had set up
> JMeter to do something similar, or had other suggestions as to how I could
> accomplish my task, involving JMeter or not. I'm willing to code some Java
> if that would help.
> 
> Some of the features I require for this website snapshot program:
> 
> 1. Parse the HTML and extract further URLs to follow, just like any spider
> does.
> 
> 2. Provide support for URL Encoding of a Session ID
> 
> 3. Parse forms to recognize Submit URLs and the field data that must be
> returned in a POST, including hidden fields.
> 
> 4. Allow setting a configuration file to provide the data that should be
> returned for a particular field in a form (for example, setting what should
> be returned in "username" and "password" fields).
> 
> 5. Support regular expressions so that you can make sure the session is
> going the way it should. For example, if you get "Login Failed" in the
> returned HTML you should be able to recognize that as an error condition.
> 
> 6. Replace any absolute URLs with relative ones, so that if you open the
> archive on disk it will look and act exactly the same way the web site did
> that day.
> 
> 7. Do depth first searches (which a user could conceivably do) rather than
> breadth first (which a user could not do) so that context within the
> session is kept sensible.
> 
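
Requirement #6 above comes down to expressing each same-site link relative
to the page it appears on. Below is a minimal sketch of that calculation
using only java.net.URI. LinkRelativizer is a hypothetical name, not an
existing JMeter or JDK class, and the sketch assumes both URLs are absolute.
Links to other hosts are returned unchanged. Finding the links inside the
HTML and rewriting them in place is a separate step not shown here.

```java
import java.net.URI;

// Hypothetical helper, not part of JMeter or the JDK.
final class LinkRelativizer {

    // Expresses `target` relative to `page` when both are on the same scheme
    // and authority; otherwise returns `target` unchanged.
    static String relativize(URI page, URI target) {
        boolean sameSite = page.getScheme() != null
                && page.getScheme().equalsIgnoreCase(target.getScheme())
                && page.getAuthority() != null
                && page.getAuthority().equalsIgnoreCase(target.getAuthority());
        if (!sameSite) {
            return target.toString();
        }
        // The last element of each array is the file name; the rest are directories.
        String[] from = pathOf(page).split("/", -1);
        String[] to = pathOf(target).split("/", -1);
        int fromDirs = from.length - 1;
        int toDirs = to.length - 1;
        int common = 0;
        while (common < Math.min(fromDirs, toDirs) && from[common].equals(to[common])) {
            common++;
        }
        StringBuilder rel = new StringBuilder();
        for (int i = common; i < fromDirs; i++) {
            rel.append("../"); // climb out of the page's own directories
        }
        for (int i = common; i < toDirs; i++) {
            rel.append(to[i]).append('/'); // descend into the target's directories
        }
        rel.append(to[toDirs]);
        if (rel.length() == 0) {
            rel.append("./"); // target is the page's own directory
        }
        if (target.getRawQuery() != null) {
            rel.append('?').append(target.getRawQuery());
        }
        if (target.getRawFragment() != null) {
            rel.append('#').append(target.getRawFragment());
        }
        return rel.toString();
    }

    private static String pathOf(URI u) {
        String p = u.getRawPath();
        return (p == null || p.isEmpty()) ? "/" : p;
    }
}
```

For a page at http://extranet.example.com/a/b/page.html, a link to
http://extranet.example.com/a/img/logo.gif becomes ../img/logo.gif, so the
archived copy resolves it on disk instead of on the live server.
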
> Any pointers, suggestions, guidelines? I'd be most 
> appreciative of any 
> information. Thanks.
> 
> 
> --
> To unsubscribe, e-mail:   <mailto:[EMAIL PROTECTED]>
> For additional commands, e-mail: <mailto:[EMAIL PROTECTED]>
