Hi,
My personal opinion is that the HTML Link Parser is not directly useful for
'spidering' a site, for several reasons.

If all you want is a list of pages, there are plenty of open source/free
spiders.
If your pages are generated from a database (typically catalog-like pages
such as product listings), there is probably a query that can tell you what
those pages should be. You can then use JMeter to verify that the pages are
indeed present (and perhaps correct): run the query with a JDBC sampler,
save the results to a CSV file, and follow that with something that reads
the file.

If you really, really still want to spider, then the only approach I can see
is this. Take a look at
org.apache.jmeter.protocol.http.modifier.AnchorModifier (it's the source for
HTML Link Parser, and has the code that fetches the URL). For a spider you'd
probably want to fetch all the URLs (uniquely). Then I'd probably write them
to a file (or keep them in memory if the list is small enough), which is
picked up by the next stage. That stage does the same thing but keeps track
of which links have been fetched and which still need to be fetched, until a
maximum depth of N levels is reached or there are no more links.
Disclaimer: I've never tried it, and it will probably take a bit of doing.
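
To make the staged idea a bit more concrete, here is a minimal Java sketch of
the bookkeeping only: a set of links already seen, a queue per level, and a
depth limit. It is not JMeter code and is untested inside JMeter. The class
name SpiderSketch and the linkSource function are hypothetical; linkSource
stands in for "fetch this page and return the links found on it", which in
practice would be whatever link extraction you borrow or write.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Function;

class SpiderSketch {

    /**
     * Breadth-first crawl starting at startUrl, following links for at most
     * maxDepth levels. linkSource is a hypothetical stand-in for "fetch the
     * page and extract its links". Returns every unique URL discovered, in
     * the order it was first seen.
     */
    static Set<String> crawl(String startUrl, int maxDepth,
                             Function<String, List<String>> linkSource) {
        Set<String> seen = new LinkedHashSet<>();   // each URL is recorded once
        Deque<String> current = new ArrayDeque<>(); // pages to fetch at this level
        seen.add(startUrl);
        current.add(startUrl);

        // Stop at the depth limit or when a level yields no new links.
        for (int depth = 0; depth < maxDepth && !current.isEmpty(); depth++) {
            Deque<String> next = new ArrayDeque<>();
            for (String url : current) {
                List<String> links = linkSource.apply(url);
                if (links == null) {
                    continue; // page could not be fetched or had no links
                }
                for (String link : links) {
                    if (seen.add(link)) { // true only the first time we see it
                        next.add(link);   // so it is queued for fetching once
                    }
                }
            }
            current = next;
        }
        return seen;
    }
}
```

Links found on the deepest level are recorded but not fetched themselves, so
the returned set is the page list and maxDepth only limits how far the links
are followed.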

If all you want to do is avoid the last-request problem, then I'd probably
create a BeanShell PostProcessor that does what the link parser does (i.e.
gets the link), so that the link can be checked before making the request.
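
As a rough illustration of that check, here is a plain Java sketch of the
logic such a post processor might carry. The class LinkCheckSketch, its
method names and the requiredPrefix parameter are all made up for this
example, and the regex is deliberately simple rather than a real HTML parser.
In a BeanShell PostProcessor the HTML would typically come from
prev.getResponseDataAsString() and the result could be stored with
vars.put(...); I have not tried this wiring in a test plan.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LinkCheckSketch {

    // Matches href="..." or href='...'; good enough for a sketch only.
    private static final Pattern HREF = Pattern.compile(
            "href\\s*=\\s*[\"']([^\"']+)[\"']", Pattern.CASE_INSENSITIVE);

    /** Returns the href values found in the HTML, in document order. */
    static List<String> extractLinks(String html) {
        List<String> links = new ArrayList<>();
        Matcher m = HREF.matcher(html);
        while (m.find()) {
            links.add(m.group(1));
        }
        return links;
    }

    /**
     * Returns the first link that starts with requiredPrefix, or null when
     * the page has no such link, so the caller can skip the next request.
     */
    static String nextLink(String html, String requiredPrefix) {
        for (String link : extractLinks(html)) {
            if (link.startsWith(requiredPrefix)) {
                return link;
            }
        }
        return null;
    }
}
```

When nextLink returns null there is nothing to follow, so the test plan can
skip the request (for example by checking a variable before the sampler)
rather than sending the unmatched .* path.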

regards
deepak


On Tue, Oct 27, 2009 at 3:16 AM, Jason James <mrblit...@gmail.com> wrote:

> Hi,
>
> I have now spent several days on this (yes, have read the manuals!!!)
> and I cannot get jmeter to spider my website.
>
> ThreadGroup
> + Http request (host = mysite.org)
> + While Loop [the manual informs users to use a Simple Controller -
> this does not loop]
> ++ Http Request (host = mysite.org ; path=.*)
> ++ Link Parser
>
> The 'spider' starts down the list and after a random number of
> successful http requests it gets to http://mysite.org/* and dies.
>
> How do I get it to avoid this kind of death? (using jmeter 2.3.4)
>
> Other people have had this kind of problem (since 2006 it seems) and I
> have trawled the mailing list but no one has posted a solution that
> works. Richard Martin
> [http://www.mail-archive.com/jmeter-user@jakarta.apache.org/msg16275.html]
> says he can live with this kind of death but I need a list of all
> (database generated) html pages on this website so I cannot live with
> this kind of death.
>
> Thanks,
> Jason
>