Ah I see what you mean - yes, that's quite a useful check :-) Of course, this only checks pages reachable through explicit <a> links, not content that is only reached by submitting forms, etc. Thanks for sharing your code.
How long does it take to run across your site?

Cheers
Jevon

On Wed, Apr 21, 2010 at 9:00 PM, Xhenseval, Benoit <benoit.xhense...@credit-suisse.com> wrote:

> Hi Jevon,
>
> Yes, I was hoping to simply check every link of an application whilst
> running integration tests with Maven. The aim is to quickly check that
> no exceptions are thrown.
>
> I have come up with the following code, which may be useful to others:
>
> // only visit a page once, i.e. not for different parameters
> private boolean useDiffPagesOnly = true;
>
> public void testSpider() {
>     final Set<String> identifiedLinks = Sets.newHashSet();
>     final Stack<String> toVisit = new Stack<String>();
>     final String base = "http://localhost:" + getPort();
>
>     final String startPage = "/index.html";
>     identifiedLinks.add(startPage);
>     toVisit.push(startPage);
>
>     int count = 0;
>     // do not check links that CONTAIN the following strings
>     final HashSet<String> forbidden = Sets.newHashSet("delete",
>             "inventorySummaries.html");
>     while (!toVisit.isEmpty()) {
>         count++;
>         gotoPage(toVisit.pop());
>         grabLinksInPage(identifiedLinks, toVisit, base, forbidden);
>     }
> }
>
> private void grabLinksInPage(final Set<String> identifiedLinks,
>         final Stack<String> toVisit, final String base, final Set<String> avoid) {
>     final List<IElement> elementsByXPath = getElementsByXPath("//a");
>
>     for (final IElement ie : elementsByXPath) {
>         final String href = ie.getAttribute("href");
>         // skip anchors without an href before touching the string
>         if (StringUtils.isBlank(href)) {
>             continue;
>         }
>
>         // key used for duplicate detection: drop the query string if requested
>         final String linkForDup = useDiffPagesOnly && href.indexOf("?") > 0
>                 ? href.substring(0, href.indexOf("?")) : href;
>
>         if ((href.startsWith(base) || !href.startsWith("http://"))
>                 && !identifiedLinks.contains(linkForDup)) {
>             boolean shouldInclude = true;
>
>             for (final String toAvoid : avoid) {
>                 if (href.contains(toAvoid)) {
>                     shouldInclude = false;
>                     break;
>                 }
>             }
>
>             identifiedLinks.add(linkForDup);
>             if (shouldInclude) {
>                 toVisit.push(href);
>             }
>         }
>     }
> }
>
> Benoit
>
> ------------------------------
> *From:* Jevon Wright [mailto:je...@jevon.org]
> *Sent:* 21 April 2010 00:16
> *To:* Usage problems for JWebUnit
> *Subject:* Re: [JWebUnit-users] How to get all links in current page
>
> Hi Benoit,
>
> Interesting question - I am sure you could do something like that with
> JWebUnit (through xpath, etc), but I imagine using a piece of software
> developed specifically for dumping a site would be better, unless you want
> to verify some unit tests across an entire site?
>
> Software like wget comes to mind.
>
> Cheers
> Jevon
>
> On Tue, Apr 20, 2010 at 10:44 PM, Xhenseval, Benoit <
> benoit.xhense...@credit-suisse.com> wrote:
>
>> Hi All
>>
>> I'm new to JWebUnit.
>>
>> I'm trying to develop the most obvious check... A spider that would
>> follow all links within a given domain.
>>
>> Is there support for such a thing?
>>
>> If not, how could I get all links from a given page?
>>
>> I'm not using Selenium.
>>
>> Thanks a lot
>>
>> Benoit
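
For anyone who wants to try the link-filtering rules from the quoted grabLinksInPage without a running site, here is a minimal plain-Java sketch. It is not part of JWebUnit and not Benoit's code: the class name LinkFilter and its methods dedupKey, isInternal and collect are hypothetical, the hrefs are passed in as plain strings instead of being read via getElementsByXPath, and the extra "https://" check is an addition that the original code does not have.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical helper: the spider's link-filtering rules, without JWebUnit. */
final class LinkFilter {

    /** Key used for duplicate detection: the href minus its query string, if requested. */
    static String dedupKey(String href, boolean ignoreQuery) {
        int q = href.indexOf('?');
        return (ignoreQuery && q > 0) ? href.substring(0, q) : href;
    }

    /** True if the href is below the base URL or carries no http(s) scheme at all. */
    static boolean isInternal(String href, String base) {
        // Assumption: the https check is added here; the original only tests "http://".
        return href.startsWith(base)
                || !(href.startsWith("http://") || href.startsWith("https://"));
    }

    /** Records every new internal link in seen; queues it unless it contains a forbidden string. */
    static void collect(List<String> hrefs, String base, Set<String> avoid,
                        Set<String> seen, Deque<String> toVisit) {
        for (String href : hrefs) {
            if (href == null || href.trim().isEmpty() || !isInternal(href, base)) {
                continue; // blank or external link
            }
            if (!seen.add(dedupKey(href, true))) {
                continue; // this page was already identified
            }
            boolean forbidden = false;
            for (String toAvoid : avoid) {
                if (href.contains(toAvoid)) {
                    forbidden = true;
                    break;
                }
            }
            if (!forbidden) {
                toVisit.push(href);
            }
        }
    }

    public static void main(String[] args) {
        Set<String> seen = new HashSet<>();
        Deque<String> toVisit = new ArrayDeque<>();
        collect(Arrays.asList("/a.html?x=1", "/a.html?x=2", "http://other.example/",
                        "/delete.html", "http://localhost:8080/b.html"),
                "http://localhost:8080",
                new HashSet<>(Arrays.asList("delete")),
                seen, toVisit);
        System.out.println("queued: " + toVisit);
        System.out.println("seen:   " + seen);
    }
}
```

With the sample links in main, two links end up queued: the second /a.html variant is dropped as a duplicate, the external link is ignored, and /delete.html is marked as seen but never visited. As in the quoted code, a forbidden link is still recorded so it is not re-examined on later pages.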
------------------------------------------------------------------------------
_______________________________________________ JWebUnit-users mailing list JWebUnit-users@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/jwebunit-users