On Thursday 11 December 2008 00:38, j16sdiz at freenetproject.org wrote:
> Author: j16sdiz
> Date: 2008-12-11 00:38:31 +0000 (Thu, 11 Dec 2008)
> New Revision: 24187
>
> Modified:
> trunk/plugins/XMLSpider/XMLSpider.java
> Log:
> be smart - don't fetch images, css, etc...
I didn't do this when I built the spider because I figured it wouldn't gain us
much speed: we only ever download the top layer before finding the MIME
type and realising we don't want the data. It would also cost us some
comprehensiveness. But you're probably right.
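
One caveat on the suffix test: String.endsWith is case-sensitive, so a URI ending in ".PNG" would still be queued. A minimal sketch of a case-insensitive variant follows. ExtensionFilter, SKIPPED and isSkippable are hypothetical names for illustration and are not part of XMLSpider. The sketch takes the URI as a plain string, as uri.toString() does in the commit.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper for illustration only; not part of XMLSpider.
public class ExtensionFilter {
    // Same extensions as in the commit, stored without the leading dot.
    private static final Set<String> SKIPPED = new HashSet<>(Arrays.asList(
            "png", "jpg", "css", "gif", "zip", "avi", "ico", "xpi", "iso"));

    // Returns true if the last path component has an extension we skip.
    public static boolean isSkippable(String uri) {
        // Only look at the part after the final slash.
        String name = uri.substring(uri.lastIndexOf('/') + 1);
        int dot = name.lastIndexOf('.');
        if (dot < 0 || dot == name.length() - 1) {
            return false; // no extension at all, or a trailing dot
        }
        // Lower-case with a fixed locale so ".PNG" and ".png" match alike.
        String ext = name.substring(dot + 1).toLowerCase(Locale.ROOT);
        return SKIPPED.contains(ext);
    }
}
```

Under these assumptions, queueURI could call ExtensionFilter.isSkippable(uri.toString()) and return early. The trade-off is the same as before: a page whose name happens to end in one of these extensions is never fetched, so some comprehensiveness is lost.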
>
> Modified: trunk/plugins/XMLSpider/XMLSpider.java
> ===================================================================
> --- trunk/plugins/XMLSpider/XMLSpider.java 2008-12-10 16:49:17 UTC (rev 24186)
> +++ trunk/plugins/XMLSpider/XMLSpider.java 2008-12-11 00:38:31 UTC (rev 24187)
> @@ -199,6 +199,18 @@
> * @param uri the new uri that needs to be fetched for further indexing
> */
> public synchronized void queueURI(FreenetURI uri) {
> + String sURI = uri.toString();
> + if (sURI.endsWith(".png") ||
> + sURI.endsWith(".jpg") ||
> + sURI.endsWith(".css") ||
> + sURI.endsWith(".gif") ||
> + sURI.endsWith(".zip") ||
> + sURI.endsWith(".avi") ||
> + sURI.endsWith(".ico") ||
> + sURI.endsWith(".xpi") ||
> + sURI.endsWith(".iso"))
> + return; // be smart
> +
> if (uri.isUSK()) {
> if(uri.getSuggestedEdition() < 0)
> uri = uri.setSuggestedEdition((-1)*
> uri.getSuggestedEdition());
>