On Thursday 11 December 2008 00:38, j16sdiz at freenetproject.org wrote:
> Author: j16sdiz
> Date: 2008-12-11 00:38:31 +0000 (Thu, 11 Dec 2008)
> New Revision: 24187
> 
> Modified:
>    trunk/plugins/XMLSpider/XMLSpider.java
> Log:
> be smart - don't fetch images, css, etc...

I didn't do this when I built the spider because I figured it wouldn't gain us 
much speed: we only ever download the top layer before finding the MIME type 
and realising we don't want the data. It would also lose us some 
comprehensiveness. But you're probably right.
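
For illustration only, here is a minimal sketch of the same extension check written as a set lookup, assuming we would also want to catch upper-case suffixes such as ".PNG". The class name ExtensionFilter and the method shouldSkip are hypothetical and not part of XMLSpider; the extension list is copied from the commit.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper, not part of the XMLSpider plugin.
final class ExtensionFilter {
    // Extensions taken from the commit; anything else is still queued.
    private static final Set<String> SKIPPED = new HashSet<String>(Arrays.asList(
            "png", "jpg", "css", "gif", "zip", "avi", "ico", "xpi", "iso"));

    // Returns true if the last path component of the URI string ends in a
    // skipped extension, compared case-insensitively.
    static boolean shouldSkip(String uri) {
        int slash = uri.lastIndexOf('/');
        int dot = uri.lastIndexOf('.');
        // No dot in the last path component, or a trailing dot: keep the URI.
        if (dot <= slash || dot == uri.length() - 1) {
            return false;
        }
        String ext = uri.substring(dot + 1).toLowerCase(Locale.ROOT);
        return SKIPPED.contains(ext);
    }
}
```

In queueURI this would be called as ExtensionFilter.shouldSkip(uri.toString()) before the USK handling. Like the committed check, it guesses from the name alone, so a page whose name happens to end in one of these suffixes would still be dropped.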
> 
> Modified: trunk/plugins/XMLSpider/XMLSpider.java
> ===================================================================
> --- trunk/plugins/XMLSpider/XMLSpider.java    2008-12-10 16:49:17 UTC (rev 24186)
> +++ trunk/plugins/XMLSpider/XMLSpider.java    2008-12-11 00:38:31 UTC (rev 24187)
> @@ -199,6 +199,18 @@
>        * @param uri the new uri that needs to be fetched for further indexing
>        */
>       public synchronized void queueURI(FreenetURI uri) {
> +             String sURI = uri.toString();
> +             if (sURI.endsWith(".png") ||
> +                     sURI.endsWith(".jpg") ||
> +                     sURI.endsWith(".css") ||
> +                     sURI.endsWith(".gif") ||
> +                     sURI.endsWith(".zip") ||
> +                     sURI.endsWith(".avi") ||
> +                     sURI.endsWith(".ico") ||
> +                     sURI.endsWith(".xpi") ||
> +                     sURI.endsWith(".iso"))
> +                     return; // be smart
> +
>               if (uri.isUSK()) {
>                       if(uri.getSuggestedEdition() < 0)
>                               uri = uri.setSuggestedEdition((-1)* uri.getSuggestedEdition());
> 