On Monday 22 December 2008 09:51, [email protected] wrote:
> Author: j16sdiz
> Date: 2008-12-22 09:51:42 +0000 (Mon, 22 Dec 2008)
> New Revision: 24726
> 
> Modified:
>    trunk/plugins/XMLSpider/XMLSpider.java
> Log:
> should generate more compact index, but this is slower
> 
> Modified: trunk/plugins/XMLSpider/XMLSpider.java
> ===================================================================
> --- trunk/plugins/XMLSpider/XMLSpider.java    2008-12-22 09:51:21 UTC (rev 24725)
> +++ trunk/plugins/XMLSpider/XMLSpider.java    2008-12-22 09:51:42 UTC (rev 24726)
> @@ -94,7 +94,7 @@
>        * Lists the allowed mime types of the fetched page. 
>        */
>       public Set<String> allowedMIMETypes;
> -     static final int MAX_ENTRIES = 800;
> +     static final int MAX_ENTRIES = 2000;

IMHO this is the right tradeoff: we need the subindexes to all be roughly the 
same, reasonably large, size. Ideally we'd only split when the gzipped 
subindex is over 2MB (or some other threshold). That doesn't require a 
format change: we could feed the gzipped data straight to the client layer, 
though it might then want to try recompressing with other codecs. We might 
also want to remember where we split last time we generated the index.
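To make the suggestion concrete, here is a minimal sketch of a compressed-size 
split policy. Everything here is hypothetical (the class name, method names, and 
the 2MB constant are illustrations, not XMLSpider code): the idea is simply to 
gzip the serialized subindex and split only when the compressed size, not the 
raw size, crosses the threshold.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.zip.GZIPOutputStream;

// Hypothetical helper: split a subindex based on its gzipped size,
// as suggested in the discussion above. Not part of XMLSpider.
public class SubindexSplitter {
    // 2MB threshold from the discussion; could be any other value.
    static final long MAX_GZIPPED_SUBINDEX_SIZE = 2L * 1024 * 1024;

    /** Returns the gzipped size in bytes of the serialized subindex. */
    static long gzippedSize(byte[] serializedSubindex) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        GZIPOutputStream gz = new GZIPOutputStream(buf);
        gz.write(serializedSubindex);
        gz.close(); // flushes the gzip trailer so the count is exact
        return buf.size();
    }

    /** Split when the compressed, not raw, size crosses the threshold. */
    static boolean shouldSplit(byte[] serializedSubindex) throws IOException {
        return gzippedSize(serializedSubindex) > MAX_GZIPPED_SUBINDEX_SIZE;
    }

    public static void main(String[] args) throws IOException {
        // Highly compressible data (4MB of zeros) gzips to a few KB,
        // so it stays well under the 2MB threshold despite its raw size.
        byte[] repetitive = new byte[4 * 1024 * 1024];
        System.out.println("split? " + shouldSplit(repetitive)); // prints "split? false"
    }
}
```

One consequence of this approach is that the split point depends on how well 
the entries compress, which is exactly why subindexes of XML text can afford 
far more entries than a raw-size cap like MAX_ENTRIES would suggest.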

>       static final long MAX_SUBINDEX_UNCOMPRESSED_SIZE = 4 * 1024 * 1024;
>       private static int version = 33;
>       private static final String pluginName = "XML spider " + version;


_______________________________________________
Devl mailing list
[email protected]
http://emu.freenetproject.org/cgi-bin/mailman/listinfo/devl
