On Monday 22 December 2008 09:51, [email protected] wrote:
> Author: j16sdiz
> Date: 2008-12-22 09:51:42 +0000 (Mon, 22 Dec 2008)
> New Revision: 24726
>
> Modified:
>    trunk/plugins/XMLSpider/XMLSpider.java
> Log:
> should generate more compact index, but this is slower
>
> Modified: trunk/plugins/XMLSpider/XMLSpider.java
> ===================================================================
> --- trunk/plugins/XMLSpider/XMLSpider.java	2008-12-22 09:51:21 UTC (rev 24725)
> +++ trunk/plugins/XMLSpider/XMLSpider.java	2008-12-22 09:51:42 UTC (rev 24726)
> @@ -94,7 +94,7 @@
>  	 * Lists the allowed mime types of the fetched page.
>  	 */
>  	public Set<String> allowedMIMETypes;
> -	static final int MAX_ENTRIES = 800;
> +	static final int MAX_ENTRIES = 2000;
IMHO this is the right tradeoff: we need the subindexes to all be roughly the same, reasonably large size. Ideally we'd only split when the gzipped subindex exceeds 2MB (or some other threshold). That doesn't require a format change; we could feed the gzipped data straight to the client layer, though it might want to try recompressing with other codecs. We might also want to remember where we split last time we generated the index.

>  	static final long MAX_SUBINDEX_UNCOMPRESSED_SIZE = 4 * 1024 * 1024;
>  	private static int version = 33;
>  	private static final String pluginName = "XML spider " + version;
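To make the suggestion concrete, here is a minimal sketch of splitting on compressed rather than uncompressed size. The class name, the helper methods, and the 2MB constant are all hypothetical (the actual XMLSpider code only has MAX_ENTRIES and MAX_SUBINDEX_UNCOMPRESSED_SIZE); this just shows how the gzipped size of a serialized subindex could drive the split decision using the standard java.util.zip API.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPOutputStream;

public class SubindexSplitEstimator {
	// Hypothetical threshold mirroring the 2MB gzipped limit suggested above.
	static final long MAX_SUBINDEX_COMPRESSED_SIZE = 2L * 1024 * 1024;

	/** Gzip the data in memory and return the compressed size in bytes. */
	static long gzippedSize(byte[] data) throws IOException {
		ByteArrayOutputStream bos = new ByteArrayOutputStream();
		try (GZIPOutputStream gz = new GZIPOutputStream(bos)) {
			gz.write(data);
		}
		// GZIPOutputStream flushes its trailer on close, so bos now
		// holds the complete gzip stream.
		return bos.size();
	}

	/** True if a subindex serialized to this byte array should be split. */
	static boolean shouldSplit(byte[] serializedSubindex) throws IOException {
		return gzippedSize(serializedSubindex) > MAX_SUBINDEX_COMPRESSED_SIZE;
	}

	public static void main(String[] args) throws IOException {
		byte[] small = "word1 key1 key2\n".getBytes(StandardCharsets.UTF_8);
		System.out.println(shouldSplit(small)); // prints false
	}
}
```

Since the compressed stream is already built to measure it, it could be handed to the client layer as-is, which matches the "no format change" point: only the split heuristic changes, not the on-Freenet representation.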
_______________________________________________
Devl mailing list
[email protected]
http://emu.freenetproject.org/cgi-bin/mailman/listinfo/devl
