On Wednesday 17 December 2008 14:59, Daniel Cheng wrote:
> On Wed, Dec 17, 2008 at 8:16 PM, Matthew Toseland
> <toad at amphibian.dyndns.org> wrote:
> > On Wednesday 17 December 2008 01:35, Daniel Cheng wrote:
> >> On Wed, Dec 17, 2008 at 2:52 AM, Matthew Toseland
> >> <toad at amphibian.dyndns.org> wrote:
> >> > What should be in 0.8 and what should be postponed? We should aim to bring out
> >> > 0.8 some time in 1H 09, so we need to agree on a rough idea of what features
> >> > should be in and what should be postponed.
> >> >
> >> > IMHO the following are critical. I will implement them unless somebody else
> >> > does.
> >> > - Getting db4o working and merged, for much less memory usage on large
> >> > download queues and almost instant resuming of downloads on startup.
> >> > - Fixing the Firefox profile corruption bug, probably by starting a process in
> >> > browse.sh/browse.cmd which constantly polls the profiles.ini and fixes it if
> >> > the freenet profile has become the default.
> >> > - Metadata changes: New metadata format, only used if freenet-ext.jar is at
> >> > least build 26. If the metadata is of the new format, we can use the last
> >> > block in decoding a splitfile (right now we don't, because we get data
> >> > corruption due to several different algorithms having accidentally been used
> >> > for padding). At the same time, introduce a checksum for the final data
> >> > (SHA-256), to prevent corruption. Allow other checksums (md5, SHA1 etc) to
> >> > help filesharing apps, and provide FCP access to them.
> >> > - Plugin updating over Freenet. We very nearly have this already. It is
> >> > important to be able to update plugins automatically, otherwise old buggy
> >> > versions can cause many problems which will never be resolved e.g. we've had
> >> > problems with the IP detection plugins. Also, this should not be a lot of
> >> > work, we already have partial support for loading plugins from Freenet.
> >> > - Basic plugin dependency support. This is necessary for the next item in the
> >> > medium term.
> >> > - Freetalk: Provided that p0s is able to continue working on this, and
> >> > provided that his timetable doesn't slip too much, we should do everything
> >> > reasonably possible to ensure that Freetalk goes in to 0.8.0, and is visible
> >> > (e.g. on the main menu).
> >> >
> >> > The following would be nice, if somebody else gets around to them:
> >> > - XMLSpider improvements: sdiz has done some great work on this, the spider
> >> > can now continue from where it left off, and uses db4o so has much lower
> >> > memory usage.
> >> > - XMLLibrarian improvements: We have already integrated the search engine onto
> >> > the home page, there are many small improvements that can be made such as
> >> > support for "adjacent word searches", a much better looking search results
> >> > page, and embedding into freesites.
> >>
> >> I am working on this.
> >> Just drafting the flow in my mind, have not yet started coding.
> >>
> >> Items I have in mind:
> >> - Prefetch some index files
> >>
> >> - some level of "adjacent word searches", still planning
> >
> > Adjacent word searches are easy. All you need to do is detect that a phrase is
> > quoted, look up every index, and cross reference the word indices. The main
> > complication is that words of less than length 3 are not included in the
> > indexes...
> >>
> >> - some form of ranking.
> >> maybe something like Tf-idf
> >
> > Good idea.
> >>
> >> - Catch up with 1973 programming style
> >> -- don't use global variable to pass local state.
> >>
> >> - Aggregating search result
> >> (Group different version of USK together)
> >
> > I don't follow. You mean aggregate results from two different indexes? There
> > isn't really a user friendly way to add an index to the default set yet,
> > there isn't really a default set ...
>
> Try searching "toad".
> You will end up with a page full of different USK editions of your blog
> -- all of them have the same title.
>
> It should show only one entry, linking to the newest one,
> with some smaller text to link to old editions.
>
> >>
> >> - Stop words
> >> common words such as "this", "that", shouldn't be indexed or searched,
> >> -- the list should be included in the xml ....
> >> something like <word v="the" stopword="true" /> in index_##.xml
> >
> > Then how are we supposed to search for them?
>
> You don't,
> see http://searchenginewatch.com/2156061
>
> This reduces the index size -- Freenet has high latency, so size is important.
>
> Currently, each index_##.xml includes the set of URIs it references.
> If we index a word like "the", we will have all the URIs included there.
So if I search for "toad the idiot" using adjacent word matching (i.e. with the
quotation marks), then it will download the indexes for "toad" and "idiot", and
look for "idiot" at index t + 2 where "toad" is at index t? Ok...
>
> >>
> >> - Chinese/Korean/Japanese support in addition to Latin-like languages
> >> (this needs a real tokenizer)

Important IMHO.
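
To make the "idiot at index t + 2" idea concrete, here is a minimal sketch in Python. It is not XMLLibrarian code: the in-memory index layout, the stop-word list, the function names and the example USK strings are all made up for illustration, and it assumes that stop words are left out of the index but still consume a word position. It also ignores the "words shorter than 3 letters" rule mentioned above.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical stop-word list; the thread proposes shipping it inside the index XML.
STOP_WORDS: Set[str] = {"the", "this", "that"}

# Hypothetical in-memory positional index: word -> {uri: [word positions]}.
Index = Dict[str, Dict[str, List[int]]]


def build_index(pages: Dict[str, str]) -> Index:
    """Index every non-stop word, keeping its position in the original text."""
    index: Index = {}
    for uri, text in pages.items():
        for pos, word in enumerate(text.lower().split()):
            if word in STOP_WORDS:
                continue  # not stored, but it still consumes a position
            index.setdefault(word, {}).setdefault(uri, []).append(pos)
    return index


def phrase_terms(phrase: str) -> List[Tuple[str, int]]:
    """Return (word, offset) pairs for the indexed words of a quoted phrase."""
    return [(w, i) for i, w in enumerate(phrase.lower().split())
            if w not in STOP_WORDS]


def phrase_search(index: Index, phrase: str) -> Set[str]:
    """URIs where every indexed word sits at its offset from a common start."""
    terms = phrase_terms(phrase)
    if not terms:
        return set()
    first_word, first_offset = terms[0]
    hits: Set[str] = set()
    for uri, positions in index.get(first_word, {}).items():
        for pos in positions:
            start = pos - first_offset
            if all(start + off in index.get(w, {}).get(uri, [])
                   for w, off in terms[1:]):
                hits.add(uri)
                break
    return hits


# Made-up pages keyed by made-up USK strings.
pages = {
    "USK@a/blog/1/": "toad the idiot wrote this",
    "USK@b/site/3/": "idiot met the toad",
}
index = build_index(pages)
print(phrase_search(index, "toad the idiot"))  # {'USK@a/blog/1/'}
```

One consequence of this scheme: since "the" is never looked up, a page containing "toad big idiot" would also match, because only the gap between the indexed words is checked.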
