On Wed, Dec 17, 2008 at 8:16 PM, Matthew Toseland <toad at amphibian.dyndns.org> wrote:
> On Wednesday 17 December 2008 01:35, Daniel Cheng wrote:
>> On Wed, Dec 17, 2008 at 2:52 AM, Matthew Toseland
>> <toad at amphibian.dyndns.org> wrote:
>> > What should be in 0.8 and what should be postponed? We should aim to bring out
>> > 0.8 some time in 1H 09, so we need to agree on a rough idea of what features
>> > should be in and what should be postponed.
>> >
>> > IMHO the following are critical. I will implement them unless somebody else
>> > does.
>> > - Getting db4o working and merged, for much less memory usage on large
>> > download queues and almost instant resuming of downloads on startup.
>> > - Fixing the Firefox profile corruption bug, probably by starting a process in
>> > browse.sh/browse.cmd which constantly polls the profiles.ini and fixes it if
>> > the freenet profile has become the default.
>> > - Metadata changes: New metadata format, only used if freenet-ext.jar is at
>> > least build 26. If the metadata is of the new format, we can use the last
>> > block in decoding a splitfile (right now we don't, because we get data
>> > corruption due to several different algorithms having accidentally been used
>> > for padding). At the same time, introduce a checksum for the final data
>> > (SHA-256), to prevent corruption. Allow other checksums (md5, SHA1 etc) to
>> > help filesharing apps, and provide FCP access to them.
>> > - Plugin updating over Freenet. We very nearly have this already. It is
>> > important to be able to update plugins automatically, otherwise old buggy
>> > versions can cause many problems which will never be resolved e.g. we've had
>> > problems with the IP detection plugins. Also, this should not be a lot of
>> > work, we already have partial support for loading plugins from Freenet.
>> > - Basic plugin dependency support. This is necessary for the next item in the
>> > medium term.
>> > - Freetalk: Provided that p0s is able to continue working on this, and
>> > provided that his timetable doesn't slip too much, we should do everything
>> > reasonably possible to ensure that Freetalk goes in to 0.8.0, and is visible
>> > (e.g. on the main menu).
>> >
>> > The following would be nice, if somebody else gets around to them:
>> > - XMLSpider improvements: sdiz has done some great work on this, the spider
>> > can now continue from where it left off, and uses db4o so has much lower
>> > memory usage.
>> > - XMLLibrarian improvements: We have already integrated the search engine onto
>> > the home page, there are many small improvements that can be made such as
>> > support for "adjacent word searches", a much better looking search results
>> > page, and embedding into freesites.
>>
>> I am working on this.
>> Just drafting the flow in my mind, have not started coding yet.
>>
>> Items I have in mind:
>> - Prefetch some index files
>>
>> - some level of "adjacent word searches", still planning
>
> Adjacent word searches are easy. All you need to do is detect that a phrase is
> quoted, look up every index, and cross reference the word indices. The main
> complication is that words of less than length 3 are not included in the
> indexes...
>>
>> - some form of ranking,
>> maybe something like tf-idf
>
> Good idea.
>>
>> - Catch up with 1973 programming style
>> -- don't use global variables to pass local state.
>>
>> - Aggregating search results
>> (Group different versions of a USK together)
>
> I don't follow. You mean aggregate results from two different indexes? There
> isn't really a user friendly way to add an index to the default set yet,
> there isn't really a default set ...
Try searching "toad". You will end up with a page full of different USK
editions of your blog, all of them with the same title. It should show
only one entry, linking to the newest edition, with some smaller text
linking to the old editions.

>>
>> - Stop words
>> common words such as "this", "that", shouldn't be indexed or searched,
>> -- the list should be included in the xml ....
>> something like <word v="the" stopword="true" /> in index_##.xml
>
> Then how are we supposed to search for them?

You don't, see http://searchenginewatch.com/2156061

This reduces the index size. Freenet has high latency, so size is important.
Currently, each index_##.xml includes the set of URIs it references.
If we index a word like "the", we will have all the URIs included there.

>>
>> - Chinese/Korean/Japanese support in addition to Latin-like languages
>> (this needs a real tokenizer)
>
> _______________________________________________
> Devl mailing list
> Devl at freenetproject.org
> http://emu.freenetproject.org/cgi-bin/mailman/listinfo/devl
>
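
To make the USK aggregation idea above concrete, here is a minimal sketch in Python (not XMLLibrarian code). It assumes search results arrive as (uri, title) pairs and that a USK URI has the shape USK@<key>/<site name>/<edition>/<path>; the function name group_usk_editions and the result layout are hypothetical, chosen only for illustration.

```python
import re
from collections import defaultdict

# Assumed URI shape: USK@<key>/<site name>/<edition>/<optional path>
USK_PATTERN = re.compile(r"^(USK@[^/]+/[^/]+)/(-?\d+)(/.*)?$")


def group_usk_editions(results):
    """Collapse (uri, title) results so each USK page appears once.

    Results that differ only in edition number are merged into one entry
    holding the newest edition plus a list of older editions. URIs that
    do not look like a USK are passed through as single entries.
    """
    groups = defaultdict(list)
    passthrough = []
    for uri, title in results:
        match = USK_PATTERN.match(uri)
        if match is None:
            passthrough.append({"newest": (uri, title), "older": []})
            continue
        site = match.group(1)
        edition = int(match.group(2))
        path = match.group(3) or "/"
        groups[(site, path)].append((edition, uri, title))

    grouped = []
    for entries in groups.values():
        # Highest edition number first.
        entries.sort(key=lambda entry: entry[0], reverse=True)
        newest, older = entries[0], entries[1:]
        grouped.append({
            "newest": (newest[1], newest[2]),
            "older": [(uri, title) for _, uri, title in older],
        })
    return grouped + passthrough
```

A results page could then render "newest" as the main link and "older" as the smaller links to old editions.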

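
The adjacent word search discussed earlier in the thread (look up each word of a quoted phrase, then cross reference the word positions) can be sketched the same way. This is only an illustration under stated assumptions: the index is modelled as a plain dict mapping word -> {uri: [positions]}, which is not the real index_##.xml layout, and phrase_search and MIN_INDEXED_LENGTH are hypothetical names. Words shorter than three characters are skipped, as the thread says they are not indexed, but their slot in the phrase is kept as a gap.

```python
MIN_INDEXED_LENGTH = 3  # per the thread, shorter words are not in the index


def phrase_search(index, phrase):
    """Return the set of URIs in which the phrase's words appear adjacently.

    index: dict mapping word -> {uri: list of word positions} (assumed layout).
    Words shorter than MIN_INDEXED_LENGTH cannot be looked up, so they are
    skipped, but their offset in the phrase is preserved as a gap.
    """
    words = phrase.lower().split()
    terms = [(offset, word) for offset, word in enumerate(words)
             if len(word) >= MIN_INDEXED_LENGTH]
    if not terms:
        return set()

    first_offset, first_word = terms[0]
    hits = set()
    for uri, positions in index.get(first_word, {}).items():
        for position in positions:
            # Position where the phrase would start in this document.
            start = position - first_offset
            if all(start + offset in index.get(word, {}).get(uri, ())
                   for offset, word in terms[1:]):
                hits.add(uri)
                break
    return hits
```

Because short words are only treated as gaps, a query such as "project of node" also matches "project in node"; an exact match would need those words in the index.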