On Wednesday 17 December 2008 14:59, Daniel Cheng wrote:
> On Wed, Dec 17, 2008 at 8:16 PM, Matthew Toseland
> <toad at amphibian.dyndns.org> wrote:
> > On Wednesday 17 December 2008 01:35, Daniel Cheng wrote:
> >> On Wed, Dec 17, 2008 at 2:52 AM, Matthew Toseland
> >> <toad at amphibian.dyndns.org> wrote:
> >> > What should be in 0.8 and what should be postponed? We should aim to bring out
> >> > 0.8 some time in 1H 09, so we need to agree on a rough idea of what features
> >> > should be in and what should be postponed.
> >> >
> >> > IMHO the following are critical. I will implement them unless somebody else
> >> > does.
> >> > - Getting db4o working and merged, for much less memory usage on large
> >> > download queues and almost instant resuming of downloads on startup.
> >> > - Fixing the Firefox profile corruption bug, probably by starting a process in
> >> > browse.sh/browse.cmd which constantly polls the profiles.ini and fixes it if
> >> > the freenet profile has become the default.
> >> > - Metadata changes: New metadata format, only used if freenet-ext.jar is at
> >> > least build 26. If the metadata is of the new format, we can use the last
> >> > block in decoding a splitfile (right now we don't, because we get data
> >> > corruption due to several different algorithms having accidentally been used
> >> > for padding). At the same time, introduce a checksum for the final data
> >> > (SHA-256), to prevent corruption. Allow other checksums (md5, SHA1 etc) to
> >> > help filesharing apps, and provide FCP access to them.
> >> > - Plugin updating over Freenet. We very nearly have this already. It is
> >> > important to be able to update plugins automatically, otherwise old buggy
> >> > versions can cause many problems which will never be resolved e.g. we've had
> >> > problems with the IP detection plugins. Also, this should not be a lot of
> >> > work, we already have partial support for loading plugins from Freenet.
> >> > - Basic plugin dependency support. This is necessary for the next item in the
> >> > medium term.
> >> > - Freetalk: Provided that p0s is able to continue working on this, and
> >> > provided that his timetable doesn't slip too much, we should do everything
> >> > reasonably possible to ensure that Freetalk goes in to 0.8.0, and is visible
> >> > (e.g. on the main menu).
> >> >
> >> > The following would be nice, if somebody else gets around to them:
> >> > - XMLSpider improvements: sdiz has done some great work on this, the spider
> >> > can now continue from where it left off, and uses db4o so has much lower
> >> > memory usage.
> >> > - XMLLibrarian improvements: We have already integrated the search engine onto
> >> > the home page, there are many small improvements that can be made such as
> >> > support for "adjacent word searches", a much better looking search results
> >> > page, and embedding into freesites.
> >>
> >> I am working on this.
> >> I am still drafting the flow in my mind and have not started coding yet.
> >>
> >> Items I have in mind:
> >>   - Prefetch some index files
> >>
> >>   - some level of "adjacent word searches", still planning
> >
> > > Adjacent word searches are easy. All you need to do is detect that a phrase is
> > > quoted, look up every index, and cross reference the word indices. The main
> > > complication is that words of less than length 3 are not included in the
> > > indexes...
> >>
> >>   - some form of ranking,
> >>     maybe something like tf-idf
> >
> > Good idea.
> >>
> >>   - Catch up with 1973 programming style
> >>         -- don't use global variable to pass local state.
> >>
> >>   - Aggregating search results
> >>     (group different editions of the same USK together)
> >
> > > I don't follow. You mean aggregate results from two different indexes? There
> > > isn't really a user friendly way to add an index to the default set yet,
> > > there isn't really a default set ...
> 
> Try searching "toad".
> You will end up with a page full of different USK editions of your blog
> -- all of them have the same title.
> 
> It should show only one entry, linking to the newest edition,
> with some smaller text linking to the older editions.
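
The grouping described above could be sketched roughly as follows in Python. This is only an illustration, not XMLLibrarian code: it assumes result URIs have the layout USK@<key>/<site name>/<edition>/<path>, and the function name and regular expression are made up for the example.

```python
import re
from collections import defaultdict

# Assumed URI layout: USK@<key>/<site name>/<edition>/<path>
USK_RE = re.compile(r"^(USK@[^/]+/[^/]+)/(-?\d+)(/.*)?$")

def group_usk_editions(uris):
    """Collapse editions of the same USK page into (newest, older) pairs.

    Hypothetical helper. URIs that do not look like a USK are returned
    as their own entry with no older editions.
    """
    groups = defaultdict(list)
    for uri in uris:
        m = USK_RE.match(uri)
        if m:
            site, edition, path = m.group(1), int(m.group(2)), m.group(3) or "/"
            groups[(site, path)].append((edition, uri))
        else:
            groups[(uri, "")].append((0, uri))
    result = []
    for entries in groups.values():
        entries.sort(reverse=True)  # highest edition number first
        newest = entries[0][1]
        older = [u for _, u in entries[1:]]
        result.append((newest, older))
    return result

results = group_usk_editions([
    "USK@abc/toad/12/",
    "USK@abc/toad/15/",
    "USK@abc/toad/9/",
    "CHK@xyz/file.txt",
])
for newest, older in results:
    print(newest, older)
```

The results page would then render `newest` as the main link and `older` as the smaller links.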
> 
> >>
> >>   - Stop words
> >>      common words such as "this", "that", shouldn't be indexed or searched,
> >>      -- the list should be included in the xml ....
> >>         something like <word v="the" stopword="true" /> in index_##.xml
> >
> > Then how are we supposed to search for them?
> 
> You don't,
>  see http://searchenginewatch.com/2156061
> 
> This reduces the index size -- Freenet has high latency, so size is important.
> 
> Currently, each index_##.xml includes the set of URIs it references.
> If we index a word like "the", we will have all the URIs included there.

So if I search for "toad the idiot" using adjacent word matching (i.e. with 
the quotation marks), then it will download the indexes for "toad" 
and "idiot", and look for "idiot" at index t + 2 where "toad" is at index t? 
Ok...
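
For concreteness, a rough Python sketch of that lookup. It assumes a hypothetical in-memory positional index of the form word -> {uri: [positions]}, and the stop word list is made up; neither reflects the real index_##.xml layout. Stop words are skipped, but still count towards the offset between the remaining words:

```python
# Hypothetical stop word list; the real one would come from the index XML.
STOP_WORDS = {"the", "this", "that"}

def phrase_matches(index, phrase):
    """Return sorted URIs where the non-stop words of `phrase` occur at the
    same relative offsets as in the phrase.

    `index` is an assumed structure: word -> {uri: [word positions]}.
    """
    words = phrase.lower().split()
    # Keep (offset, word) for indexed words only; stop words leave a gap.
    terms = [(i, w) for i, w in enumerate(words) if w not in STOP_WORDS]
    if not terms:
        return []
    first_offset, first_word = terms[0]
    results = []
    for uri, positions in index.get(first_word, {}).items():
        for t in positions:
            start = t - first_offset  # position where the phrase would begin
            if all(start + off in index.get(w, {}).get(uri, ())
                   for off, w in terms[1:]):
                results.append(uri)
                break
    return sorted(results)

example_index = {
    "toad":  {"uri_a": [0, 7], "uri_b": [3]},
    "idiot": {"uri_a": [9],    "uri_b": [4]},
}
print(phrase_matches(example_index, "toad the idiot"))  # prints ['uri_a']
```

Here uri_b is rejected because "idiot" directly follows "toad" with no word in between. Words shorter than 3 characters, which are also missing from the indexes, could presumably be handled the same way as stop words.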
> 
> >>
> >>   - Chinese/Korean/Japanese support in addition to Latin-like languages
> >>     (this needs a real tokenizer)

Important IMHO.