On Wed, Dec 17, 2008 at 8:16 PM, Matthew Toseland
<toad at amphibian.dyndns.org> wrote:
> On Wednesday 17 December 2008 01:35, Daniel Cheng wrote:
>> On Wed, Dec 17, 2008 at 2:52 AM, Matthew Toseland
>> <toad at amphibian.dyndns.org> wrote:
>> > What should be in 0.8 and what should be postponed? We should aim to
>> > bring out 0.8 some time in 1H 09, so we need to agree on a rough idea
>> > of what features should be in and what should be postponed.
>> >
>> > IMHO the following are critical. I will implement them unless somebody
>> > else does.
>> > - Getting db4o working and merged, for much less memory usage on large
>> > download queues and almost instant resuming of downloads on startup.
>> > - Fixing the Firefox profile corruption bug, probably by starting a
>> > process in browse.sh/browse.cmd which constantly polls the profiles.ini
>> > and fixes it if the freenet profile has become the default.
>> > - Metadata changes: New metadata format, only used if freenet-ext.jar
>> > is at least build 26. If the metadata is of the new format, we can use
>> > the last block in decoding a splitfile (right now we don't, because we
>> > get data corruption due to several different algorithms having
>> > accidentally been used for padding). At the same time, introduce a
>> > checksum for the final data (SHA-256), to prevent corruption. Allow
>> > other checksums (md5, SHA1 etc) to help filesharing apps, and provide
>> > FCP access to them.
>> > - Plugin updating over Freenet. We very nearly have this already. It is
>> > important to be able to update plugins automatically, otherwise old
>> > buggy versions can cause many problems which will never be resolved
>> > e.g. we've had problems with the IP detection plugins. Also, this
>> > should not be a lot of work, we already have partial support for
>> > loading plugins from Freenet.
>> > - Basic plugin dependency support. This is necessary for the next item
>> > in the medium term.
>> > - Freetalk: Provided that p0s is able to continue working on this, and
>> > provided that his timetable doesn't slip too much, we should do
>> > everything reasonably possible to ensure that Freetalk goes in to
>> > 0.8.0, and is visible (e.g. on the main menu).
>> >
>> > The following would be nice, if somebody else gets around to them:
>> > - XMLSpider improvements: sdiz has done some great work on this, the
>> > spider can now continue from where it left off, and uses db4o so has
>> > much lower memory usage.
>> > - XMLLibrarian improvements: We have already integrated the search
>> > engine onto the home page, there are many small improvements that can
>> > be made such as support for "adjacent word searches", a much better
>> > looking search results page, and embedding into freesites.
>>
>> I am working on this.
>> I am still drafting the flow in my head and have not started coding yet.
>>
>> Items I have in mind:
>>   - Prefetch some index files
>>
>>   - some level of "adjacent word searches", still planning
>
> Adjacent word searches are easy. All you need to do is detect that a phrase is
> quoted, look up every index, and cross reference the word indices. The main
> complication is that words of less than length 3 are not included in the
> indexes...
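
To make the cross-referencing concrete, here is a rough sketch in Python
(not the XMLLibrarian code, which is Java). It assumes a hypothetical
in-memory index of the form {word: {uri: [positions]}}; the real
index_##.xml layout may differ. It also ignores the short-word problem:
a phrase containing a word that is not indexed simply returns nothing.

```python
def phrase_search(index, phrase):
    """Return the set of URIs where the words of `phrase` appear consecutively.

    `index` is a hypothetical structure: {word: {uri: [positions]}}.
    """
    words = phrase.lower().split()
    if not words or any(w not in index for w in words):
        return set()

    # Only URIs that contain every word can possibly match.
    candidates = set(index[words[0]])
    for w in words[1:]:
        candidates &= set(index[w])

    matches = set()
    for uri in candidates:
        # Positions of each later word in this document, as sets for lookup.
        later = [set(index[w][uri]) for w in words[1:]]
        for start in index[words[0]][uri]:
            # Word i+1 of the phrase must sit exactly i+1 positions after start.
            if all(start + i + 1 in pos for i, pos in enumerate(later)):
                matches.add(uri)
                break
    return matches


# Made-up sample data, for illustration only.
index = {
    "free": {"USK@a/site/1/": [0, 7], "USK@b/blog/3/": [4]},
    "speech": {"USK@a/site/1/": [1], "USK@b/blog/3/": [9]},
}
result = phrase_search(index, "free speech")
```

Only the first URI matches here, because "speech" directly follows "free"
there; in the second the two words are five positions apart.
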
>>
>>   - some form of ranking .
>>     maybe something like Tf-idf
>
> Good idea.
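
Roughly what I have in mind for the ranking, as a Python sketch. The
postings structure, {uri: (term_count, doc_length)}, is an assumption; the
current index does not necessarily store these counts, so the spider may
need to record them first.

```python
import math


def tf_idf_scores(postings, total_docs):
    """Score each URI for a single search word.

    `postings` is a hypothetical {uri: (term_count, doc_length)} map for the
    word; `total_docs` is the number of documents in the whole index.
    """
    if not postings:
        return {}
    # Words that appear in few documents get a higher weight.
    idf = math.log(total_docs / len(postings))
    return {
        uri: (count / length) * idf
        for uri, (count, length) in postings.items()
    }


# Made-up sample data, for illustration only.
postings = {"USK@a/site/1/": (3, 100), "USK@b/blog/3/": (1, 200)}
scores = tf_idf_scores(postings, total_docs=10)
ranked = sorted(scores, key=scores.get, reverse=True)
```

For a multi-word query the per-word scores could be summed per URI. Note
that a word appearing in every document gets an idf of zero, which is one
more argument for treating such words as stop words.
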
>>
>>   - Catch up with 1973 programming style
>>         -- don't use global variable to pass local state.
>>
>>   - Aggregating search result
>>     (Group different version of USK together)
>
> I don't follow. You mean aggregate results from two different indexes? There
> isn't really a user friendly way to add an index to the default set yet,
> there isn't really a default set ...

Try searching for "toad".
You will end up with a page full of different USK editions of your blog,
all of them with the same title.

It should show only one entry, linking to the newest edition,
with some smaller text linking to the older editions.
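
A rough Python sketch of the grouping, assuming result URIs have the usual
USK@<key>/<site name>/<edition>/<path> shape with a non-negative edition
number. The function name and the return format are made up for
illustration.

```python
import re
from collections import defaultdict

# Assumed shape: USK@<key>/<site name>/<edition>[/<path>]
USK_PATTERN = re.compile(r"^(USK@[^/]+/[^/]+)/(\d+)(/.*)?$")


def group_usk_editions(uris):
    """Collapse hits that differ only in USK edition into one entry each.

    Returns (results, other): `results` is a list of (newest_uri, older_uris)
    tuples, `other` holds URIs that are not USKs and are left ungrouped.
    """
    groups = defaultdict(list)
    other = []
    for uri in uris:
        m = USK_PATTERN.match(uri)
        if m:
            base, edition, path = m.group(1), int(m.group(2)), m.group(3) or ""
            groups[(base, path)].append(edition)
        else:
            other.append(uri)

    results = []
    for (base, path), editions in groups.items():
        editions = sorted(set(editions), reverse=True)
        newest = f"{base}/{editions[0]}{path}"
        older = [f"{base}/{e}{path}" for e in editions[1:]]
        results.append((newest, older))
    return results, other


# Made-up sample data, for illustration only.
hits = [
    "USK@abc/toad/12/index.html",
    "USK@abc/toad/14/index.html",
    "USK@abc/toad/13/index.html",
    "CHK@xyz/file.txt",
]
results, other = group_usk_editions(hits)
```

The results page would then show the newest URI as the main link and the
older ones in smaller text underneath.
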

>>
>>   - Stop words
>>      common words such as "this", "that", shouldn't be indexed or searched,
>>      -- the list should be included in the xml ....
>>         something like <word v="the" stopword="true" /> in index_##.xml
>
> Then how are we supposed to search for them?

You don't; see http://searchenginewatch.com/2156061

This reduces the index size. Freenet has high latency, so size is important.

Currently, each index_##.xml includes the set of URIs it references.
If we index a word like "the", we will have every URI included there.
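
On the search side it could look something like this Python sketch. The
stopword="true" attribute is only my proposal from above, and the
surrounding <index> element is a placeholder, not the real index_##.xml
structure.

```python
import xml.etree.ElementTree as ET

# Placeholder document: only the <word v="..." stopword="true" /> form comes
# from the proposal; the <index> wrapper is invented for this example.
SAMPLE_INDEX = """
<index>
  <word v="the" stopword="true" />
  <word v="that" stopword="true" />
  <word v="freenet" />
</index>
"""


def load_stop_words(xml_text):
    """Collect every word flagged with stopword="true"."""
    root = ET.fromstring(xml_text.strip())
    return {
        w.get("v").lower()
        for w in root.iter("word")
        if w.get("stopword") == "true" and w.get("v")
    }


def strip_stop_words(query, stop_words):
    """Drop stop words from a query before looking anything up."""
    return [w for w in query.lower().split() if w not in stop_words]


stop_words = load_stop_words(SAMPLE_INDEX)
terms = strip_stop_words("the Freenet that works", stop_words)
```

The search then only fetches index files for the remaining terms, so a
query never pulls in the huge URI list a word like "the" would have.
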

>>
>>   - Chinese/Korean/Japanese support in addition to Latin-like languages
>>     (this needs a real tokenizer)
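
As a stopgap until there is a real tokenizer, overlapping character
bigrams are a common fallback for CJK text. This Python sketch is that
simpler technique, not proper word segmentation, and the Unicode ranges
are a rough assumption that misses rarer blocks.

```python
def is_cjk(ch):
    """Rough check for common CJK ideographs, kana, and Hangul syllables."""
    cp = ord(ch)
    return (
        0x4E00 <= cp <= 0x9FFF      # CJK Unified Ideographs
        or 0x3040 <= cp <= 0x30FF   # Hiragana and Katakana
        or 0xAC00 <= cp <= 0xD7A3   # Hangul syllables
    )


def tokenize(text):
    """Split text into lowercase words, plus overlapping bigrams for CJK runs."""
    tokens, latin, cjk = [], [], []

    def flush_latin():
        if latin:
            tokens.append("".join(latin).lower())
            latin.clear()

    def flush_cjk():
        if len(cjk) == 1:
            tokens.append(cjk[0])
        else:
            # An empty run yields nothing; longer runs yield each adjacent pair.
            tokens.extend(a + b for a, b in zip(cjk, cjk[1:]))
        cjk.clear()

    for ch in text:
        if is_cjk(ch):
            flush_latin()
            cjk.append(ch)
        elif ch.isalnum():
            flush_cjk()
            latin.append(ch)
        else:
            flush_latin()
            flush_cjk()
    flush_latin()
    flush_cjk()
    return tokens


tokens = tokenize("Freenet 自由网络 test")
```

A query would be tokenized the same way, so a CJK phrase becomes a set of
bigram lookups. The cost is a larger index than a dictionary-based
segmenter would produce.
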
>
