On Monday 26 Aug 2013 22:13:35 leuchtkaefer wrote:
> 
> > Right. So:
> > 1. Generalise Library FCP API to support multiple indexes. Make Spider
> > work with the new API.
> 
> How can I test spider? 

Load the plugin. Go to the configuration page on the Spider menu and tell it 
how many threads to use (set it fairly low) and how big a buffer to use 
(towards the end, set this to the minimum).

> Will it consume all my laptop resources? 

Yes, it's pretty heavy. :(

> Anyway I am not touching Spider code, so it should work independently from 
> curator.

I don't think it's a good idea to keep two separate IndexUploader classes with 
almost identical code. IMHO it's important that we avoid major code duplication.
> 
> > 2. Make pushBuffer able to force an upload to Freenet.
> 
> This is related to my previous email, asking for more documentation. I am
> not sure if the way index entries are processed for Spider is the best way
> for curator. I need documentation on the SpiderIndexUploader class.

Right. This should be clearer now. You should be able to add a flag to force an 
immediate upload without too much difficulty.
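Something along these lines, sketched from memory rather than the actual SpiderIndexUploader code (the names pushBuffer, BUFFER_LIMIT and startUpload are my guesses, not the real API):

```java
// Hypothetical sketch: a force flag on pushBuffer so a caller can trigger
// an immediate Freenet upload instead of waiting for the buffer to fill.
class UploadBuffer {
    private static final int BUFFER_LIMIT = 4096;
    private final StringBuilder buffer = new StringBuilder();
    private int uploadsStarted = 0;

    /** Queue data; upload when the buffer is full, or immediately when force is set. */
    void pushBuffer(String data, boolean force) {
        buffer.append(data);
        if (force || buffer.length() >= BUFFER_LIMIT) {
            startUpload();
        }
    }

    private void startUpload() {
        uploadsStarted++;      // stand-in for the real insert logic
        buffer.setLength(0);   // drop the buffered data once the upload starts
    }

    int uploadsStarted() { return uploadsStarted; }
}
```

The point is just that the force path reuses the existing "buffer is full" code path, so it should be a small change.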
> 
> > 3. Add support for a new kind of entry to Library: TermFileEntry (compare
> > to e.g. TermPageEntry). This represents a file for filesharing purposes.
> > It contains the URI, MIME type, possibly a title, hashes, etc. We may want
> > to sign it. Create some sort of basic initial UI to add files to the
> > index, via the Library API.
> > 
> Will TermFileEntry and TermPageEntry be on the same index?

Dunno, they could be. The search (Library) front end should support both when 
searching multiple indexes.
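To make point 3 concrete, the entry could look something like this. Field names and types are my assumptions; the real class would live alongside TermPageEntry in Library and may differ:

```java
// Hypothetical sketch of a TermFileEntry for filesharing, alongside the
// existing TermPageEntry. Field names are assumptions, not the real API.
class TermFileEntry {
    final String term;      // the keyword this entry is filed under
    final String uri;       // Freenet URI (CHK/SSK/USK) of the file
    final String mimeType;  // e.g. "application/pdf"
    final String title;     // optional, may be null
    final byte[] sha256;    // content hash, for dedup and verification

    TermFileEntry(String term, String uri, String mimeType,
                  String title, byte[] sha256) {
        this.term = term;
        this.uri = uri;
        this.mimeType = mimeType;
        this.title = title;
        this.sha256 = sha256;
    }
}
```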

> What do you mean by sign it?

We talked about merging indexes from multiple owners. Dealing with spam 
efficiently for that requires that individual TermFileEntries identify their 
original owner, and preferably be signed by that owner. However, this is not 
a priority at the moment IMHO.
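Signing would look roughly like this: serialize the entry, sign the bytes with the owner's private key, and have anyone merging the index verify against the owner's public key before accepting the entry. The key type and serialization format here are placeholders:

```java
import java.security.*;

// Sketch of per-entry signing using the standard JDK Signature API.
// ECDSA over SHA-256 is an illustrative choice, not a decision.
class EntrySigner {
    static byte[] sign(byte[] entryBytes, PrivateKey key)
            throws GeneralSecurityException {
        Signature s = Signature.getInstance("SHA256withECDSA");
        s.initSign(key);
        s.update(entryBytes);
        return s.sign();
    }

    static boolean verify(byte[] entryBytes, byte[] sig, PublicKey key)
            throws GeneralSecurityException {
        Signature s = Signature.getInstance("SHA256withECDSA");
        s.initVerify(key);
        s.update(entryBytes);
        return s.verify(sig);
    }
}
```

In practice we would presumably tie the key to the owner's WoT identity rather than generating a fresh keypair.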
> 
> > We will want to search by keyword ("term" in Library). So there is a
> > fundamental decision to make here: Do we want to duplicate the
> > TermFileEntry (which could be fairly large, maybe 200 bytes?) under each
> > term/keyword? The simplest answer is yes, although there are some costs
> > in the amount of data we have to upload... the complex answer is no,
> > create a second tree. IMHO the right answer for now is probably to have
> > a single tree.
> 
> Well... I implemented the simple solution as we previously discussed. The
> result is negative: it takes ages to upload all the data. Maybe adding a
> tree would be a better option, but I cannot imagine how to solve it yet. If
> at least I could understand the above-mentioned class better, I might
> rearrange things to improve performance.

The current code uses a single tree.

Does it take a long time to upload the CHKs, or to upload the USK at the end? 
How much data are you uploading?
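A back-of-envelope calculation for the duplication cost (the ~200 bytes per entry is the guess from above; the file and term counts are purely illustrative):

```java
// Rough cost of duplicating each TermFileEntry under every term it
// matches, in the single-tree design. Illustrative numbers, not measured.
class DuplicationCost {
    static long duplicatedBytes(int files, int termsPerFile, int entryBytes) {
        return (long) files * termsPerFile * entryBytes;
    }
}
```

E.g. 1000 files with 7 terms each at 200 bytes per entry is only 1.4 MB of raw entry data before Freenet's own redundancy, so the duplication itself shouldn't explain multi-hour uploads.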
> 
> > 4. Search support for specified USKs (possibly in Library?)
> Don't understand

Hmm, this is unclear.

AFAICS we want to use the existing Library search UI. However, it may need 
some changes to support Curator indexes, especially if you use TermFileEntry 
or two trees.

> > 5. Search support for all USKs visible in the local WoT (where?)
> Again, what do you mean by search support for specified USK? If I know the 
> USK I just type it. Not sure what you mean.

Once we have the above, Library needs to be extended to search all the Curator 
indexes in the local WoT.
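In pseudo-Java, the extension would be something like the following. The identity and index types are placeholders; the real code would go through the WoT plugin's API, which this does not attempt to reproduce:

```java
import java.util.*;

// Sketch of point 5: search every Curator index published by an identity
// visible in the local WoT, and report which owners' indexes match a term.
class WotIndexSearch {
    /** indexByOwner maps a WoT identity name to the set of terms in its index. */
    static List<String> searchAll(Map<String, Set<String>> indexByOwner, String term) {
        List<String> hits = new ArrayList<>();
        for (Map.Entry<String, Set<String>> e : indexByOwner.entrySet()) {
            if (e.getValue().contains(term))
                hits.add(e.getKey());  // owner whose index contains the term
        }
        return hits;
    }
}
```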
> 
> > 6. Optimisations, e.g. pre-fetch the top parts of the tree, post bloom
> > filters of terms in the index, etc.
> 
> I imagine this is for future, priorities are performance on index upload and 
> search content.

Right, get it working first, then make it fast.
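For reference, the bloom-filter idea in point 6 is this: publish a small filter of the index's terms so a searcher can skip fetching the tree entirely when a term is definitely absent. A toy version (the hash scheme and sizes are illustrative only):

```java
import java.util.BitSet;

// Toy bloom filter over index terms. A real one would use better hashes
// and a size derived from the term count and target false-positive rate.
class TermBloomFilter {
    private final BitSet bits;
    private final int size;
    private final int hashes;

    TermBloomFilter(int size, int hashes) {
        this.bits = new BitSet(size);
        this.size = size;
        this.hashes = hashes;
    }

    void add(String term) {
        for (int i = 0; i < hashes; i++)
            bits.set(index(term, i));
    }

    /** False means the term is certainly absent; true means "maybe present". */
    boolean mightContain(String term) {
        for (int i = 0; i < hashes; i++)
            if (!bits.get(index(term, i))) return false;
        return true;
    }

    private int index(String term, int i) {
        // Derive k indices from the term's hash; floorMod handles negatives.
        int h = term.hashCode() * 31 + i * 0x9E3779B9;
        return Math.floorMod(h, size);
    }
}
```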
> 
> > Yep. And eventually when the on-disk index gets big enough we call
> > through to mergeToFreenet.
> 
> Do you want to call mergeToFreenet only when the on-disk index is big? So
> entries are not uploaded immediately?
> Maybe that is the current problem: each entry is merged into the on-disk
> tree and then uploaded. It takes 2 hours to upload only one new entry that
> has about 7 tpe.

That is very odd. What else is running on your node? Web of Trust I guess, 
anything else?

Merging to the on-disk tree should take very little time. I assume most of the 
time taken is by waiting for the uploads to finish?

Do you have an exceptionally slow Freenet node? What's your bandwidth limit set 
to?


_______________________________________________
Devl mailing list
Devl@freenetproject.org
https://emu.freenetproject.org/cgi-bin/mailman/listinfo/devl
