Re: [freenet-dev] questions about Library for my GSoC project

Matthew Toseland Fri, 24 May 2013 07:04:56 -0700

On Friday 24 May 2013 14:37:49 leuchtkaefer wrote:
> Hi toad,
> 
> > IMHO the key functionality here is to define an index format for
> > filesharing, implement some UI for maintaining your per-identity index,
> > and a tool to search the indexes of all the WoT identities we know about
> > that have a high enough trust level. Once that core functionality is
> > sorted out, there's a lot that can be done to optimise it further, and
> > make it scale better: Bloom filters, merging indexes, optimising the
> > tree (e.g. b*tree), possibly avoiding the tree for really popular terms,
> > and so on.
> 
> I got your point. I agree.


Okay, so that's the initial goal.
> 
> > I suggest the format should be:
> > - Top level: A USK that points to the tree.
> > - Tree level: A btree, similar to that used by Library.
> > - Within a single term, there are many single files; Spider uses a
> > sub-tree to sort them by relevance, I'm not sure that will work for
> > filesharing, maybe it's just a list.
> > - Single file: The URL for the file (CHK, SSK, etc), the file size and
> > hashes (these are easily extracted from the file without downloading it
> > fully), maybe other stuff like description, link to a thumbnail or
> > preview etc. Public key of the original uploader's WoT identity, and a
> > signature. (For anti-spam purposes, e.g. so we can still filter out
> > blacklisted uploaders when we're using a merged index)
> 
> The current index's structure from Library is:
> Index
>  |-- metadata
>  |-- ttab: BTreeMap<String, BTreeSet<TermEntry>> 
>  |-- utab: BTreeMap<URIKey, BTreeMap<FreenetURI, URIEntry>>

Right. That's intended for indexing HTML; we may want it to be different for 
indexing files.
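
To make the file-oriented format suggested above a bit more concrete, here is a 
rough sketch. It is only an illustration, not Library code: FileEntry, 
FileIndex and all field names are made up for this example, a TreeMap stands in 
for the on-Freenet btree, and each term maps to a plain list rather than a 
relevance-sorted sub-tree.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Locale;
import java.util.TreeMap;

// Hypothetical record for one shared file; the fields follow the list above.
final class FileEntry {
    final String uri;               // CHK, SSK, etc.
    final long size;                // file size in bytes
    final String hash;              // content hash, hex-encoded (assumption)
    final String description;       // optional free text
    final String uploaderPublicKey; // WoT identity of the original uploader
    final String signature;         // uploader's signature over the entry

    FileEntry(String uri, long size, String hash, String description,
              String uploaderPublicKey, String signature) {
        this.uri = uri;
        this.size = size;
        this.hash = hash;
        this.description = description;
        this.uploaderPublicKey = uploaderPublicKey;
        this.signature = signature;
    }
}

// Hypothetical in-memory index: term -> list of files, no relevance ordering.
final class FileIndex {
    private final TreeMap<String, List<FileEntry>> terms = new TreeMap<>();

    void add(String term, FileEntry entry) {
        String key = term.toLowerCase(Locale.ROOT);
        terms.computeIfAbsent(key, k -> new ArrayList<>()).add(entry);
    }

    List<FileEntry> lookup(String term) {
        String key = term.toLowerCase(Locale.ROOT);
        return terms.getOrDefault(key, Collections.<FileEntry>emptyList());
    }
}
```

Keeping the uploader's key and signature inside each entry is what would let a 
merged index still be filtered per uploader.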
> 
> The index root would usually be stored as an SSK splitfile.
> 
> I am also not sure that we need relevance for file sharing.

Yes, I don't see how we would compute it really.

> Are thumbnails/previews really necessary in Freenet?

They would be links. Currently we don't support previewing whole files. Maybe 
we will in future. It would be easy to add them in the index layer. Whether it 
would be easy to generate them as well is less clear.
> 
> There is something that is still a bit grey for me. Library implements the
> file system level abstraction on Freenet and the entries are fed by the
> spider. What is the structure that contains the on-disk file system, i.e.
> how does a user access their own local files?
> I didn't see any documentation about the layers in Freenet. Could you
> please write a simple scheme to illustrate the Freenet layers?

The user doesn't access local indexes at all. When we search for HTML, we 
search a set of on-Freenet indexes.

LAYERS:

KEYS/BLOCKS: CHKs (32KB, content hash keys, CHK@<routing key>,<encryption 
key>,<settings> where routing key is the hash of the encrypted data); SSKs (1KB 
payload, plus pubkey and signature, SSK@<public key hash>,<crypto 
key>/<filename>)
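
As a small illustration of the two key shapes above, here is a sketch that 
splits a key string into its type, its comma-separated fields and an optional 
filename. KeySketch is a made-up helper for this email, not the node's real URI 
class, and it does no validation of the fields themselves.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper for illustration only; not Freenet's own URI parser.
final class KeySketch {
    final String type;         // text before '@', e.g. "CHK" or "SSK"
    final List<String> fields; // comma-separated parts after '@'
    final String path;         // text after the first '/', or "" if none

    private KeySketch(String type, List<String> fields, String path) {
        this.type = type;
        this.fields = fields;
        this.path = path;
    }

    static KeySketch parse(String key) {
        int at = key.indexOf('@');
        if (at < 0) {
            throw new IllegalArgumentException("missing '@': " + key);
        }
        String type = key.substring(0, at);
        String rest = key.substring(at + 1);
        // Everything after the first '/' is treated as the filename.
        int slash = rest.indexOf('/');
        String head = slash < 0 ? rest : rest.substring(0, slash);
        String path = slash < 0 ? "" : rest.substring(slash + 1);
        return new KeySketch(type, Arrays.asList(head.split(",")), path);
    }
}
```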

ROUTING: Each of our peers has a location (a number between 0.0 and 1.0), and 
the key is also converted to a number between 0.0 and 1.0. Requests are routed 
greedily for ~ 20 hops, going to the "best" peer at each point. Data is 
returned along that path if it's found, and is cached on every node it passes 
through. Inserts are similar: they are cached on all nodes, but also go into 
long-term storage on a few nodes. Maintenance of connections and locations 
depends on whether this is opennet or darknet; load management and load 
limiting are hideously complex but irrelevant here. Also there are mechanisms 
to rapidly propagate popular keys that have been fetched previously. Basically 
it's a weird kind of a DHT.
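
A minimal sketch of one greedy routing step, to show the idea only. It assumes 
the location space wraps around (so 0.95 and 0.05 are close) and that "best" 
simply means closest to the key's location; GreedyRouting and its methods are 
invented for this example, and the real node also has to deal with backoff, 
load and so on.

```java
import java.util.List;

// Hypothetical illustration of a single greedy hop; not the node's real code.
final class GreedyRouting {
    // Distance between two locations in [0.0, 1.0), assuming a circular space.
    static double distance(double a, double b) {
        double d = Math.abs(a - b);
        return Math.min(d, 1.0 - d);
    }

    // Returns the index of the peer closest to the target, or -1 if no peers.
    static int closestPeer(List<Double> peerLocations, double target) {
        int best = -1;
        double bestDistance = Double.MAX_VALUE;
        for (int i = 0; i < peerLocations.size(); i++) {
            double d = distance(peerLocations.get(i), target);
            if (d < bestDistance) {
                bestDistance = d;
                best = i;
            }
        }
        return best;
    }
}
```

Each node on the path would repeat this choice with its own peers until the 
data is found or the hop limit runs out.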

INTERFACE TO HIGHER LAYERS: Fetch a single block by its low-level key (CHK or 
SSK); insert a block; watch for a key/block to be found.

CLIENT LAYER: Files can be inserted as CHK or SSK. Keys can contain metadata, 
which allows us to decode the keys and decide what other keys to fetch. 
Splitfiles are used for anything larger than a single key; redirects are used 
to e.g. go from an SSK to a CHK; sites (directories) are another kind of 
metadata, but usually are inserted in "containers".

INTERFACE: Insert a file to a key. Fetch a key (to a file, not necessarily on 
disk). Insert a freesite/directory. Requests can be persistent or transient, 
and can be set to keep trying forever or to give up after a limited number of 
tries per block.

Interface detail: 
- FCP: Interface to external clients.
- HighLevelSimpleClient: Most clients inside the node use this simple-ish API.
- However they can also use the client layer (src/freenet/client/async/ e.g. 
ClientGetter) directly.
- Plugins count as "clients inside the node": there isn't really a clear plugin 
API apart from the above.

HIGHER LEVEL:
- Fproxy. Fetches keys over HTTP, lets you browse freesites.
- Persistent downloads (usually to disk). Integrated with fproxy.
- FMS: Usenet-like forums over Freenet, external application.
- Sone: Somewhere between Twitter and Facebook, external application.
- Library operates at this level. We have discussed the layers within Library 
IIRC.
> 
> Coming back to my proposal, let me rephrase what I wrote in my previous 
> email. I assume that all identities in one node belong to the same physical 
> person. 

Okay.

> Let's say that the node has a virtual file system that permits access to
> all local files, plus public files. This virtual file system also
> identifies which files have been shared under which identity. All
> files/directories are arranged hierarchically in a tree. This tree is
> composed of several subtrees.

Okay so you combine the indexes exported by all the identities into a single 
virtual tree.
> 
> Top-level tree description:
> 
> The root (level 0) is associated with the node (it could be the "private
> identity" if such a thing exists). All local files/groups of files stored
> in the node's datastore are linked with this private identity. At the
> level below, some child nodes are associated with other identities. When a
> node has a different identity from its parent, that node is considered a
> leaf of the top-level tree and the root of a subtree.

So in fact it is hierarchical by identities: To look up a file we need to know 
which identity to ask?
> 
> Subtree description:
> A subtree links to the root directory of an identity.  
> 
> Node description:
> A node can contain the link to a file or the link to a directory. It also
> contains an associated identity. All nodes inside the same subtree will
> share the same identity except for some leaf nodes. The pointer to a node
> that is the root of a subtree is published in the corresponding WoT
> identity. All content inside such a subtree will be shared with members of
> that identity. Some more restrictions can be added, like a node being
> visible only above some level x of trust...

So can we have unlimited numbers of levels, with identities containing other 
identities containing other identities? I'm not sure how useful this is - isn't 
it easier to just ask WoT for a full list of acceptable identities? And links 
from one identity to another would be a graph not a tree?

Whereas if nodes can't contain more nodes, there's no difference between this 
and just having a list of indexes/subtrees to search?
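
For comparison, the flat alternative might look roughly like the sketch below: 
take the list of identities, skip those below a trust threshold, and search 
each one's index in turn. IdentityIndex, FlatSearch and the integer trust score 
are assumptions made for this example; how the list and the scores would 
actually be obtained from WoT is not shown.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;

// Hypothetical pairing of a WoT identity with its exported index.
final class IdentityIndex {
    final String identity;
    final int trust; // assumed to come from WoT; scale is illustrative
    final Map<String, List<String>> termToUris;

    IdentityIndex(String identity, int trust, Map<String, List<String>> termToUris) {
        this.identity = identity;
        this.trust = trust;
        this.termToUris = termToUris;
    }
}

// Hypothetical flat search over a list of per-identity indexes.
final class FlatSearch {
    static List<String> search(List<IdentityIndex> indexes, String term, int minTrust) {
        List<String> results = new ArrayList<>();
        for (IdentityIndex index : indexes) {
            if (index.trust < minTrust) {
                continue; // skip identities we do not trust enough
            }
            results.addAll(index.termToUris.getOrDefault(term, Collections.<String>emptyList()));
        }
        return results;
    }
}
```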
> 
> If you like the idea I can continue in this direction and make a diagram of 
> this if you need. I am open to suggestions.

