Hi toad,

> IMHO the key functionality here is to define an index format for filesharing,
> implement some UI for maintaining your per-identity index, and a tool to
> search the indexes of all the WoT identities we know about that have a high
> enough trust level. Once that core functionality is sorted out, there's a lot
> that can be done to optimise it further, and make it scale better: Bloom
> filters, merging indexes, optimising the tree (e.g. b*tree), possibly
> avoiding the tree for really popular terms, and so on.

I got your point. I agree.

> I suggest the format should be:
> - Top level: A USK that points to the tree.
> - Tree level: A btree, similar to that used by Library.
> - Within a single term, there are many single files; Spider uses a sub-tree
> to sort them by relevance, I'm not sure that will work for filesharing, maybe
> it's just a list.
> - Single file: The URL for the file (CHK, SSK, etc), the file size and hashes
> (these are easily extracted from the file without downloading it fully),
> maybe other stuff like description, link to a thumbnail or preview etc.
> Public key of the original uploader's WoT identity, and a signature. (For
> anti-spam purposes, e.g. so we can still filter out blacklisted uploaders
> when we're using a merged index)
> 

The current index's structure from Library is:
Index
 |-- metadata
 |-- ttab: BTreeMap<String, BTreeSet<TermEntry>> 
 |-- utab: BTreeMap<URIKey, BTreeMap<FreenetURI, URIEntry>>

The index root would usually be stored as an SSK splitfile.
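
To make the two tables concrete, here is a minimal in-memory sketch. It is not
Library's actual code: java.util.TreeMap and TreeSet stand in for the on-network
BTreeMap and BTreeSet, URIs are plain Strings instead of FreenetURI, and the
class IndexSketch, the fields of TermEntry and URIEntry, and the uriKey()
derivation are all assumptions made for illustration.

```java
import java.util.Collections;
import java.util.SortedSet;
import java.util.TreeMap;
import java.util.TreeSet;

/** Hypothetical, simplified stand-in for the ttab/utab layout shown above. */
class IndexSketch {
    /** One hit for a term; ordered by URI so it can live in a sorted set. */
    static final class TermEntry implements Comparable<TermEntry> {
        final String term;
        final String uri; // a FreenetURI in Library; a plain String here
        TermEntry(String term, String uri) { this.term = term; this.uri = uri; }
        @Override public int compareTo(TermEntry other) { return uri.compareTo(other.uri); }
    }

    /** Per-URI data; the size field is an assumption for filesharing. */
    static final class URIEntry {
        final String uri;
        final long sizeBytes;
        URIEntry(String uri, long sizeBytes) { this.uri = uri; this.sizeBytes = sizeBytes; }
    }

    // ttab: term -> sorted set of entries for that term
    final TreeMap<String, TreeSet<TermEntry>> ttab = new TreeMap<>();
    // utab: URI key -> (full URI -> entry)
    final TreeMap<String, TreeMap<String, URIEntry>> utab = new TreeMap<>();

    /** Assumed key derivation: everything before the first '/'. */
    static String uriKey(String uri) {
        int slash = uri.indexOf('/');
        return slash < 0 ? uri : uri.substring(0, slash);
    }

    /** Registers one file under one search term in both tables. */
    void add(String term, String uri, long sizeBytes) {
        ttab.computeIfAbsent(term, t -> new TreeSet<>()).add(new TermEntry(term, uri));
        utab.computeIfAbsent(uriKey(uri), k -> new TreeMap<>())
            .put(uri, new URIEntry(uri, sizeBytes));
    }

    /** Returns the entries for a term, or an empty set if the term is unknown. */
    SortedSet<TermEntry> lookup(String term) {
        TreeSet<TermEntry> hits = ttab.get(term);
        if (hits == null) {
            return Collections.<TermEntry>emptySortedSet();
        }
        return Collections.unmodifiableSortedSet(hits);
    }

    public static void main(String[] args) {
        IndexSketch index = new IndexSketch();
        index.add("linux", "CHK@aaa/distro.iso", 700L);
        index.add("linux", "CHK@bbb/kernel.tar", 90L);
        System.out.println(index.lookup("linux").size());
    }
}
```

If relevance is not needed for filesharing, the per-term set can stay a plain
sorted set like this one instead of a relevance-ordered sub-tree.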


I am also not sure that we need relevance for file sharing.
Are thumbnails/previews really necessary in Freenet?

Something is still a bit unclear to me. Library implements the file-system-level
abstraction on Freenet, and its entries are fed by the spider. What structure
holds the on-disk file system, i.e. how does a user access their own local
files?
I didn't see any documentation about the layers in Freenet. Could you please
write a simple scheme to illustrate the Freenet layers?

Coming back to my proposal, let me rephrase what I wrote in my previous email.
I assume that all identities in one node belong to the same physical person.
Let's say that the node has a virtual file system that permits access to all
local files, plus public files. This virtual file system also identifies which
files have been shared under which identity. All files/directories are arranged
hierarchically in a tree. This tree is composed of several subtrees.

Top-level tree description:

The root (level 0) is associated with the Freenet node (it could be the
"private identity", if such a thing exists). All local files/groups of files
stored in the node's datastore are linked with this private identity. One level
below, some child nodes are associated with other identities. When a tree
node's identity differs from that of its parent, that tree node is considered a
leaf of the top-level tree and the root of a subtree.

Subtree description:
A subtree links to the root directory of an identity.  

Node description:
A node can contain the link to a file or the link to a directory. It also
contains an associated identity. All nodes inside the same subtree share the
same identity, except for some leaf nodes. The pointer to a node that is the
root of a subtree is published under the corresponding WoT identity. All
content inside such a subtree will be shared with members of that identity.
Further restrictions can be added, e.g. a node is visible only from some trust
level x...
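
To illustrate the tree described above, here is a minimal sketch under my own
assumptions. The names VfsSketch, Node and subtreeRoots() are hypothetical,
identities are plain Strings rather than real WoT identities, and trust-level
restrictions are left out. It only shows the rule that a node whose identity
differs from its parent's becomes the root of a subtree, and that such roots are
the pointers that would be published per identity.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of the per-node virtual file system tree. */
class VfsSketch {
    /** A tree node: a link to a file or a directory plus its associated identity. */
    static final class Node {
        final String name;
        final String identity;   // plain String stand-in for a WoT identity
        final boolean directory;
        final List<Node> children = new ArrayList<>();

        Node(String name, String identity, boolean directory) {
            this.name = name;
            this.identity = identity;
            this.directory = directory;
        }

        /** Attaches a child and returns it, so trees can be built step by step. */
        Node add(Node child) {
            children.add(child);
            return child;
        }
    }

    /**
     * Finds every node whose identity differs from its parent's identity.
     * Each such node is a subtree root; results are grouped by identity.
     */
    static Map<String, List<Node>> subtreeRoots(Node root) {
        Map<String, List<Node>> roots = new LinkedHashMap<>();
        collect(root, roots);
        return roots;
    }

    private static void collect(Node parent, Map<String, List<Node>> roots) {
        for (Node child : parent.children) {
            if (!child.identity.equals(parent.identity)) {
                roots.computeIfAbsent(child.identity, id -> new ArrayList<>()).add(child);
            }
            collect(child, roots); // nested subtrees are found the same way
        }
    }

    public static void main(String[] args) {
        Node root = new Node("/", "private", true);
        Node music = root.add(new Node("music", "alice", true));
        music.add(new Node("song.ogg", "alice", false));
        root.add(new Node("notes.txt", "private", false));
        // Only "music" changes identity relative to its parent.
        System.out.println(subtreeRoots(root).get("alice").get(0).name);
    }
}
```

Grouping the roots by identity matches the idea that each identity publishes
only the pointers to its own subtrees.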

If you like the idea, I can continue in this direction and make a diagram of it
if you need one. I am open to suggestions.


> If you are interested in distributed spidering (for HTML), that would also be
> a great project. There is somebody else working on it, but I don't think much
> has been seen from him for a while. He does have a separate search index
> system for HTML, which is more efficient but less scalable IIRC. IMHO a good
> filesharing search system is more urgent.

Already answered.
