On Wednesday 22 May 2013 09:30:03 leuchtkaefer wrote:
>
> Hi Ximin,
>
> I am not confusing the Library index with the low level block datastore.
> But I still have doubts regarding Library :/ At first, I though the Library
> index will act as an inverted index. But this is contrary to Freenet
> anonymity goals.
>
> I read that the index contains the utab tree (utab: BTreeMap<URIKey,
> BTreeMap<FreenetURI, URIEntry>>) and the freenet uri contains the routing key:
> freenet:[KeyType@]RoutingKey,CryptoKey[,n1=v1,n2=v2,...][/docname][/metastring]
>
> Then it is not possible to share the index?
>
> Besides, only some users run the new spider so I guess only some users have
> local indexes and publish some part on Freenet.

A lot of the below explains things that you may already understand; please
bear with me and ask for clarification on anything you don't follow.
Thanks...

An index indexes only the content that has been put into it, not all data
on Freenet. With the current code, Spider does the indexing by following
links from one page to another, like an internet search engine. We are
simply building an index on top of a block store. It does not include
everything on Freenet.

Currently Spider doesn't even use the Web of Trust, so new freesites have
to be announced by other means, e.g. forums. Once some freesite links to
them, Spider will eventually pick them up when it sees that the linking
freesite has updated. We would like Spider to pick up site announcements
from forums and/or from the Web of Trust.

For filesharing, we probably want the index to contain only files people
have added. For example, we could link an upload to our Web of Trust
identity, so that when the upload finishes, instead of manually posting it
on a forum (or linking it from a freesite), we automatically add it to our
filesharing index, which is exposed through the Web of Trust and can be
searched (or merged) by anyone who has a high enough trust level for our
WoT identity.

Web of Trust itself is another high-level structure: one node can have
many WoT identities or none; it's all built from keys.

>
> Who has access to that on-Freenet index? a group of users (PSK) or is it
> public for any Freenet user (guess no)?

Indexes, like anything else on Freenet, are visible to anyone who has the
key (in the form of a URL). The top level of an index is currently a USK.

Freenet itself provides only very limited functionality:
- Fetch a key (as a single file).
- Insert a single file to a key.
- Insert a bundle of keys as a "freesite".

There are 4 types of keys:
- CHK: Content hash key. URI depends on the (encrypted) content.
- SSK: Signed subspace key.
  Belongs to a cryptographic identity, so the URI consists of the hash of
  the public key and a filename.
- USK: Updatable subspace key. A messy hack to provide an updatable key
  based on SSKs. URI consists of the hash of the public key, a filename,
  and a version number.
- KSK: Keyword signed key. SSK where the public/private keypair is derived
  from the keyword. URI consists of a keyword, e.g. [email protected].

There is more detail on these on the wiki. Everything else is built on top
of this at a higher layer, including the Web of Trust, forums/microblogging
(FMS, Sone, Freetalk), and searching.

>
> I understand that a user can effectively be the owner of one part/branch of
> the top-layer structure and update/modify/delete its part (COW). A top-layer
> structure is the "overall vision" of one user composed by pieces published by
> multiple users. A top-layer structure is always local (but some subtrees are
> links to on Freenet structures).

At present, a user owns an index, and can add to that index. The top level
USK includes CHKs pointing to the two trees (by term and by URI). The trees
themselves are made of CHKs, so changing a node in the tree requires
updating all the nodes above it. But the advantage is that you only need to
update those nodes, rather than re-upload the whole tree.

The COW structure means that, in principle, a different user could add to
that index in the same manner, creating their own new tree (with new CHK
root(s)). This does not affect the first tree, but it shares most of its
content with the first tree. And as above, it's relatively cheap: we only
need to update the nodes that are changed and all their parents up the
tree.

The existing code in Library supports this in principle, but there is no UI
for it at present: Spider uses Library only to update its own index. When
searching, Library (the search box on the homepage) will happily search
multiple indexes at once and return the merged results from all of them.
That's the limit of the "local overall vision". Individual indexes are
normally published so that other people can use them, since running Spider
is fairly heavy.

>
> I don't understand why data blocks were included in the index meaning that
> the index contains another replica of the data? If that is the case, it is
> necessary to replace b-trees for b+tree as it was previously suggested to
> remove data and reduce index size.

We do not include *data* in the index. What we do include is:
- A map from terms (that is, keywords) to the pages including those terms,
  ranked by relevance, and including the location of the word within the
  page (i.e. the number of words before that word).
- A map from URLs to basic metadata (title etc).

>
> I am also thinking how to apply bloomfilters to the on-Freenet index. I
> didn't check in detailed what is the current support of bloomfilters inside
> Freenet. Initially, I understand that bloomfilters are applied for one hop
> file request, meaning that bloomfilters are share with neighbors.

No, at the moment the only use of Bloom filters is to optimise the
datastore. This is not relevant to search. Search is at least two layers
above the datastore.

The proposal is that an index should link to a Bloom filter of all the
terms in that index. Then, when we do a search over multiple indexes, we
can check the Bloom filter first (which we will have pre-loaded, so this is
instant) and identify that most of the indexes don't contain the term
(word) we are looking for. We then only need to search those that might
contain that word.

>
> Thanks a lot for your patience,
>
> leuchtkaefer
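
To make the Bloom filter proposal above a bit more concrete, here is a
rough sketch in Python. It is not Library code (Library is Java) and does
not reflect any existing Freenet API: the class TermBloomFilter, the
functions build_filter and search, the filter size, and the use of SHA-256
are all assumptions made up for illustration. Each index is modelled as a
pair of (pre-loaded filter, term map), and the dictionary lookup stands in
for the expensive on-Freenet tree fetch.

```python
import hashlib


class TermBloomFilter:
    """Tiny Bloom filter over index terms (hypothetical, for illustration)."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, term):
        # Derive num_hashes bit positions by salting SHA-256 with a counter.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{term}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, term):
        for pos in self._positions(term):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, term):
        # False means "definitely absent"; True means "possibly present".
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(term))


def build_filter(terms):
    """Build a filter containing every term of one index."""
    bloom = TermBloomFilter()
    for term in terms:
        bloom.add(term)
    return bloom


def search(term, indexes):
    """Search several indexes, skipping those whose filter rules the term out.

    indexes maps an index name to (bloom_filter, term_map).
    Returns (merged results, names of the indexes actually consulted).
    """
    results, consulted = [], []
    for name, (bloom, term_map) in indexes.items():
        if not bloom.might_contain(term):
            continue  # the index definitely lacks this term, so skip it
        consulted.append(name)  # stands in for the costly on-Freenet lookup
        results.extend(term_map.get(term, []))
    return results, consulted
```

A Bloom filter never gives false negatives, so no index that contains the
term is skipped. It can give false positives, in which case we consult an
index that turns out not to have the term; that costs a wasted lookup but
does not change the merged results.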
_______________________________________________ Devl mailing list [email protected] https://emu.freenetproject.org/cgi-bin/mailman/listinfo/devl
