Hi Ximin,
I am not confusing the Library index with the low-level block datastore.
But I still have doubts about Library :/ At first, I thought the Library
index would act as an inverted index, but this is contrary to Freenet's
anonymity goals.
I read that the index contains the utab tree (utab: BTreeMap<URIKey,
BTreeMap<FreenetURI, URIEntry>>) and that a Freenet URI contains the routing key:
freenet:[KeyType@]RoutingKey,CryptoKey[,n1=v1,n2=v2,...][/docname][/metastring]
Does that mean it is not possible to share the index?
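To make the URI template above concrete, here is a minimal Python sketch that splits a URI of that shape into its parts. It follows the template exactly as written in this mail (including the n1=v1 extra fields) and assumes that keys contain no '/', ',' or '@'. It is not Freenet's own FreenetURI parser; ParsedURI and parse_freenet_uri are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class ParsedURI:
    key_type: Optional[str]  # e.g. "CHK", "SSK", "USK"; None if omitted
    routing_key: str
    crypto_key: str
    extra: Dict[str, str] = field(default_factory=dict)
    docname: Optional[str] = None
    meta_strings: List[str] = field(default_factory=list)


def parse_freenet_uri(uri: str) -> ParsedURI:
    """Split freenet:[KeyType@]RoutingKey,CryptoKey[,n1=v1,...][/docname][/metastring]."""
    if uri.startswith("freenet:"):
        uri = uri[len("freenet:"):]
    # Everything before the first '/' is the key; the rest is the path.
    key_part, _, path = uri.partition("/")
    key_type: Optional[str] = None
    if "@" in key_part:
        key_type, key_part = key_part.split("@", 1)
    parts = key_part.split(",")
    if len(parts) < 2:
        raise ValueError("expected RoutingKey,CryptoKey")
    extra: Dict[str, str] = {}
    for item in parts[2:]:
        name, sep, value = item.partition("=")
        if not sep:
            raise ValueError(f"extra field without '=': {item!r}")
        extra[name] = value
    segments = path.split("/") if path else []
    return ParsedURI(
        key_type=key_type,
        routing_key=parts[0],
        crypto_key=parts[1],
        extra=extra,
        docname=segments[0] if segments else None,
        meta_strings=segments[1:],
    )
```

Note that both the routing key and the crypto key end up inside every FreenetURI stored in the utab tree.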
Besides, only some users run the new spider, so I guess only those users have
local indexes and publish parts of them on Freenet.
Who has access to that on-Freenet index: a group of users (via a PSK), or any
Freenet user (I guess not)?
I understand that a user can effectively own one part/branch of the top-layer
structure and update, modify or delete that part (copy-on-write). A top-layer
structure is one user's "overall view", composed of pieces published by
multiple users. A top-layer structure is always local, although some of its
subtrees are links to structures stored on Freenet.
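A minimal sketch of that understanding, under heavy simplifying assumptions: a plain dict-of-children tree stands in for the B-tree, subtrees published by other users appear as Link placeholders that are fetched on first access and then cached locally, and update() is copy-on-write, copying only the nodes along the changed path so the previous version stays intact. Link, Node, TopLayer and the fetch callback are hypothetical names, not Library's SkeletonBTreeMap API.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Union


@dataclass(frozen=True)
class Link:
    uri: str  # hypothetical address of a subtree published by another user


@dataclass
class Node:
    entries: Dict[str, str] = field(default_factory=dict)
    children: Dict[str, Union["Node", Link]] = field(default_factory=dict)


class TopLayer:
    """One user's local view; some subtrees are Links resolved on demand."""

    def __init__(self, root: Node, fetch: Callable[[str], Node]) -> None:
        self.root = root
        self.fetch = fetch  # stands in for fetching a published subtree

    def _child(self, node: Node, name: str) -> Node:
        child = node.children[name]
        if isinstance(child, Link):
            child = self.fetch(child.uri)  # load on first access
            node.children[name] = child    # cache the loaded subtree locally
        return child

    def lookup(self, path: List[str], key: str) -> Optional[str]:
        node = self.root
        for name in path:
            node = self._child(node, name)
        return node.entries.get(key)

    def update(self, path: List[str], key: str, value: str) -> "TopLayer":
        """Copy-on-write: copy only the nodes on `path`; other subtrees are shared."""
        def rewrite(node: Node, depth: int) -> Node:
            copy = Node(dict(node.entries), dict(node.children))
            if depth == len(path):
                copy.entries[key] = value
            else:
                name = path[depth]
                copy.children[name] = rewrite(self._child(node, name), depth + 1)
            return copy

        return TopLayer(rewrite(self.root, 0), self.fetch)
```

The old TopLayer keeps pointing at the unmodified nodes, so the previous version remains readable after an update.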
I don't understand why data blocks were included in the index. Does that mean
the index contains another replica of the data? If so, the B-trees should be
replaced with B+-trees, as was previously suggested, to remove data and reduce
the index size.
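For reference, the structural difference being discussed: in a B+-tree the internal nodes hold only separator keys and child pointers, and every value lives in a leaf. A minimal lookup sketch (names are hypothetical; no insertion or node splitting; not Library's code):

```python
import bisect
from dataclasses import dataclass
from typing import Any, List, Optional, Union


@dataclass
class Leaf:
    keys: List[str]
    values: List[Any]  # payloads are stored only in leaves


@dataclass
class Internal:
    keys: List[str]  # separators only: keys[i] is the smallest key under children[i + 1]
    children: List[Union["Internal", Leaf]]


def search(node: Union[Internal, Leaf], key: str) -> Optional[Any]:
    # Descend through internal nodes using separator keys alone.
    while isinstance(node, Internal):
        node = node.children[bisect.bisect_right(node.keys, key)]
    # Only the leaf is consulted for the actual value.
    i = bisect.bisect_left(node.keys, key)
    if i < len(node.keys) and node.keys[i] == key:
        return node.values[i]
    return None
```

Because internal nodes carry no payloads, a fixed-size block can hold more separators, which gives a higher fan-out and a shallower tree; the entries themselves are still stored once, in the leaves.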
I am also thinking about how to apply Bloom filters to the on-Freenet index. I
haven't checked in detail what support for Bloom filters currently exists inside
Freenet. My initial understanding is that Bloom filters are used for one-hop
file requests, meaning that they are shared with neighbors.
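As a reminder of the mechanism: a Bloom filter is a bit array plus k hash functions; adding an item sets k bits, and a query answers "maybe present" only if all k bits are set, so it can give false positives but never false negatives. A minimal sketch, where the sizes, the SHA-256 double-hashing scheme and the class name are assumptions rather than what Freenet uses internally:

```python
import hashlib
from typing import List


class BloomFilter:
    """Bit array of m bits with k hash positions per item (sizes are illustrative)."""

    def __init__(self, m_bits: int, k_hashes: int) -> None:
        self.m = m_bits
        self.k = k_hashes
        self.bits = bytearray((m_bits + 7) // 8)

    def _positions(self, item: bytes) -> List[int]:
        # Double hashing: derive k indices from two 64-bit slices of one SHA-256 digest.
        digest = hashlib.sha256(item).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # forced odd
        return [(h1 + i * h2) % self.m for i in range(self.k)]

    def add(self, item: bytes) -> None:
        for p in self._positions(item):
            self.bits[p // 8] |= 1 << (p % 8)

    def might_contain(self, item: bytes) -> bool:
        # False means "definitely absent"; True means "possibly present".
        return all(self.bits[p // 8] & (1 << (p % 8)) for p in self._positions(item))


bf = BloomFilter(m_bits=8192, k_hashes=4)
bf.add(b"CHK@a,b")
print(bf.might_contain(b"CHK@a,b"))  # True
```

One possible use for the on-Freenet index (an assumption, not an existing Library feature) would be to attach such a filter to each published subtree, so that a search only fetches subtrees whose filter answers "maybe".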
Thanks a lot for your patience,
leuchtkaefer
>________________________________
> From: Ximin Luo <[email protected]>
>To: leuchtkaefer <[email protected]>
>Cc: Discussion of development issues <[email protected]>
>Sent: Tuesday, May 21, 2013 10:47 PM
>Subject: Re: [freenet-dev] questions about Library for my GSoC project
>
>
>On 21/05/13 17:11, leuchtkaefer wrote:
>> Hi Ximin,
>> Thanks for your answer. I will rephrase what you wrote (adding my own view)
>> to check if I understand.
>>
>>
>> I understand that each node constructs a SkeletonBTreeMap (huge index) that
>> in the long term will contain all the successful searches initiated by that
>> node or that passed through that node. Using the SkeletonBTreeMap, each node
>> has a partial view of the documents stored in the freenet system, but only of
>> documents that passed through that node.
>>
>> The paragraph starting with "For Library's B-Tree, this is not feasible" is
>> hard to understand.
>>
>> Since the datastore uses LRU-like cache replacement, old files are deleted
>> when a node's datastore size is exceeded. This should be reflected in the
>> huge index maintained by every remote node that has links to the recently
>> deleted file. But it is not possible to reflect it: nodes don't know when the
>> links/items in their index are no longer valid (i.e., when a remote node has
>> deleted the file as part of its LRU replacement policy).
>>
>
>It sounds like you're confusing the Library index with the freenet datastore.
>The former is like a filesystem abstraction, the latter is like a low-level
>block device. There is no feasible way for the former to control the operation
>of the latter, and it would be undesirable in any case as a layer violation.
>
>
>I'm also unconvinced that it would be desirable for nodes to tell other nodes
>about LRU drops, due to the potential leak of information. Do you have an
>argument to show that there is no reduction in security?
>
>> If I understood correctly, we can continue discussing what is written below.
>> If not, you can ignore the part below and help me follow your previous
>> e-mail instead.
>>
>> In that case, maybe the solution is an announcement policy that broadcasts to
>> neighbors that a given file (key) is no longer valid. Such messages would be
>> harmless and not too promiscuous, wouldn't they? However, neighbors who didn't
>> know that the file was available through that node would learn it through
>> such announcements. Is that a problem?
>>
>> For instance, take the following scenario:
>> (assumptions: for simplicity, one identity per node; index items are
>> simplified to {key,location})
>> Neighbors of Node1(n1): {n2, n3, n4, n5, n7}
>>
>> Neighbors of Node4(n4): {n1, n30, n7}
>> Scenario:
>> 1) Request of n5 to n1: "Give me file with key=200"
>> 2) n1 index contains {key=201,n4}
>> 3) n1 decides to forward request to n4: "Give me file with key=200"
>> 4) n4 answers to n1 with file
>> 5) n1 forwards file to n5
>> 6) n1 stores a copy of the file in its cache datastore
>> 7) n1 stores {key=200,n4} in his index
>> 8) n1 stores {key=200,n5} in his index (n1 does not know n5 is the final
>> destination).
>> 9) n1 receives another request for key 200; he doesn't need to forward the
>> request because the file is stored in its cache
>> 10) (time passes) file is deleted from n1's LRU cache
>> 11) (time passes) n1 receives another request for key 200 from n3.
>> 12) n1 needs to decide whether to forward the request to n4 or n5. I'm not
>> sure what the criterion is here; maybe he uses the node's reputation (WoT).
>> 13) Some time later, file with key=200 is deleted from n4's datastore
>> because of LRU policy.
>> 14) n4 broadcasts to its neighbors n1, n30 and n7 that key=200 has been deleted
>>
>> However, the fact that key=200 can be found through n4 may imply that node n4
>> knows how to get key=200. There is another reason that makes me think this
>> assumption is valid: nodes with similar keys are clustered together, aren't
>> they?
>> Then, which routing decision is better for key=200: n5, or still n4?
>> *at this point I am bewildered*
>> I'm not sure whether n1 should delete {key=200,n4} from his index when he
>> receives the n4 broadcast.
>>
>> Some comments: I am not sure whether it is possible to store a key with
>> multiple locations, as in steps 7 and 8; I guess it is. I am still confused
>> about location swapping and its consequences for the node's index.
>>
>> Maybe a silly question, but... what do you mean by "top-level data
>> structure"? What is the top-level data structure of Freenet?
>>
>
>"Top-level data structure" refers to the identity of the pieces of the Library
>index as a coherent whole, and is represented by the SkeletonBTreeMap
>structure. By contrast, Bigtable/freenet provides per-row/key access, and
>there is no concept of "the entire table" or "the entire DHT" from the
>client's perspective.
>
>> Regarding your security note [1], I'm not sure what you mean. I suppose that
>> you are referring to the fact that a node's datastore cannot be accessed
>> remotely. Users only send requests to other nodes asking for a file, and the
>> remote node checks its datastore and answers the request. Thus, the storage
>> is accessed only locally.
>>
>
>That note is not about the datastore, it's about the Library index which
>operates on top of it. It's stored as an SSK - do you know what that is? The
>same concept is in Tahoe-LAFS as well.
>
>> Regarding what I am trying to achieve: I am looking for ways to speed up
>> search, to share bookmarks (specialised in certain terms) among a group of
>> people, probably using a PSK, maybe including friends of friends, and to
>> improve the Library code.
>>
>
>OK - in that case, it is vital to understand every aspect of what I'm
>describing. Thanks for your patience. :)
>
>Ximin
>
>> Thanks a lot again, and forgive my naive assumptions,
>>
>> leuchtkaefer
>>
>>
>>
>> ________________________________
>> From: Ximin Luo <[email protected]>
>> To: leuchtkaefer <[email protected]>; Discussion of development issues
>> <[email protected]>
>> Sent: Tuesday, May 21, 2013 12:06 PM
>> Subject: Re: [freenet-dev] questions about Library for my GSoC project
>>
>>
>> On 20/05/13 22:36, leuchtkaefer wrote:
>>>
>>>
>>> Hi infinity0,
>>>
>>> My proposal for GSoC13 is closely related to your code (Library).
>>>
>>> First, do you have any extra documentation on the code that you think could
>>> help me understand the most important parts, such as the SkeletonBTreeMap?
>>>
>>
>> Hello,
>>
>> I did Library for GSoC 2009 and back then I was inexperienced with building and
>> engineering large software codebases (such as freenet and its plugin
>> ecosystem). There are many aspects of Library that I would do differently today
>> if I was re-doing that project.
>>
>> A large part of Library focuses on serialisation of massively-large(1) data
>> structures, implemented *on top of* freenet's decentralised(2) storage. (1) and
>> (2) together is what makes the problem hard.[1] For my GSoC 2009 project, I
>> tried to solve this problem by implementing a load-on-demand local data
>> structure (SkeletonBTreeMap) that represents the *overall* data structure (a
>> B-tree) as it exists on freenet storage.
>>
>> By contrast, massively scalable distributed systems such as Bigtable, and even
>> the underlying freenet DHT storage system, never expose the *overall* data
>> structure to the clients of those systems - instead they allow piece-by-piece
>> access, e.g. by row, or by key, and the client never sees the top-level data
>> structure.
>>
>> For Library's B-Tree, this is not feasible, because (due to the design of
>> freenet) we cannot offload computation (i.e. data structure book-keeping) onto
>> other nodes.[2] It was also not feasible to use freenet's decentralised storage
>> more directly, because it has certain properties (such as LRU cache) that are
>> not acceptable for a search index.
>>
>> So. That was an overview of the abstract algorithmic issues surrounding the
>> design of Library. Please let me know if any part of what I just said is not
>> understandable. Every sentence makes an important theoretical point. If you do
>> not fully understand *any part*, ask me to clarify, otherwise I fear that you
>> may repeat the same mistakes that I did. This is not exactly a problem since
>> GSoC is partly about learning - but it would be sub-optimal for the project's
>> progress.
>>
>> I'll hold off on answering the rest of your questions to give you a chance to
>> digest my previous answers. Understanding those will make it easier for you to
>> understand my answers to the next section - and you may even be able to figure
>> those answers out for yourself without me explaining it explicitly.
>>
>> Also, if you give me some context on what you're trying to achieve, I can give
>> more specific advice.
>>
>> Let me know how you get along, and good luck!
>> Ximin
>>
>> [1] We are lucky that we don't have to further worry about security because the
>> underlying freenet storage allows us to restrict access to one single user.
>> [2] Perhaps one day, a system that supports fully homomorphic encryption will
>> allow this to happen.
>>
>>> Second, I have some questions:
>>>
>>> 1. You disabled the "boolean internal_entries" inside the class
>>> SkeletonBTreeMap and used option 2. I don't understand what you mean by a
>>> dummy serialiser that copies task.data to task.meta. What does task.data
>>> contain?
>>>
>>> 2. What does it mean to deflate/inflate a node?
>>>
>>> 3. What is a GhostNode? I understood that it is an undesirable structure used
>>> to hold some metadata or something related to the serialiser, and that it
>>> needs to be removed.
>>>
>>> If you can elaborate more about Library, besides the documentation already
>>> published in the wiki, it will be of great help.
>>>
>>> Thanks in advance,
>>>
>>> leuchtkaefer
>>>
>>>
>>>
>>> _______________________________________________
>>> Devl mailing list
>>> [email protected]
>>> https://emu.freenetproject.org/cgi-bin/mailman/listinfo/devl
>>
>>