Re: [freenet-dev] questions about Library for my GSoC project

leuchtkaefer Wed, 22 May 2013 01:30:34 -0700


Hi Ximin,


I am not confusing the Library index with the low-level block datastore. 
But I still have doubts regarding Library :/  At first, I thought the Library 
index would act as an inverted index. But this is contrary to Freenet's 
anonymity goals.

I read that the index contains the utab tree (utab: BTreeMap<URIKey, 
BTreeMap<FreenetURI, URIEntry>>) and the freenet uri contains the routing key:
freenet:[KeyType@]RoutingKey,CryptoKey[,n1=v1,n2=v2,...][/docname][/metastring]
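
As a rough illustration of the URI layout quoted above, the following sketch splits such a string into its fields. It is only a sketch under stated assumptions: `UriSketch` and its field names are hypothetical and are not Freenet's real `FreenetURI` class, the optional `/docname` and `/metastring` parts are kept together as one raw path, and the key strings are placeholders rather than real keys.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration only; this is not Freenet's FreenetURI class.
final class UriSketch {
    final String keyType;       // empty when the optional "KeyType@" prefix is absent
    final String routingKey;
    final String cryptoKey;
    final List<String> extras;  // optional n1=v1,n2=v2,... fields, kept as raw strings
    final String path;          // everything after the first '/', empty if absent

    private UriSketch(String keyType, String routingKey, String cryptoKey,
                      List<String> extras, String path) {
        this.keyType = keyType;
        this.routingKey = routingKey;
        this.cryptoKey = cryptoKey;
        this.extras = extras;
        this.path = path;
    }

    // Parses freenet:[KeyType@]RoutingKey,CryptoKey[,n1=v1,...][/docname][/metastring]
    static UriSketch parse(String uri) {
        String s = uri.startsWith("freenet:") ? uri.substring("freenet:".length()) : uri;

        // Split off the path part (docname and metastring) at the first '/'.
        String path = "";
        int slash = s.indexOf('/');
        if (slash >= 0) {
            path = s.substring(slash + 1);
            s = s.substring(0, slash);
        }

        // Split off the optional key type before '@'.
        String keyType = "";
        int at = s.indexOf('@');
        if (at >= 0) {
            keyType = s.substring(0, at);
            s = s.substring(at + 1);
        }

        // The remaining comma-separated fields: routing key, crypto key, then extras.
        String[] parts = s.split(",");
        if (parts.length < 2) {
            throw new IllegalArgumentException("expected RoutingKey,CryptoKey");
        }
        List<String> extras = new ArrayList<>();
        for (int i = 2; i < parts.length; i++) {
            extras.add(parts[i]);
        }
        return new UriSketch(keyType, parts[0], parts[1], extras, path);
    }
}
```

For example, `UriSketch.parse("freenet:SSK@ROUTING,CRYPTO,a=1/index")` yields a `routingKey` of `ROUTING`, which shows the point in question: anyone who holds the URI also holds the routing key.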


Then is it not possible to share the index? 

Besides, only some users run the new spider so I guess only some users have 
local indexes and publish some part on Freenet. 

Who has access to that on-Freenet index? A group of users (PSK), or is it public 
for any Freenet user (I guess not)? 

I understand that a user can effectively be the owner of one part/branch of the 
top-layer structure and update/modify/delete their part (COW). A top-layer 
structure is the "overall vision" of one user, composed of pieces published by 
multiple users. A top-layer structure is always local (but some subtrees are 
links to on-Freenet structures).
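
To make the copy-on-write (COW) idea above concrete, here is a minimal sketch, assuming a simple immutable tree of named branches. `CowNode` and `withChild` are hypothetical names and are not Library's actual classes. Replacing one branch produces a new root, while every untouched branch is shared with the old root.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration of copy-on-write; not Library's actual data structure.
final class CowNode {
    final String value;                   // payload, e.g. a link to an on-Freenet subtree
    final Map<String, CowNode> children;  // unmodifiable view of a private copy

    CowNode(String value, Map<String, CowNode> children) {
        this.value = value;
        // Defensive copy so later changes to the caller's map cannot affect this node.
        this.children = Collections.unmodifiableMap(new HashMap<>(children));
    }

    // Returns a NEW node in which the child `name` is replaced.
    // All other children are shared by reference with this node.
    CowNode withChild(String name, CowNode replacement) {
        Map<String, CowNode> copy = new HashMap<>(children);
        copy.put(name, replacement);
        return new CowNode(value, copy);
    }
}
```

In this sketch an owner who updates their own branch calls `withChild` and publishes the resulting root. Readers still holding the old root keep seeing the old, unchanged version.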

I don't understand why data blocks were included in the index. Does that mean 
the index contains another replica of the data? If that is the case, it would 
be necessary to replace the B-trees with B+trees, as was previously suggested, 
to remove the data and reduce the index size.


I am also thinking about how to apply Bloom filters to the on-Freenet index. I 
haven't checked in detail what the current support for Bloom filters inside 
Freenet is. Initially, I understand that Bloom filters are applied to one-hop 
file requests, meaning that Bloom filters are shared with neighbors. 
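
For reference, a generic Bloom filter can be sketched as below. This is not Freenet's implementation: the class name `BloomSketch`, the bit-array size, the number of hash functions, and the double-hashing scheme are all assumptions chosen for illustration. The property that matters for sharing with neighbors is that a filter can say "definitely not present" or "possibly present", but never gives a false negative for a key that was added.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.BitSet;

// Generic Bloom filter sketch; parameters and hashing are illustrative assumptions.
final class BloomSketch {
    private final BitSet bits;
    private final int size;    // number of bits in the filter
    private final int hashes;  // number of bit positions set per key

    BloomSketch(int size, int hashes) {
        this.bits = new BitSet(size);
        this.size = size;
        this.hashes = hashes;
    }

    // FNV-1a, used here only as a second, independent-looking hash.
    private static int fnv1a(byte[] data) {
        int h = 0x811c9dc5;
        for (byte b : data) {
            h ^= (b & 0xff);
            h *= 0x01000193;
        }
        return h;
    }

    // Double hashing: the i-th position is derived from two base hashes.
    private int position(String key, int i) {
        byte[] data = key.getBytes(StandardCharsets.UTF_8);
        int h1 = Arrays.hashCode(data);
        int h2 = fnv1a(data);
        return Math.floorMod(h1 + i * h2, size);
    }

    void add(String key) {
        for (int i = 0; i < hashes; i++) {
            bits.set(position(key, i));
        }
    }

    // false: the key was definitely never added; true: it may have been added.
    boolean mightContain(String key) {
        for (int i = 0; i < hashes; i++) {
            if (!bits.get(position(key, i))) {
                return false;
            }
        }
        return true;
    }
}
```

A node could, under these assumptions, add every key in its store to such a filter and hand the bit array to its neighbors, who would then only send a one-hop request when `mightContain` returns true.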

Thanks a lot for your patience,

leuchtkaefer



>________________________________
> From: Ximin Luo <[email protected]>
>To: leuchtkaefer <[email protected]> 
>Cc: Discussion of development issues <[email protected]> 
>Sent: Tuesday, May 21, 2013 10:47 PM
>Subject: Re: [freenet-dev] questions about Library for my GSoC project
> 
>
>On 21/05/13 17:11, leuchtkaefer wrote:
>> Hi Ximin,
>> Thanks for your answer. I will rephrase what you wrote (adding my own view) 
>> to check if I understand.
>> 
>> 
>> I understand that each node constructs a SkeletonBTreeMap (huge index) that 
>> in the long term will contain all the successful searches 
>> initiated by that node or that passed through that node. Using the 
>> SkeletonBTreeMap, each node has a partial view of the documents stored in 
>> the Freenet system, but only of documents that passed through that node.
>> 
>> The paragraph starting with "For Library's B-Tree, this is not feasible" is 
>> hard to understand. 
>> 
>> Since the datastore uses an LRU-like cache replacement, old files are deleted 
>> when a node's datastore size is exceeded. This should be reflected in the huge 
>> index maintained by all remote nodes that have links to that recently deleted 
>> file. But it is not possible to reflect it: nodes don't know when the 
>> links/items in their index are no longer valid (i.e., when a remote node 
>> deleted the file as part of its LRU replacement policy).
>> 
>
>It sounds like you're confusing the Library index with the freenet datastore. 
>The former is like a filesystem abstraction, the latter is like a low-level 
>block device. There is no feasible way for the former to control the operation 
>of the latter, and it would be undesirable in any case as a layer violation.
>
>
>I'm also unconvinced that it would be desirable for nodes to tell other nodes 
>about LRU drops, due to the potential leak of information. Do you have an 
>argument to show that there is no reduction in security?
>
>> If I understood correctly, we can continue discussing what is written below. 
>> If not, you can forget about the part below and give me more help with 
>> following your previous e-mail.
>> 
>> In that case maybe the solution is an announcement policy that broadcasts to 
>> neighbors that such a file (key) is not valid anymore. Such messages would be 
>> harmless and not too promiscuous, wouldn't they? Although neighbors who didn't 
>> know such a file was available through such a node would learn it through 
>> that kind of announcement. Is that a problem?
>> 
>> For instance, take the following scenario:
>> (assumptions: for simplicity 1 identity per node, index items are simplified 
>> to {key,location})
>> Neighbors of Node1(n1): {n2, n3, n4, n5, n7}
>> 
>> Neighbors of Node4(n4): {n1, n30, n7}
>> Scenario:
>> 1) Request of n5 to n1: "Give me file with key=200"
>> 2) n1 index contains {key=201,n4}
>> 3) n1 decides to forward request to n4:  "Give me file with key=200"
>> 4) n4 answers to n1 with file
>> 5) n1 forwards file to n5
>> 6) n1 stores a copy of the file in its cache datastore
>> 7) n1 stores {key=200,n4} in his index
>> 8) n1 stores {key=200,n5} in his index (n1 does not know n5 is final 
>> destination).
>> 9) n1 receives another request for key 200; it doesn't need to forward the 
>> request because the file is stored in its cache
>> 10) (time passes) file is deleted from n1's LRU cache
>> 11) (time passes) n1 receives another request for key 200 from n3.
>> 12) n1 needs to decide whether to forward the request to n4 or n5. Not sure 
>> what the criterion is here; maybe it uses the node's reputation (WoT).
>> 13) Some time later, the file with key=200 is deleted from n4's datastore 
>> because of the LRU policy.
>> 14) n4 broadcasts to its neighbors n1, n30, n7 that key=200 is eliminated
>> 
>> However, the fact that key=200 can be found through n4 may imply that node n4 
>> knows how to get key=200. There is another reason that makes me think this 
>> assumption is valid. Nodes with similar keys are clustered together, aren't 
>> they?
>> Then, which routing decision is better for key=200? n5 or still using n4? 
>> *at this point I am bewildered*
>> Not sure if n1 should delete {key=200,n4} from its index when it receives 
>> the n4 broadcast.
>> 
>> Some comments: I am not sure if it is possible to store a key with multiple 
>> locations as in steps 7 and 8; I guess it is possible. I am still confused 
>> about location swapping and what the consequences of location swapping are 
>> for the node's index.
>> 
>> Maybe a silly question, but... what do you mean by "top-level data 
>> structure"? What is the top-level data structure of Freenet?
>> 
>
>"Top-level data structure" refers to the identity of the pieces of the Library 
>index as a coherent whole, and is represented by the SkeletonBTreeMap 
>structure. By contrast, Bigtable/freenet provides per-row/key access, and 
>there is no concept of "the entire table" or "the entire DHT" from the 
>client's perspective.
>
>> Regarding your security note[1]: I am not sure what you mean. I suppose that 
>> you refer to the fact that a node's datastore cannot be accessed remotely. 
>> Users only send requests to other nodes asking for a file, and the remote 
>> node checks its datastore and answers the request. Thus, the storage is 
>> accessed only locally.
>> 
>
>That note is not about the datastore, it's about the Library index which 
>operates on top of it. It's stored as an SSK - do you know what that is? The 
>same concept is in Tahoe-LAFS as well.
>
>> Regarding what I am trying to achieve: I am looking to somehow accelerate the 
>> speed of the search and share bookmarks (specialized in some terms) among a 
>> group of people, probably by using PSK, maybe with some friends of friends, 
>> and to improve the Library code.
>> 
>
>OK - in that case, it is vital to understand every aspect of what I'm 
>describing. Thanks for your patience. :)
>
>Ximin
>
>> Thanks a lot again and forgive my dummy assumptions,
>> 
>> leuchtkaefer
>> 
>> 
>> 
>> ________________________________
>>  From: Ximin Luo <[email protected]>
>> To: leuchtkaefer <[email protected]>; Discussion of development issues 
>> <[email protected]> 
>> Sent: Tuesday, May 21, 2013 12:06 PM
>> Subject: Re: [freenet-dev] questions about Library for my GSoC project
>>  
>> 
>> On 20/05/13 22:36, leuchtkaefer wrote:
>>>
>>>
>>> Hi infinity0,
>>>
>>> My proposal to GSoC13 is highly related to your code (Library). 
>>>
>>> First, do you have any extra documentation on the code that you think 
>>> could be useful for me to understand the most important parts, such as the 
>>> SkeletonBTreeMap?
>>>
>> 
>> Hello,
>> 
>> I did Library for GSoC 2009 and back then I was inexperienced with building 
>> and engineering large software codebases (such as freenet and its plugin
>> ecosystem). There are many aspects of Library that I would do differently 
>> today if I was re-doing that project.
>> 
>> A large part of Library focuses on serialisation of massively-large(1) data
>> structures, implemented *on top of* freenet's decentralised(2) storage. (1) 
>> and (2) together is what makes the problem hard.[1] For my GSoC 2009 
>> project, I tried to solve this problem by implementing a load-on-demand 
>> local data structure (SkeletonBTreeMap) that represents the *overall* data 
>> structure (a B-tree) as it exists on freenet storage.
>> 
>> By contrast, massively scalable distributed systems such as Bigtable, and 
>> even the underlying freenet DHT storage system, never expose the *overall* 
>> data structure to the clients of those systems - instead they allow 
>> piece-by-piece access, e.g. by row, or by key, and the client never sees the 
>> top-level data structure.
>> 
>> For Library's B-Tree, this is not feasible, because (due to the design of
>> freenet) we cannot offload computation (i.e. data structure book-keeping) 
>> onto other nodes.[2] It was also not feasible to use freenet's decentralised 
>> storage more directly, because it has certain properties (such as LRU cache) 
>> that are not acceptable for a search index.
>> 
>> So. That was an overview of the abstract algorithmic issues surrounding the
>> design of Library. Please let me know if any part of what I just said is not
>> understandable. Every sentence makes an important theoretical point. If you 
>> do not fully understand *any part*, ask me to clarify, otherwise I fear that 
>> you may repeat the same mistakes that I did. This is not exactly a problem 
>> since GSoC is partly about learning - but it would be sub-optimal for the 
>> project's progress.
>> 
>> I'll hold off on answering the rest of your questions to give you a chance to
>> digest my previous answers. Understanding those will make it easier for you 
>> to understand my answers to the next section - and you may even be able to 
>> figure those answers out for yourself without me explaining it explicitly.
>> 
>> Also, if you give me some context on what you're trying to achieve, I can 
>> give more specific advice.
>> 
>> Let me know how you get along, and good luck!
>> Ximin
>> 
>> [1] We are lucky that we don't have to further worry about security because 
>> the underlying freenet storage allows us to restrict access to one single 
>> user.
>> [2] Perhaps one day, a system that supports fully homomorphic encryption will
>> allow this to happen.
>> 
>>> Second, I have some questions:
>>>
>>> 1. You disabled the "boolean internal_entries" inside the class 
>>> SkeletonBTreeMap and use option 2. I don't understand what you mean by a 
>>> dummy serialiser that copies task.data to task.meta. What does task.data 
>>> contain?
>>>
>>> 2. What does it mean to deflate/inflate a node?
>>>
>>> 3. What is a GhostNode? I understood it is an undesirable structure used to 
>>> contain some metadata or something related to the serializer, and that it 
>>> needs to be removed.
>>>
>>> If you can elaborate more about Library, besides the documentation already 
>>> published in the wiki, it would be a great help.
>>>
>>> Thanks in advance,
>>>
>>> leuchtkaefer
>>>
>>>
>>>
>>> _______________________________________________
>>> Devl mailing list
>>> [email protected]
>>> https://emu.freenetproject.org/cgi-bin/mailman/listinfo/devl
>> 
>> 

