On Wed, Oct 10, 2012 at 11:22:32PM +0100, Matthew Toseland wrote:
> On Wed, Oct 10, 2012 at 07:51:45PM +0100, Matthew Toseland wrote:
> > Sadao@JXXNvLaHdNMysx7GmY5~L4aCoMuQV85oJM9OIqhkTR8 wrote:
> > > 
> > > Let's consider the Frost message system and see what exactly it lacks to
> > > be spam protected. Every board has one SSK key pair. The public key of
> > > the board is normally known by everyone and is used to download the
> > > messages in a very efficient manner. The private key of the board is
> > > used by anyone who knows it to post messages. The moderator of the
> > > board gives it to some persons on an individual basis. If one of those
> > > trusted persons begins to spam the board, it's trivial to block him by
> > > changing the board's SSK key pair (manually or automatically) and giving
> > > the new private key to the same set of trusted persons except the
> > > spammer. There is only one problem: how to determine who exactly the
> > > spammer is?
> > > 
> > > This problem can't be solved by means of SSK keys only. If two persons
> > > know a private SSK key and one of them is a spammer, it's impossible to
> > > determine which one it is. Therefore I propose to implement a new
> > > freenet key type (personalized SSK key), which is based on SSK but has
> > > two additional features:
> > > 
> > > 1. While the personalized key may be publicly known, it can be used for
> > > inserts only by the person it was issued for.
> > > 2. When the data is downloaded from the personalized key, it must be
> > > possible to determine who was the inserter (more precisely, which of the
> > > issued personalized keys was used for the insert).
> > > 
> > > Here's how it should work. The moderator of the board generates a base
> > > SSK key pair. Then he generates a number of personalized keys (one for
> > > every user who can post to the board) derived from the base private SSK
> > > key and the public key of the user. He publishes the list of all
> > > personalized keys in some way. Every user uses his own personalized key
> > > to insert messages, but all messages go to the same place, as if they
> > > had been inserted with the base SSK private key. I.e. in order to
> > > download messages, all users still use the single base SSK public key.
> > > 
> > > Additionally, the moderator (or maybe everyone) can see which
> > > personalized key was used to insert each message. If he notices that
> > > someone has started posting spam and wants to block him, he simply
> > > generates a new base SSK key pair and new personalized keys for all
> > > users except the spammer. And the spammer can no longer post.
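
The proposal above doesn't pin down how a personalized key would actually be 
derived, so purely to illustrate the moderator-side bookkeeping, here is a 
Python sketch in which the "personalized key" is just the base key's signature 
over the user's public key (essentially the PSK idea I formalise further down), 
with Ed25519 standing in for whatever scheme the node would really use:

    from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey
    from cryptography.hazmat.primitives import serialization

    RAW = (serialization.Encoding.Raw, serialization.PublicFormat.Raw)

    class Board:
        def __init__(self):
            self.base_key = Ed25519PrivateKey.generate()  # fresh base SSK pair
            self.personalized = {}                        # user pubkey -> cert

        def issue(self, user_pub: bytes) -> bytes:
            # "Personalized key": here simply the base key's signature over the
            # user's public key, published so readers can tell posters apart.
            cert = self.base_key.sign(user_pub)
            self.personalized[user_pub] = cert
            return cert

        def rotate_excluding(self, spammer_pub: bytes) -> dict:
            # New base pair; re-issue certs to everyone except the spammer.
            survivors = [u for u in self.personalized if u != spammer_pub]
            self.base_key = Ed25519PrivateKey.generate()
            self.personalized = {}
            return {u: self.issue(u) for u in survivors}

    board = Board()
    users = [Ed25519PrivateKey.generate() for _ in range(3)]
    pubs = [u.public_key().public_bytes(*RAW) for u in users]
    for p in pubs:
        board.issue(p)
    board.rotate_excluding(pubs[2])   # pubs[2] turned out to be the spammer
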
> > > 
> > > As you can see, in order to keep the board working, there is no need
> > > for the moderator to always be online. The moderator only adds users to
> > > and removes them from the write list, but as soon as a user has been
> > > added, he can read and post to the board even while the moderator is on
> > > vacation. Yes, there is still the problem of identity introduction, but
> > > it's not so important. It has already been solved in some way in FMS,
> > > and I have another possible solution for it. But I don't think it's
> > > time to talk about that particular problem now.
> > > 
> > > P.S. I can describe the algorithm the node should use to deal with the
> > > new key type in order to show that it's possible. But I have some
> > > unconfirmed assumptions about how SSK keys work in freenet now, and I'd
> > > like toad (or someone else who knows) to confirm them first:
> > > 
> > > 1. The routing key for SSK inserts is always derived from the public SSK
> > > key (not from the private key). The private SSK key never leaves the
> > > local node and is used only to make some signature in order to prove
> > > that the inserter owns the private SSK key.
> > 
> > Yes. The routing key is derived from the document name and the hash of the 
> > public key. The private key is only known by the client, and only 
> > temporarily by the client's node.
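
To illustrate that point (a simplification, not the node's exact algorithm): 
the location an SSK block is routed to depends only on public data, never on 
the private key.

    import hashlib

    def ssk_routing_key(pubkey_bytes: bytes, docname: str) -> bytes:
        # Both inputs are public; the private key plays no part in routing.
        pubkey_hash = hashlib.sha256(pubkey_bytes).digest()
        return hashlib.sha256(pubkey_hash + docname.encode("utf-8")).digest()
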
> > 
> > > 2. When the SSK block is about to be stored in the datastore of the
> > > remote node, the remote node checks whether the block has a correct signature,
> > > i.e. the inserter indeed owns the private SSK key. Otherwise the SSK 
> > > block is dropped. This way the network guarantees that only the key 
> > > owner is allowed to insert the data to that key.
> > 
> > Yes.
> > > 
> > > Is this correct?
> > 
> > Yes, looks like.
> > 
> > You've got it, inspiration, woah!
> > 
> > I apologise for spoiling what I'm sure you were going to trickle over a 
> > longer period, but I think we should build the basic, generic tools ASAP. 
> > Something similar was actually proposed *way back*, pre-0.5, but its 
> > usefulness wasn't recognised at the time, probably because we didn't have 
> > the immediate problem of chat tools. I should have seen it but haven't been 
> > sufficiently immersed in cypherpunk...
> > 
> > BASIC MODERATION:
> > 
> > PSK: (basic personalised SSK)
> > - Public key A (moderator's key)
> > - Public key B (poster's key)
> > - A signature on key B by key A.
> > - A signature on the whole message including the keys by B.
> > 
> > Routing:
> > PSK@<hash of A's public key>,<document crypto key>,<crypto 
> > settings>/<document name>
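
To make the PSK layout concrete, here's a minimal verification sketch in 
Python; Ed25519 stands in for whatever signature scheme would actually be 
used, SHA-256 for the routing hash, and the field names and byte layout are 
purely illustrative:

    import hashlib
    from dataclasses import dataclass
    from cryptography.exceptions import InvalidSignature
    from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PublicKey

    @dataclass
    class PSKBlock:
        pubkey_a: bytes       # moderator's public key
        pubkey_b: bytes       # poster's public key
        sig_b_by_a: bytes     # A's signature over B's public key
        sig_msg_by_b: bytes   # B's signature over the whole message
        docname: str
        payload: bytes

    def verify_psk(block: PSKBlock) -> bool:
        a = Ed25519PublicKey.from_public_bytes(block.pubkey_a)
        b = Ed25519PublicKey.from_public_bytes(block.pubkey_b)
        signed = (block.pubkey_a + block.pubkey_b
                  + block.docname.encode("utf-8") + block.payload)
        try:
            a.verify(block.sig_b_by_a, block.pubkey_b)   # A vouches for B
            b.verify(block.sig_msg_by_b, signed)         # B signed this message
        except InvalidSignature:
            return False
        return True

    def psk_routing_key(block: PSKBlock) -> bytes:
        # Routed by hash(A's public key) and document name, so every poster's
        # messages land in the moderator's keyspace.
        return hashlib.sha256(hashlib.sha256(block.pubkey_a).digest()
                              + block.docname.encode("utf-8")).digest()
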
> > 
> > MULTIPLE MODERATORS:
> > 
> > GPK: (general purpose SSK)
> > - Full group identity: a list of public keys. The group identity used in 
> > the URI is a hash over this list of public keys.
> > - Posting rule: Some function, written in a non-turing-complete language 
> > (with cryptographic operations), to decide whether a message is allowed. 
> > Limited runtime. Can only refer to the signing data and the group identity.
> > - Signing data which conforms to the posting rule, with a length limit.
> > - The payload data, with whatever encryption is appropriate.
> > 
> > I'm not sure whether the payload data and signing data should be a single 
> > entity. We don't want to allow operations so heavy that they constitute a 
> > CPU DoS...
> > 
> > The URI would incorporate all of the above:
> > GPK@<group identity hash>,<posting rule hash>,<crypto key>,<crypto 
> > config>/<document name>
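
A sketch of what a GPK block and the node-side check might look like, with an 
ordinary Python callable standing in for the (not yet designed) restricted 
rule language; all names and the field layout are assumptions for 
illustration:

    import hashlib
    from dataclasses import dataclass
    from typing import Callable

    def group_identity(pubkeys: list) -> bytes:
        # Group identity = hash over the (ordered) list of member public keys.
        h = hashlib.sha256()
        for pk in pubkeys:
            h.update(pk)
        return h.digest()

    @dataclass
    class GPKBlock:
        group_id: bytes      # hash of the full group identity
        rule_source: bytes   # serialized posting rule; its hash is in the URI
        signing_data: bytes  # must satisfy the rule, length-limited
        payload: bytes       # encrypted however the application sees fit

    def node_accepts(block: GPKBlock, uri_rule_hash: bytes,
                     rule: Callable[[bytes, bytes], bool]) -> bool:
        # 1. The rule shipped with the block must be the one named in the URI.
        if hashlib.sha256(block.rule_source).digest() != uri_rule_hash:
            return False
        # 2. The rule may only look at the signing data and the group identity.
        return rule(block.signing_data, block.group_id)
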
> > 
> > This is very similar to how Bitcoin works.
> > 
> > The main operations we would implement with the above:
> > - Posting to a board. The posting rule may vary depending on the board, but 
> > a typical one would be "contains a pubkey signed by a moderator" (a sketch 
> > of such a rule follows this list).
> > - Board maintenance. Changing the moderators list could have formal 
> > cryptographic requirements (e.g. some specified majority). Since the 
> > posting rule is part of the URI, we can simply poll the key that 
> > corresponds to the "change group" posting rule. And since the posting rule 
> > is configurable, we could have the same slot correspond to both changing 
> > the group and posting a message. However, what the application does with 
> > the data should be up to the application: Freenet should be able to return 
> > data from the posting rule, but the application decides whether to e.g. 
> > poll a different key.
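
For illustration, here is one way the "contains a pubkey signed by a 
moderator" rule could look, written as a plain Python function in the same 
shape as the node-side check sketched above. The JSON encoding of the signing 
data and Ed25519 are illustrative assumptions, and it assumes the signing data 
carries the full moderator key list so it can be checked against the group 
identity hash; the real rule would of course be expressed in the restricted 
language:

    import hashlib
    import json
    from cryptography.exceptions import InvalidSignature
    from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PublicKey

    def moderator_signed_rule(signing_data: bytes, group_id: bytes) -> bool:
        d = json.loads(signing_data)                 # illustrative encoding
        mods = [bytes.fromhex(m) for m in d["moderators"]]
        h = hashlib.sha256()
        for m in mods:                               # the carried moderator list
            h.update(m)                              # must match the URI's group
        if h.digest() != group_id:
            return False
        poster = bytes.fromhex(d["poster_pubkey"])
        cert = bytes.fromhex(d["moderator_sig"])     # some moderator signed the poster
        for m in mods:
            try:
                Ed25519PublicKey.from_public_bytes(m).verify(cert, poster)
                return True
            except InvalidSignature:
                continue
        return False
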
> > 
> > The rules can be arbitrary, but the need to limit the amount of data 
> > validated means we won't see e.g. fully democratic scalable mechanisms. 
> > Basically what it gives you is efficient searching for content provided by 
> > anyone within a group, provided they accept the rules of the group; 
> > practically these will probably have to be hierarchical at least for now 
> > (e.g. the need for an invite, or for an invite from somebody invited by a 
> > top-level moderator, etc).
> > 
> > The constraint that we can't refer to other keys is IMHO absolute for 
> > performance and architecture reasons. Maybe we could have another key type 
> > that allowed inter-key dependencies (e.g. with passing on the dependencies 
> > at each hop to help validation), but there would need to be severe 
> > restrictions, since *every hop needs to be able to verify the key, both on 
> > insert and request*. Hence there would be fairly major load management 
> > implications too; e.g. if the number of keys involved is predictable, we'd 
> > ask for more than one from load management at each hop. So that's something 
> > for the future. And yes it probably would enable various forms of digicash.
> > 
> > For distributed search, this is usable as long as the group doesn't change 
> > too often: You don't want to insert 10,000 keywords and then have to do it 
> > all over again when the group changes. On the other hand if you increase 
> > the size of the bundles you insert, you reduce the number of reinserts 
> > (which can point to the same CHKs); there are useful tradeoffs.
> > 
> > Also, the above mechanism can allow for quite a bit of scalability, e.g. 
> > hierarchies of moderators.
> > 
> > Finally, it combines neatly with multiple-return SSKs, although I'm not 
> > sure how useful these are as they require all the results to have the same 
> > location, meaning it becomes a bottleneck and censorship point - you 
> > certainly don't want to use a single key for a video stream, for example!
> 
> Details on how to implement moderation:
> 
> There is a small number of primary moderators. They are included in the posts.
> There is a larger number of secondary moderators. They have a signature from 
> a primary moderator allowing them to introduce identities.
> 
> A normal post includes an identity, signed by a secondary moderator.
> 
> Newbies introduce themselves (via hashcash or CAPTCHAs etc.) to a secondary moderator. 
> The moderator signs their certificate so they can post.
> 
> When an identity starts posting spam, a moderator can post a control message 
> adding it to a blacklist. If the blacklist gets too big, we need a new 
> identity for the secondary moderator who introduced the spammers. The old 
> identity can again be blacklisted, but if that gets too big as well, we need 
> a new identity for the primary who introduced the secondary - and that means 
> a new queue key.
> 
> A single primary moderator may unilaterally change their key, thus 
> redirecting to a new queue, where all the other identities are the same and 
> theirs is changed. A single moderator of any kind can post to the blacklist. 
> And so on. So there is no voting requirement for routine maintenance.
> 
> When a secondary moderator gets a new key, all those whose keys he's signed need new 
> signatures. They keep their old identities though. The moderator will upload 
> the new signatures for everyone (except the spammers they are removing). It 
> is only necessary to sign keys for active posters signed by that moderator, 
> so the number should be reasonably small.
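
Put together, a reader's check for a normal post under this scheme might look 
roughly like the following; Ed25519 and the field names are illustrative 
assumptions, not a fixed format:

    from dataclasses import dataclass
    from cryptography.exceptions import InvalidSignature
    from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PublicKey

    @dataclass
    class Post:
        poster_pub: bytes              # the posting identity
        secondary_pub: bytes           # secondary moderator who introduced it
        sig_poster_by_secondary: bytes
        sig_secondary_by_primary: bytes
        primary_pub: bytes             # must be one of the queue's primaries
        body: bytes
        sig_body_by_poster: bytes

    def accept_post(post: Post, primaries: set, blacklist: set) -> bool:
        if post.primary_pub not in primaries:
            return False
        if post.poster_pub in blacklist or post.secondary_pub in blacklist:
            return False
        try:
            # Chain: primary -> secondary -> poster -> message body.
            Ed25519PublicKey.from_public_bytes(post.primary_pub).verify(
                post.sig_secondary_by_primary, post.secondary_pub)
            Ed25519PublicKey.from_public_bytes(post.secondary_pub).verify(
                post.sig_poster_by_secondary, post.poster_pub)
            Ed25519PublicKey.from_public_bytes(post.poster_pub).verify(
                post.sig_body_by_poster, post.body)
        except InvalidSignature:
            return False
        return True
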

And it's even better than that: for those who cry "centralised censorship", we 
can actually use it to accelerate the existing Web of Trust.

So first, let's look at what it takes to make the Web of Trust scale adequately.

For a Web of Trust to scale, without providing gross censorship opportunities, 
we need to:
- Poll the regular posters (with sufficient trust to be visible) on the boards 
we subscribe to (in preference to everyone else).
- Use hints to see others.

(This is more or less what LCWoT does, for example)

But this allows for more censorship than is really necessary, and we may need 
to poll a wider range of identities just to get trust updates - in particular, 
we need to poll a wide range of identities so we can see newbies, who won't 
necessarily announce to regular posters. And even if we make them announce to 
regular posters when they join a board, they may not be accepted by the regular 
posters. And hints are a relatively slow way of getting updates anyway... As a 
matter of policy, a lot of WoT users want to see all newbies posting to a forum.

So how to achieve "I want to be able to see all newbies on my forums" and good 
performance?
- Poll a hashcash-protected board-specific announcement queue. I.e. the 
hashcash puzzle is to find X such that the last k bits of H(X) equal the last 
k bits of the nth iterated hash of the board name (a toy version is sketched 
below). Similar things could be implemented with bitcoins (e.g. payment to a 
non-existent account derived from the hash). But unfortunately, because it has 
to be predictable based on the board name, and isn't directed at any specific 
individual, we can't use CAPTCHAs. And hashcash has a lot of problems, notably 
that it's much cheaper for an attacker than for a legitimate user, especially 
if the legitimate user has a slow computer. Bitcoin of course has other 
problems.

I.e. context-specific announcement, as digger3 called it.
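
A toy version of that puzzle, assuming SHA-256, byte granularity for k, and 
arbitrary example parameters:

    import hashlib
    import os

    def iterated_hash(data: bytes, n: int) -> bytes:
        for _ in range(n):
            data = hashlib.sha256(data).digest()
        return data

    def solve_announcement_puzzle(board: str, n: int = 7, k_bytes: int = 2) -> bytes:
        # Find X whose hash ends with the same k bytes as the nth hash of the
        # board name. Roughly 2^(8*k_bytes) attempts on average.
        target = iterated_hash(board.encode("utf-8"), n)[-k_bytes:]
        while True:
            x = os.urandom(16)
            if hashlib.sha256(x).digest()[-k_bytes:] == target:
                return x

    def check_announcement(board: str, x: bytes, n: int = 7, k_bytes: int = 2) -> bool:
        # Anyone can verify the proof of work from the board name alone.
        target = iterated_hash(board.encode("utf-8"), n)[-k_bytes:]
        return hashlib.sha256(x).digest()[-k_bytes:] == target
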

Next, how can we improve on this with GPKs (PGKs or whatever):

The basic weakness of GPK-based forums is that they are centralised, and 
potentially subject to moderation cartels, flamewars etc. As digger3 pointed 
out, locally you can't easily change your trust configuration, except by 
following different groups (which might correspond to a single board in UI 
terms).

However, we can create GPK-based forums opportunistically and use them to 
virtually eliminate polling:

Hence, a WoT identity creates a GPK-queue, publishing its public key (different 
from its main public key, in case it needs to change it). Anyone whose pubkey 
is signed with the WoT identity's GPK key can post to this queue, as well as 
the WoT identity itself. It publishes signatures with this key for those other 
WoT identities who have visible trust levels. They can now post to the WoT 
identity's queue. These WoT identities will then post to a limited number of 
GPK-queues (see below), in addition to their main message queues. They publish 
which GPK-queues they are posting to (which must be visible), both on their own 
identity/main queue and on the GPK-queues, and they post another message if 
they stop posting to said queue. While they are using a GPK-queue they 
guarantee to try to post everything. This means that if we are polling a 
GPK-queue, we don't need to poll the main queues for any of the identities 
currently using it. So we only need to poll the main queues for identities not 
in a GPK-queue we currently poll.
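
A sketch of the polling saving this buys, assuming each GPK-queue advertises 
the set of identities currently posting through it (the names here are 
illustrative):

    def queues_to_poll(identities: set, gpk_queue_members: dict):
        """Return (GPK-queues to poll, main queues that still need polling)."""
        covered = set()
        for members in gpk_queue_members.values():
            covered |= members
        return set(gpk_queue_members), identities - covered

    # e.g. three of four identities post via "queue1", so only one main queue
    # still needs to be polled:
    gpk, main = queues_to_poll({"alice", "bob", "carol", "dave"},
                               {"queue1": {"alice", "bob", "carol"}})
    assert gpk == {"queue1"} and main == {"dave"}
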

When the trust levels change, we don't necessarily change our GPK-queue 
immediately. But after some threshold (e.g. a certain proportion of the 
messages being spam, or an imminent spam attack), the "owner" creates a new 
GPK-queue. As with the non-WoT proposal, there are two stages, based on 
available space (a small decision sketch follows the list):
- Same key, with a blacklist. This is polling from a new location, but does not 
require that signatures are republished.
- New key, with signatures republished. A file is attached containing the new 
signatures for all the active posters who are still visible and have posted 
recently. They can join the new queue or not.
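
The choice between the two stages could be driven by simple thresholds, e.g. 
(the numbers and names here are purely illustrative):

    def queue_maintenance_action(blacklist_size: int, spam_fraction: float,
                                 max_blacklist: int = 50,
                                 spam_threshold: float = 0.2) -> str:
        if blacklist_size > max_blacklist:
            # Blacklist no longer fits: new key, republish signatures for the
            # still-visible, recently active posters.
            return "new key, republish signatures"
        if spam_fraction > spam_threshold:
            # Same key: poll from a new location and extend the blacklist.
            return "same key, extend blacklist"
        return "no change"
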

TO BE DETERMINED: If the GPK payload is big, we post whole messages, and use 
message IDs/hashes to avoid duplication; if it is small, we just post pointers 
to the posters' own message queues, so that those are fetched when there are 
messages to get, but not polled.

If the list of identities signed by a specific GPK-queue owner grows too large, 
we can get problems when the key is changed. So there will simply be an 
arbitrary limit. If we pass a threshold, we change keys regardless of spam. 
When we create a new key, we may sign a different set of posters, based on 
trust and who has posted recently.

Obviously, only those who publish their trust lists can be GPK-queue owners. 
This should not be a problem in practice; those who don't publish trust lists 
can still get onto a queue.
