Re: [gentoo-dev] [RFC] Anti-spam for goose

Alec Warner Thu, 21 May 2020 17:38:31 -0700

On Thu, May 21, 2020 at 1:13 PM Viktar Patotski <[email protected]>
wrote:


> Hi all,
>
> I believe that we have all forgotten about Donald Knuth: premature
> optimisation is the root of all evil.
>
> We don't have "spam" yet, but we are already trying to protect against
> it. There might be cases when some systems post stats more often than we
> want, but that will probably not harm us. Or this will be done by our main
> users who run 1kk (a million) Gentoo installations, and this "spam" will
> actually be valuable. Moreover, nobody forces us to treat info from
> 'goose' as first priority, so we are still able to choose which packages
> to work on. In short: I think this topic is not so important yet.
>

I raised a similar question on IRC and the conclusion was that 'it is good
to have ideas', and I don't necessarily disagree there[0]. We cannot build a
foolproof system, but some protections are feasible in some scenarios[1].

[0] Gentoo offers numerous no-login-required services; most of these are
read-only, but they typically don't suffer from attacks, or at least not
attacks that we need to respond to. The most obvious one of these is our
gentoo.org mail service, which accepts unauthenticated email to gentoo.org.
Our anti-email-spam countermeasures are what I would call complex, but we
still employ broad measures when needed and the tradeoffs are similar to
the options for goose; e.g. if we are too broad we can block email from
large swaths of the internet.
[1] Bugzilla *has* recently been the target of spam attacks. It *does*
require logins (e.g. to create / modify bugs), and that has not stopped the
spammers from creating accounts. We have discussed different protections
for Bugzilla, as it has different parameters. A basic Bugzilla account
can't do all that much (you can't easily modify other people's bugs) and
spam posts are easily identified. This differs from goose, where every
token has the same power (submitting a report) and it may be difficult to
tell an abusive report from a real one.


> Viktar
>
>
> On Thu, May 21, 2020, 16:28 Jaco Kroon <[email protected]> wrote:
>
>> Hi Michał,
>>
>> On 2020/05/21 13:02, Michał Górny wrote:
>> > On Thu, 2020-05-21 at 12:45 +0200, Jaco Kroon wrote:
>> >> Even for v4, as an attacker ... well, as I'm sitting here right now
>> >> I've got direct access to almost a /20 (4096 addresses).  I know a
>> >> number of people with larger scopes than that.  Use botnets and the
>> >> scope goes up even more.
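To put numbers on the per-address problem: a /20 holds 2^(32 - 20) = 4096 IPv4 addresses, so a limit keyed on single addresses gives such an attacker 4096 independent budgets. A minimal sketch of the obvious counter-measure, counting submissions per network prefix instead, is below. The prefix lengths (/20 for IPv4, /48 for IPv6) and the helper names `prefix_of` / `over_limit` are hypothetical choices for illustration, not anything goose implements. The trade-off is the one described for mail above: the coarser the prefix, the more legitimate users behind the same ISP block share one limit.

```python
import ipaddress
from collections import Counter
from typing import Dict, Iterable

# Hypothetical grouping granularity: a /20 is 2**(32 - 20) = 4096 IPv4
# addresses; /48 is a commonly used IPv6 end-site size.
PREFIX_LEN_V4 = 20
PREFIX_LEN_V6 = 48


def prefix_of(addr: str) -> str:
    """Map a client address to the network prefix that contains it."""
    ip = ipaddress.ip_address(addr)
    length = PREFIX_LEN_V4 if ip.version == 4 else PREFIX_LEN_V6
    # strict=False masks off the host bits instead of raising ValueError.
    return str(ipaddress.ip_network(f"{ip}/{length}", strict=False))


def over_limit(addresses: Iterable[str], limit: int) -> Dict[str, int]:
    """Return prefixes that submitted more than `limit` reports."""
    counts = Counter(prefix_of(a) for a in addresses)
    return {p: n for p, n in counts.items() if n > limit}


print(ipaddress.ip_network("10.0.0.0/20").num_addresses)  # 4096
```

For example, `prefix_of("10.0.15.200")` and `prefix_of("10.0.0.1")` both map to `10.0.0.0/20`, so they share one budget.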
>> > See how unfair the world is!  You are filling your bathtub with IP
>> > addresses, and my ISP has taken mine only recently.
>> I must admit, I work for an ISP :$
>> >>>     Option 3: explicit CAPTCHA
>> >>>     ==========================
>> >>>     A traditional way of dealing with spam -- require every new system
>> >>>     identifier to be confirmed by solving a CAPTCHA (or a few
>> >>>     identifiers for one CAPTCHA).
>> >>>
>> >>>     The advantage of this method is that it requires real human work
>> >>>     to be performed, effectively limiting the ability to submit spam.
>> >>>
>> >> Yea.  One would think.  CAPTCHAs are massively intrusive and in my
>> >> opinion more effort than they're worth.
>> >>
>> >> This may be beneficial when *generating* a token.  In other words,
>> >> when generating a token, that token needs to be registered by way of
>> >> a CAPTCHA.
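One way to read "register the token by way of a CAPTCHA" is that the server only hands out tokens after a CAPTCHA is solved, and later rejects submissions carrying tokens it never issued. The sketch below assumes a stateless HMAC-signed token so the server does not need to store an issued-token list; that signing scheme, `SERVER_KEY`, and the `captcha_passed` flag (standing in for a real CAPTCHA check) are all hypothetical. It only raises the cost per token: one patient human can still solve many CAPTCHAs.

```python
import hashlib
import hmac
import secrets
from typing import Optional

# Hypothetical server-side secret; a real deployment would load it from config.
SERVER_KEY = secrets.token_bytes(32)


def issue_token(captcha_passed: bool, key: bytes = SERVER_KEY) -> Optional[str]:
    """Hand out a new system token only after a CAPTCHA has been solved.

    The CAPTCHA check itself is out of scope; captcha_passed stands in for it.
    """
    if not captcha_passed:
        return None
    ident = secrets.token_hex(16)  # random, carries no user information
    tag = hmac.new(key, ident.encode(), hashlib.sha256).hexdigest()
    return f"{ident}.{tag}"


def token_is_valid(token: str, key: bytes = SERVER_KEY) -> bool:
    """Accept a submission only if its token was signed by this server."""
    ident, sep, tag = token.partition(".")
    if not sep:
        return False
    expected = hmac.new(key, ident.encode(), hashlib.sha256).hexdigest()
    # Constant-time comparison avoids leaking the tag through timing.
    return hmac.compare_digest(tag, expected)
```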
>> >>
>> >>>     Other ideas
>> >>>     ===========
>> >>>     Do you have any other ideas on how we could resolve this?
>> >>>
>> >> Generated token + hardware based hash.
>> > How are you going to verify that the hardware-based hash is real,
>> > and not just a random value created to circumvent the protection?
>>
>> So the hash generation is more to validate that the token is still on the
>> same installation (i.e. not a cloned token).  Sorry if that wasn't clear;
>> I was trying to solve two possible problems in one go.
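The clone check described here can be sketched as: the client hashes some hardware identifiers, and the server remembers the first hash it saw for each token and flags later submissions whose hash differs. Which identifiers to use (e.g. NIC MAC addresses or a board serial) is a hypothetical choice, as are the names below. As Michał points out, the hash is computed on the client, so this only catches naive cloning such as a copied disk image, not a deliberate forger.

```python
import hashlib
from typing import Dict, Iterable


def hardware_hash(identifiers: Iterable[str]) -> str:
    """Digest of hardware identifiers (hypothetical, e.g. MACs, serials)."""
    h = hashlib.sha256()
    for ident in sorted(identifiers):  # order-independent
        h.update(ident.encode())
        h.update(b"\0")  # separator, so ["ab", "c"] and ["a", "bc"] differ
    return h.hexdigest()


class CloneDetector:
    """Remembers the first hardware hash seen for each token."""

    def __init__(self) -> None:
        self.first_seen: Dict[str, str] = {}

    def check(self, token: str, hw_hash: str) -> bool:
        """Return True if the submission matches the token's original machine."""
        known = self.first_seen.setdefault(token, hw_hash)
        return known == hw_hash
```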
>>
>> >
>> >>   Rate limit the combination to 1/day.
>> >>
>> >> Don't use submitted results until the system has been kept up to date
>> >> for a minimum period.  Say, updated at least 20 times in 30 days.
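The two rules above (at most one submission per day per token + hardware hash, and results only counted after 20 updates in 30 days) can be sketched as a small per-key log. The thresholds come from the proposal; the class and method names are hypothetical. Note that this stores only timestamps, not report contents, but it still links submissions to a key, which is exactly the correlation Michał says goose avoids for privacy reasons.

```python
from collections import defaultdict, deque
from datetime import datetime, timedelta
from typing import DefaultDict, Deque, Tuple

# Thresholds taken from the proposal above.
RATE_LIMIT = timedelta(days=1)   # at most one accepted submission per day
WINDOW = timedelta(days=30)      # look-back window for eligibility
MIN_SUBMISSIONS = 20             # accepted submissions needed inside WINDOW

Key = Tuple[str, str]            # (token, hardware hash)


class SubmissionLog:
    """Tracks accepted submission times per (token, hardware hash)."""

    def __init__(self) -> None:
        self.history: DefaultDict[Key, Deque[datetime]] = defaultdict(deque)

    def submit(self, key: Key, now: datetime) -> bool:
        """Accept at most one submission per key per RATE_LIMIT."""
        times = self.history[key]
        if times and now - times[-1] < RATE_LIMIT:
            return False  # too soon after the previous accepted submission
        times.append(now)
        # Forget timestamps that have fallen out of the eligibility window.
        while now - times[0] > WINDOW:
            times.popleft()
        return True

    def is_eligible(self, key: Key, now: datetime) -> bool:
        """Count a key's results only once it has stayed up to date long enough."""
        recent = [t for t in self.history.get(key, ()) if now - t <= WINDOW]
        return len(recent) >= MIN_SUBMISSIONS
```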
>> > For privacy reasons, we don't correlate the results.  So this is
>> > impossible to implement.
>>
>> Ok, but a token cannot (unless we issue it based on an email based
>> account) be linked back to a specific user, so does it matter if we
>> associate uploads with a token?
>>
>> >> The downside here is that many machines are not powered up at least
>> >> once a day to be able to perform that initial submission sequence.  So
>> >> perhaps it's a bit stringent.
>> > Exactly.  Even once a week is a bit risky but once a day is too narrow
>> > a period.
>> >
>> > To some degree, we could decide we don't care about exact numbers
>> > as much as some degree of weighted proportions.  This would mean that,
>> > say, people who submit daily get the count of 7, at the loss of people
>> > who don't run their machines that much.  It would effectively put more
>> > emphasis on more active users.  It's debatable whether this is desirable
>> > or not.
>> Decaying averages.  Simple to implement, and they don't need all the
>> historic data.
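A decaying average needs only the current value and the time of the last update per key: on each submission, decay the old value by the elapsed time and add the new one. The sketch below uses an exponential decay with a hypothetical one-week half-life. As a side effect it gives the weighting Michał describes: a system submitting daily ends up with a larger weight than one submitting weekly, whether or not that is desirable.

```python
from dataclasses import dataclass

HALF_LIFE_DAYS = 7.0  # hypothetical: a submission's weight halves every week


@dataclass
class DecayingCount:
    """Exponentially decayed submission count; stores two numbers, no history."""

    value: float = 0.0
    last_day: float = 0.0

    def decayed(self, day: float) -> float:
        """Value as of `day`, decayed since the last update."""
        elapsed = day - self.last_day
        return self.value * 0.5 ** (elapsed / HALF_LIFE_DAYS)

    def add(self, day: float, weight: float = 1.0) -> None:
        """Decay the stored value up to `day`, then add one submission."""
        self.value = self.decayed(day) + weight
        self.last_day = day
```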
>> >
>> >> Both the token and hardware hash can of course be tainted and are under
>> >> "attacker control".
>> > Exactly.  So it really looks like exercise for the sake of exercise.
>>
>> Unless tokens are *issued* as per the rest of my email you snipped
>> away, in which I proposed issuing both anonymous and non-anonymous
>> tokens.
>>
>> Kind Regards,
>> Jaco
>>
>>
>>
