The doc only describes a technique, whereas it could mention some use
cases and give more info to developers.  Thus it doesn't seem to me to
be ready.  I propose some thoughts that may lead to additional info
for marf-redaction, if the WG deems it worthwhile to expand on them:

*Redacting the header*
Section 2 bullet 2 mentions "local-parts of email addresses", which is
fine.  However, the spec puts it within parentheses and prefixed by
"such that", thereby allowing unwanted mangling.

Some email addresses probably should never be redacted, e.g. From: and
Return-Path:, except for VERP.  Recipient addresses may appear in
To:, Cc:, the "for" clause of Received:, Delivered-To:, Envelope-To:,
X-Envelope-To:, X-Rcpt-To:, X-Original-To:, and the like.  Would it
make sense to attempt a possibly comprehensive list, as a guide to
developers?  In addition, any of these fields that were added locally
could just as well be removed altogether.
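
To make the field list concrete, here is a minimal sketch of redacting
only the local-parts in those recipient fields while leaving From: and
Return-Path: alone.  The names redact_headers, redact_local_part and
RECIPIENT_FIELDS are hypothetical, and the keyed hash (HMAC-SHA256,
truncated) is just one assumed way to produce a stable replacement
token; it is not taken from the draft.  The address matching is
deliberately naive.

```python
import hashlib
import hmac
import re

# Recipient-bearing fields named above; assumed list, not exhaustive.
RECIPIENT_FIELDS = {
    "to", "cc", "delivered-to", "envelope-to",
    "x-envelope-to", "x-rcpt-to", "x-original-to",
}

# Naive address pattern: (local-part)@(domain). Quoted local-parts are not handled.
ADDR_RE = re.compile(r"([^\s<>@,;:()\"]+)@([A-Za-z0-9.-]+)")
# The "for" clause of a Received: field, e.g. "for <user@example.com>;".
FOR_RE = re.compile(r"(\bfor\s+<?)([^\s<>@,;:()\"]+)(@)")


def redact_local_part(local, key):
    """Return a stable token for a local-part, derived from a secret key."""
    digest = hmac.new(key, local.encode("utf-8"), hashlib.sha256).hexdigest()
    return digest[:12]


def redact_headers(headers, key):
    """Take (name, value) pairs and return a new list with recipient local-parts redacted."""
    out = []
    for name, value in headers:
        lname = name.lower()
        if lname in RECIPIENT_FIELDS:
            value = ADDR_RE.sub(
                lambda m: redact_local_part(m.group(1), key) + "@" + m.group(2),
                value,
            )
        elif lname == "received":
            value = FOR_RE.sub(
                lambda m: m.group(1) + redact_local_part(m.group(2), key) + m.group(3),
                value,
            )
        # From:, Return-Path: and every other field pass through unchanged.
        out.append((name, value))
    return out
```

The pairs can come from email.message.Message.items().  Because the
token depends only on the key and the local-part, the same recipient
gets the same token in To: and in the Received: "for" clause, so the
report recipient can still correlate them.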

By redacting To: or Cc: a reporter most likely breaks any DKIM
signature.  That might prevent the report from being accepted, so it
should be mentioned.  Workarounds are probably different for FBLs than
for reports submitted by the general public.
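
A reporter could check in advance which redactions would invalidate a
signature by looking at the h= tag of DKIM-Signature:.  The sketch
below assumes the field value has already been extracted from the
message; signed_header_fields and fields_invalidating_signature are
hypothetical helper names.  It covers header fields only (any change
to a signed body breaks the bh= hash regardless).

```python
def signed_header_fields(dkim_signature_value):
    """Return the lower-cased field names listed in the h= tag."""
    tags = {}
    # A DKIM-Signature value is a semicolon-separated tag=value list.
    for part in dkim_signature_value.split(";"):
        if "=" in part:
            tag, value = part.split("=", 1)
            tags[tag.strip()] = value
    # h= is a colon-separated list; folding whitespace is stripped per name.
    return {n.strip().lower() for n in tags.get("h", "").split(":") if n.strip()}


def fields_invalidating_signature(dkim_signature_value, redacted_fields):
    """Return the redacted field names that are covered by the signature."""
    signed = signed_header_fields(dkim_signature_value)
    return sorted({f.lower() for f in redacted_fields} & signed)
```

An empty result means the planned header redactions leave that
signature's header hash untouched; a non-empty one tells the reporter
which fields to leave alone or to flag in the report.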

*Redacting the body*
I've seen some mailing list software obscuring email addresses in the
body, but found no guidance about this.  People send passwords and
credit card numbers via email, and there is no standard to annotate
that these are sensitive strings; that is a lost cause.

At any rate, when the body is covered by a signature, the same concern
as above arises.

*What to redact*
The reporting-discovery draft has a "Privacy considerations" section
saying that messages containing sensitive data must not be reported as
spam.  Messages reported as spam are considered public.  (That section
might be moved to a BCP.)  This concept may be a means to convey that
the already-abused recipient addresses are the only piece of data that
deserves redaction.

Jacqui Caren recommends "redaction of all identifying marks when
dealing with a spamtrap of obvious spam", in a scenario where
redaction is "based upon the anti-spam score the orginal message gets
and what level of trust you place in the MSP".
  http://www.ietf.org/mail-archive/web/marf/current/msg01048.html

Two or more honeypots are probably required to identify, by
difference, which parts of the messages contain varying information
that could betray the destination address.
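
As a rough illustration of "by difference", the sketch below compares
two copies of the same spam run received at two honeypot addresses and
lists the whitespace-separated tokens that differ.  varying_tokens is
a hypothetical name, and token-level comparison with difflib is only
one assumed way to do it; real campaigns may vary in ways this would
not isolate cleanly.

```python
import difflib


def varying_tokens(copy_a, copy_b):
    """Return (tokens_in_a, tokens_in_b) pairs for every span where the copies differ."""
    a = copy_a.split()
    b = copy_b.split()
    matcher = difflib.SequenceMatcher(None, a, b, autojunk=False)
    return [
        (a[i1:i2], b[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]
```

Tokens that differ between the copies (the recipient address itself,
but also tracking URLs or unsubscribe tokens) are candidates for
redaction, since they may encode the destination address.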

OTOH, users can inadvertently report as spam legitimate messages that
contain other kinds of sensitive data.  Asking "Are you sure?" in the
GUI probably won't help much.

*Where is the pristine message?*
It has been mentioned that reports may need to be transmitted to LEAs.
In some cases, a judge may order the disclosure of redacted data.
Can we use Message-Id: or a similar field to locate it?  If we can, we
had better recommend never redacting that field.

If transmission to LEAs crosses jurisdictional boundaries, it may be
useful to tell which country/state the pristine data is located in.
