Re: [Int-area] WG adoption call: Availability of Information in Criminal Investigations Involving Large-Scale IP Address Sharing Technologies

Amelia Andersdotter Wed, 25 Apr 2018 09:35:19 -0700

On 2018-04-25 13:16, Povl H. Pedersen wrote:
> I would keep full IP address + port info in my firewall log, separate
> from the webserver log. This helps keep the webguys from abusing the
> collected data.
> Having talked to the webguys, they use the logfiles in daily
> operations, and they see them as necessary to provide continuous
> delivery of the services to the end user. That is another obligation we
> have.


I'm assuming that subscribers (end users, data subjects, individuals)
are informed of your web analytics practices and consent to them somehow.
Web analytics people use a lot of data for a lot of things that aren't
*technically* required for a service to work. That's why there is
Section 6.2 ("User Participation") of RFC6973, and why my draft says "It
is RECOMMENDED that deviations from the above practices are carefully
documented and communicated to subscribers."

> Our legal department actually suggested we keep logs for 5 years, as
> some data must be kept that long.
>

I find that difficult to believe. Under accounting/tax law you would
sometimes have to keep track of sales, money received, etc., but surely
your company doesn't do that through IP and source port logs from
Internet-facing servers/firewalls(?)

> The big privacy issue here is more about abuse and losing the data
> (moving them away from the internet-facing server within 3 days would be
> a good recommendation). This must be controlled by internal company
> rules, not by this RFC saying we must cripple data after 3 days. And 3
> days is a stupid limit if there is a longer weekend/holiday etc.
> Easter is an example: Thursday to Monday are non-working days. That is
> 5 days + the extra. So the 3 days should be 6 days without even
> accounting for holidays.
>

Earlier you said 30 days, in case someone is away for their entire
holiday (three weeks) and needs another week to work through backlogs.
That means they would only start working out the potential identity of a
perpetrator in a security-related incident 30 days after it happened.
With DDoS as an example: if you were DDoSed 30 days ago, and only now
you're going over the logs from that event, you should probably just get
better at incident response, so that you can freeze the specific logs
relating to that event within the three-day period for which the logs
are recommended to be stored (assuming there's some way of initiating
retention of data for longer than three days when necessary, but not as
a default).
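
To illustrate the "freeze on incident, purge by default" idea above, here
is a minimal sketch in Python. The names LogEntry, freeze and purge are
hypothetical and are not taken from the draft or from any existing
logging product; the three-day default is the period under discussion.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional

# Assumption: the default retention period discussed in this thread.
DEFAULT_RETENTION = timedelta(days=3)


@dataclass
class LogEntry:
    timestamp: datetime
    line: str
    incident_id: Optional[str] = None  # set once the entry is frozen


def freeze(entries: List[LogEntry], start: datetime, end: datetime,
           incident_id: str) -> None:
    """Mark entries inside [start, end] as retained for one incident."""
    for entry in entries:
        if start <= entry.timestamp <= end:
            entry.incident_id = incident_id


def purge(entries: List[LogEntry], now: datetime,
          retention: timedelta = DEFAULT_RETENTION) -> List[LogEntry]:
    """Keep entries that are frozen or still within the retention period."""
    return [e for e in entries
            if e.incident_id is not None or now - e.timestamp <= retention]
```

Longer retention is then an explicit, per-incident action rather than the
default for all traffic.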

/a

>
> On Wed, Apr 25, 2018 at 11:22 AM, <[email protected]
> <mailto:[email protected]>> wrote:
>
>     Re-,
>
>      
>
>     Please see inline.
>
>      
>
>     Cheers,
>
>     Med
>
>      
>
>     *From:* Povl H. Pedersen [mailto:[email protected]
>     <mailto:[email protected]>]
>     *Sent:* Wednesday, 25 April 2018 11:05
>     *To:* BOUCADAIR Mohamed IMT/OLN
>     *Cc:* [email protected] <mailto:[email protected]>
>     *Subject:* Re: [Int-area] WG adoption call: Availability of
>     Information in Criminal Investigations Involving Large-Scale IP
>     Address Sharing Technologies
>
>      
>
>     If we are at say a /20 or /22 (that is roughly 1000-4000 possible
>     IP addresses), and we have the source port, then the ISP should be
>     able to see which of these addresses has the given source port to
>     our destination IP and port.
>
>     [Med] The assumption about destination IP at the provider side is
>     broken. Further, logging destination IP address is not
>     recommended. RFC6888 says the following:
>
>      
>
>        REQ-12: A CGN SHOULD NOT log destination addresses or ports unless
>
>           required to do so for administrative reasons.
>
>      
>
>        Justification:  Destination logging at the CGN creates privacy
>
>           issues.
>
>      
>
>     Note also that recent advances in optimizing logs at CGNs (e.g.
>     port set assignment, deterministic NAT) conflict with keeping
>     track of the destination IP address.
>
>      
>
>     Also, there are stateless address sharing techniques which do
>     not even involve a CGN (MAP-E, MAP-T, …). There, logging the
>     destination IP address per new session is not an option.
>
>      
>
>      
>
>     With a timestamp, the risk of collision is low. And the police can
>     at least minimize the number of suspects.
>
>      
>
>     [Med] If the destination IP address is not logged at the provider
>     side (which is likely), the collision probability of your proposal
>     may be bigger for deployments which use a low address sharing
>     ratio (1:2, 1:4).
>
>     CGN does not break GeoIP. It still allows us to pinpoint the ISP,
>     but might not allow us to pinpoint the user any closer than the
>     breakout point.
>
>     [Med] This is exactly what we meant by broken GeoIP in
>     https://tools.ietf.org/html/rfc6269#section-7
>     <https://tools.ietf.org/html/rfc6269#section-7>
>
>      
>
>     If we have an ISP, with CGN, and the police can come with a
>     timestamp, and source port, and a destination ip/port, the carrier
>     can likely determine the physical person. If he has say 255
>     possible external IP addresses in use, the chance of the same
>     source port to the same destination across these is small.
>
>
>     With address sharing, we can't point to one physical person.
>
>     [Med] OK.
>
>     I have a dynamic public IP at home (changes rarely). It is
>     difficult to pinpoint anything to me, my wife or my children. Or
>     any user of my open WiFi SSID. From a legal point of view, this is
>     impossible.
>
>     [Med] OK..
>
>     But the privacy protection in GDPR should protect the 20-year-old
>     having a fixed public IP, living alone. And here a fixed IP is
>     enough for an ISP to locate a person (or rather a machine) with
>     some certainty.
>
>     [Med] ISPs operating fixed networks can locate their
>     customers/subscribers whatever scheme is used for assigning IP
>     addresses. The identification is based on the line, not IP addresses.
>
>     I think this is all a tradeoff between protecting individuals,
>     while not completely giving up investigative tools - At least to
>     do investigation with some statistical probability. And since you
>     do not know which addresses are used by CGN, you can't handle them
>     differently than other IPs.
>
>     [Med] Given that you stated above that it is difficult to track an
>     individual user based on the IP address, then what is the value of
>     complicating the investigation by not recording the full IP
>     address + port (for this specific investigation purpose)?  
>
>
>
>     Having the full firewall logs as a separate supplement to
>     webserver logs will allow you (in many cases) to use the truncated
>     source IP + port to find one or a few possible IP addresses. Since
>     you need data from 2 systems, they are Pseudonymized, and our
>     legal department would agree it is then acceptable.
>
>     Today we keep logs for 18-24 months, and most police
>     investigations come to us 12-14 months after the crime asking for
>     more details. Sometimes for cases we did not know existed. We are
>     a PCI audited level 1 retailer with a few web stores.
>
>     We do not have people at work every day to look in logs, so the 3
>     days retention is impossible. It may take weeks for us to discover
>     things. If 3 days is to cover the weekend (no 24/7), it should
>     instead be 30 days, as key employees might have the normal 21 days
>     of holiday and a week to catch up. Smaller companies might not
>     have overlapping staff skills.
>
>      
>
>     On Wed, Apr 25, 2018 at 10:20 AM, <[email protected]
>     <mailto:[email protected]>> wrote:
>
>     Dear Povl,
>
>      
>
>     Thank you for sharing your thoughts.
>
>      
>
>     I have one comment and two clarification questions:
>
>     - Wouldn’t logging based on a /20-/22 defeat the purpose of logging
>     source ports for investigations? Multiple subscribers may be
>     assigned the same port in the /20 or /22 range.
>
>     - GeoIP (whatever that means) is broken when CGNs are in use.
>
>     - How and under which conditions can an IP address + port be
>     used to point to “ONE physical person”, especially when address
>     sharing is in use?
>
>      
>
>     Cheers,
>
>     Med
>
>      
>
>     *From:* Int-area [mailto:[email protected]
>     <mailto:[email protected]>] *On behalf of* Povl H. Pedersen
>     *Sent:* Wednesday, 25 April 2018 09:55
>     *To:* [email protected] <mailto:[email protected]>
>     *Subject:* Re: [Int-area] WG adoption call: Availability of
>     Information in Criminal Investigations Involving Large-Scale IP
>     Address Sharing Technologies
>
>      
>
>     Where I work, we keep the firewall logs with port numbers
>     completely separate from the webserver logs.
>
>     Looking at article 25 of GDPR, it is clear that IP addresses are
>     pseudonymized data in the firewall logs, as there are only 2 ways
>     to connect the IP address to a physical person.
>     1. Court order to ISP etc. 
>     2. Have the web people look up the IP address in their system,
>     trace requests, and see if they can associate it with a known user
>     identity.
>
>     So firewall logs, unless the web people have access to them, are
>     pseudonymized data. So secure by design (article 25). And we can
>     keep them for statistics, or investigation purposes.
>
>     Now, the question then is, how can we keep enough data in the
>     webserver etc. log to be able to actually do enough
>     investigation? A /16 shortening was suggested. I think this is
>     too large a grouping. It can not even be used for country/city
>     statistical purposes. But of course we can enrich data with that
>     from the likes of MaxMind, when throwing away trailing bits.
>
>     I think we need a minimum /20-/22 and source port in the logs to,
>     with some degree of confidence, go from events in the webserver
>     logs back to the firewall log to have the necessary information for
>     investigation/authorities. If we have a /20-/22 and GeoIP data, we
>     might have a few candidates. Then this is good enough to ensure we
>     can not get back to ONE physical person.
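
The correlation described above (truncated prefix + source port in the
webserver log, full address in a separate firewall log) could look roughly
like the sketch below. The record layout (ts, src_ip, src_net, src_port),
the helper names and the 2-second clock-skew tolerance are all assumptions
made for illustration, not anything specified in RFC6302 or the draft.

```python
import ipaddress


def truncate(ip: str, prefix_len: int = 20) -> str:
    """Return the enclosing network, e.g. 198.51.100.77 -> 198.51.96.0/20."""
    return str(ipaddress.ip_network(f"{ip}/{prefix_len}", strict=False))


def candidates(web_event: dict, firewall_log: list,
               prefix_len: int = 20, max_skew: int = 2) -> list:
    """Firewall entries matching a web event on truncated source network,
    source port, and timestamp (epoch seconds, within max_skew)."""
    return [fw for fw in firewall_log
            if truncate(fw["src_ip"], prefix_len) == web_event["src_net"]
            and fw["src_port"] == web_event["src_port"]
            and abs(fw["ts"] - web_event["ts"]) <= max_skew]


# The webserver log stores only the truncated network and the source port.
web_event = {"ts": 1000, "src_net": truncate("198.51.100.77"), "src_port": 40000}

# The firewall log, kept separately, stores the full source address.
firewall_log = [
    {"ts": 1000, "src_ip": "198.51.100.77", "src_port": 40000},
    {"ts": 1001, "src_ip": "198.51.101.9", "src_port": 40000},  # same /20
    {"ts": 1000, "src_ip": "203.0.113.5", "src_port": 40000},   # other /20
    {"ts": 5000, "src_ip": "198.51.100.77", "src_port": 40000}, # too late
]

matches = candidates(web_event, firewall_log)
```

In this made-up data the lookup returns two firewall entries, which shows
why the match yields "a few candidates" rather than a single address when
two hosts in the same /20 happen to use the same source port at about the
same time.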
>
>     I think, that updating RFC6302 might be a bit early, and we risk
>     that it has to be revised after the first court makes a decision.
>
>     If we keep RFC6302 as is, then companies can defend themselves by
>     saying they use best practice.
>
>     We have another obligation as data owners/processors. We should
>     keep enough data to verify a suspected data breach, and judge the
>     impact. If I can not see if 10000 profiles were downloaded by the
>     same IP, or from 10000 different IPs (out of 65536), I might not
>     be able to fulfill some of the other requirements in GDPR.
>
>     I think the big question here is how the data is stored/processed,
>     and it must be governed by organizational measures (policies and
>     training). It would likely be illegal to use the logs to profile a
>     person. But there can be other interests allowing us to keep the
>     logs, disassociated from user profiles or other things that allow
>     us to map an IP to an individual.
>
>      
>
>
>
>
> _______________________________________________
> Int-area mailing list
> [email protected]
> https://www.ietf.org/mailman/listinfo/int-area


-- 
Amelia Andersdotter
Technical Consultant, Digital Programme

ARTICLE19
www.article19.org

PGP: 3D5D B6CA B852 B988 055A 6A6F FEF1 C294 B4E8 0B55


