Re: GDPR compliance best practices?

Philip Oakley Sun, 03 Jun 2018 15:29:03 -0700

From: "Peter Backes" <[email protected]>

On Sun, Jun 03, 2018 at 04:28:31PM +0100, Philip Oakley wrote:

In most Git cases that legal/legitimate purpose is the copyright licence,
and/or corporate employment. That is, Jane wrote it, hence X has a legal
right of use, and we need to have a record of that (Jane wrote it) as
evidence of that (I'm X, I can use it) right. That would mean that Jane
cannot just ask to have that record removed and expect it to be removed.


Re corporate employment:

For sure nobody would dare to question that a company has a right to
keep an internal record that Jane wrote it.

The issue is publishing that information. This is an entirely different
story.

It is here that Article 6 kicks in as to whether the 'organisation' can
retain the data and continue to use it.

https://gdpr-info.eu/art-6-gdpr/
https://ico.org.uk/for-organisations/guide-to-the-general-data-protection-regulation-gdpr/lawful-basis-for-processing/
https://www.lawscot.org.uk/news-and-events/news/gdpr-legal-basis-and-why-it-matters/

For an open source project with an open source licence, an implicit DCO
applies to the meta data. It is the legal basis for the release.

If a corporate has a closed source project, then yes, open publishing of
that personal data within a repo's meta data would be incorrect, even
though the internal repo would be kept.


I already stressed that from the very beginning.

Re copyright license:

No, a copyright license does not provide a legitimization.

- copyright is about distributing the program, not about distributing
version control metadata.

It is specifically about giving that right to copy by Jane Doe (but git
gives no information other than that supposedly globally unique 'author
email').


- Being named is a right, not an obligation of the author. Hence, if
the author doesn't want his name published, the company doesn't have
legitimate grounds based in copyright for doing it anyway, against his
or her will.

Git for Open Source is about open licencing by name. I'd agree that a
closed corporate licence stays closed, but not forgotten.

From a personal view, many folk want it to be that corporates (and open
source organisations) should hold no personal information without having
explicit permission that can then be withdrawn, with deletion to follow.
However that 'legal' clause does [generally] win.


Let's be honest: We do not know what legitimization exactly in each
specific case the git metadata is being distributed under.

We should know, already. A specific licence [or limit] should be in place.
We don't really want to have to let a court decide ;-)


It may be copyright, it may be employment, but it may also be revocable
consent. That is, we cannot safely assume that no git user will ever
have to deal with a legitimate request based on the right to be
forgotten.

The law is never decided by technical means, unfortunately. Regular git
users should have no issues - they just need to point their finger at the
responsible authority. (Beware, though, of the one-way trap door: the
users' mistakes can become the problem for the responsible authority!)

In the git.git case (and linux.git) there is the DCO (to back up the GPL2)
as an explicit requirement/certification that puts the information into the
legal evidence category. IIUC almost all copyright ends up with a similar
evidential trail for the meta data.


This makes things more complicated, not less. You have yet more meta
data to cope with, yet more opportunities to be bitten by the right to
be forgotten. Since I proposed a list of metadata where each entry can
be anonymized independently of each other, it would be able to deal
with this perfectly.

The DCO/GPL2 are the legitimate data record that recipients should have for
their copy. There is no right to be forgotten at that point.

The more likely problem is if the content of the repo, rather than the meta
data, is subject to GDPR, and that could easily ruin any storage method.
Being able to mark an object as <Lost/Deleted> would help here(*).


My proposal supports any part of the commit, including the contents of
individual files, as erasable, yet verifiable data.
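One way "erasable, yet verifiable" could work is sketched below. This is
only an illustration under stated assumptions, not necessarily the design
of the proposal discussed here: each field gets its own random salt, the
record id is computed over the per-field digests only, and erasing a field
drops its value and salt while keeping the digest. All names (field_digest,
make_record, record_id, erase, verify) are hypothetical and are not part of
git.

```python
import hashlib
import os


def field_digest(name: str, salt: bytes, value: bytes) -> bytes:
    """Salted SHA-256 digest of one metadata field (the name is bound in)."""
    return hashlib.sha256(name.encode() + b"\0" + salt + value).digest()


def make_record(fields: dict) -> dict:
    """Store each field with its own random salt and precomputed digest."""
    record = {}
    for name, value in fields.items():
        salt = os.urandom(16)
        record[name] = {
            "salt": salt,
            "value": value,
            "digest": field_digest(name, salt, value),
        }
    return record


def record_id(record: dict) -> str:
    """The id covers only the per-field digests, never the raw values."""
    h = hashlib.sha256()
    for name in sorted(record):
        h.update(name.encode() + b"\0" + record[name]["digest"])
    return h.hexdigest()


def erase(record: dict, name: str) -> None:
    """Drop value and salt; the digest stays, so the id is unchanged."""
    record[name]["value"] = None
    record[name]["salt"] = None


def verify(record: dict, expected_id: str) -> bool:
    """Check every surviving field against its digest, then check the id."""
    for name, entry in record.items():
        if entry["value"] is None:
            continue  # erased: only the digest remains
        if field_digest(name, entry["salt"], entry["value"]) != entry["digest"]:
            return False
    return record_id(record) == expected_id
```

The per-field salt matters in this sketch because names and email addresses
are easy to guess; without it, an erased value could be recovered by hashing
candidate values and comparing against the retained digest.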

Also remember that most EU legislation is 'intent' based, rather than
'letter of', for the style of legal arguments (which is where some of the
UK Brexit misunderstandings come from), so it is more than possible to get
into the situation where an action is both mandated and illegal at the same
time, so plenty of snake oil salesmen continue to sell magic fixes according
to the customers' local biases.


This may be true. I am not trying to sell snake oil, however. To have
erasure and verifiability at the same time is a highly generic feature
that may be desirable to have for a multitude of reasons, including but
not limited to legal ones like GDPR and copyright violations.

I do not believe Git has anything to worry about that wasn't already an
issue.


Yes, but it definitely had and still does have something to worry about.

git should provide technical means to deal with this. I provided a
proposal based on anonymization that does not in any way have any
drawback compared to the status quo, except a slight increase in
metadata size and various degrees of backwards incompatibility,
depending on how it is implemented.

What do you think about my proposal as a solution for the problem?

I see the solution to be elsewhere, and that it is in some ways a strawman
discussion: "if someone has the right to be forgotten, how do we delete the
meta data", when that right (to delete the meta data in a properly licensed
repo) does not exist.

That said, the problem of maintaining repo integrity when some objects must
be deleted or re-written (because they had stored personal info that they
should not have) will require a little bit extra on the side.

But this is open source, so ideas, and code, will come forward that allow
things like 'replaced commits' to be formally part of a repo, and its
leading oid (or maybe it's an oid pair) will handle that. I'd guess that the
commit will have an extra line after the parents and tree lines that details
(in some manner) the 'replaced' things, so that fsck still works, the oid is
complete and thus the whole shebang can be verified.
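A rough sketch of that guess, for illustration only. The object id
computation (SHA-1 over "<type> <size>\0" followed by the content) is how
git names objects in SHA-1 repositories; the `replaced` header, its name and
its position after the parent lines are purely hypothetical and do not exist
in git today. The helper names git_oid and commit_body are likewise made up
for this example.

```python
import hashlib


def git_oid(obj_type: str, body: bytes) -> str:
    """Object id as git computes it: SHA-1 of '<type> <size>\\0' + content."""
    header = f"{obj_type} {len(body)}".encode() + b"\0"
    return hashlib.sha1(header + body).hexdigest()


def commit_body(tree, parents, author, committer, message, replaced=()):
    """Build commit content; 'replaced' is a HYPOTHETICAL extra header."""
    lines = [f"tree {tree}"]
    lines += [f"parent {p}" for p in parents]
    # Hypothetical: record the oids of the things this commit replaces,
    # so that the new oid covers that fact as well.
    lines += [f"replaced {r}" for r in replaced]
    lines += [f"author {author}", f"committer {committer}", "", message]
    return "\n".join(lines).encode()


# Placeholder values, for illustration only.
who = "Jane Doe <jane@example.com> 1528000000 +0000"
tree = "4b825dc642cb6eb9a060e54bf8d69288fbee4904"  # the empty tree
old = git_oid("commit", commit_body(tree, [], who, who, "original\n"))
new = git_oid("commit", commit_body(tree, [], who, who, "redacted\n",
                                    replaced=[old]))
```

Because the extra line is part of the hashed content, the replacement is
covered by the new oid. How fsck and the rest of git would have to treat
such a line is left open here.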


You provide a lot of arguments about why it is not a necessity to have
this, but let's assume it is; is there any actual problem you see with
the proposal, except that someone would have to implement it?

It's the strawman problem. If it was a real 'real issue' then it would have
already shown up with companies clamouring to pay folk to fix our (git's)
latest problem. But they haven't, so I think it's a much more balanced issue.

--
Philip
