Jonathan,

First off, wow, thank you SO much for taking on this task!

Option 1 looks really good to me. One thing, however: we’re starting to get into demographic analysis, so having a zipcode in the borrowers_anonymized table would be hugely beneficial. We’re generating choropleth maps, and borrowers.zipcode provides just the right level of detail for our needs. I don’t know how well that would mesh with the overarching GDPR requirements, though (my guess: not well).

Here’s a sample map that we can get right now; however, this requires us to pull card numbers, then extract the borrowers.zipcode, and then report the data out without the card number. Painful.

Again, thanks for taking on this task!

Aaron

--
Aaron Sakovich
Internet and Technology Services Manager
Huntsville-Madison County Public Library
915 Monroe Street | Huntsville, Alabama 35801 | https://hmcpl.org/

> On Nov 21, 2019, at 10:13, Jonathan Druart <[email protected]> wrote:
>
> Hello everybody,
>
> I have been contracted by KohaLa to work on some GDPR requirements.
> The main idea is to "anonymize" patron's data but letting the library
> access the transactions' statistics.
>
> I am going to present you what I am planning to implement, in order to
> collect ideas and answers.
>
> There are the following steps I have in mind:
> 1. Pseudonymization [1] of patron's data
> 2. Improve deletion of patron related data (tables statistics,
>    old_reserves, deletedborrowers)
> 3. Add the ability to remove data that have been pseudonymized
>
> I see 2 ways to achieve point 1:
> * We create 2 tables, 1 for the patrons, 1 for the transactions.
>   - borrowers_anonymized will contain: hash_id, has_cardnumber,
>     branchcode, creation_date, categorycode, bsort1, bsort2,
>     [borrower_attributes]
>   - transaction_anonymized will contain: hash_id, transaction_type,
>     branchcode, itemnumber, holdingbranch, location, itemcallnumber,
>     itemtype, timestamp
>
> hash_id will be generated using the borrowernumber and a key (that
> will be stored on the server, path in koha-conf)
>
> Pros: Easier to understand and manipulate as it follows existing structure.
> We track patron's modifications (this is the most important part)
> Cons: tech part: new config, a new path have to be created (minor)
>
> * We create only 1 table, (nosql-like). It will contain the same data
> as previously, without the hash_id
>
> Pros: No new config. Data are never updated and we have the values
> when the transactions has been processed.
> Cons: Data are not updated :)
>
> About borrower_attributes, the initial specification asks for 2
> attributes defined in a syspref. I think it should be configurable,
> with a join table (Pro: more flexible, Con: SQL requests more complex)
>
> I think we should have the 2 tables and keep a link between the
> anonymized_patrons and anonymized_transactions tables.
>
> What do you think?
> I am going to start the implementation very soon in order to plan an
> integration early in the 20.05 dev cycle.
>
> Regards,
> Jonathan
>
> [1] https://en.wikipedia.org/wiki/Pseudonymization
> _______________________________________________
> Koha mailing list http://koha-community.org
> [email protected]
> https://lists.katipo.co.nz/mailman/listinfo/koha
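
For readers following the thread, the hash_id idea in the quoted proposal (derive an identifier from the borrowernumber and a server-side key) can be sketched roughly as below. This is only an illustration under assumptions: the thread does not say which hash construction would be used, so HMAC-SHA256 is a stand-in, and the function name `make_hash_id` and the sample key are hypothetical. Koha itself is written in Perl; Python is used here purely to show the shape of the idea.

```python
import hashlib
import hmac


def make_hash_id(borrowernumber: int, key: bytes) -> str:
    """Return a keyed, deterministic pseudonym for a borrowernumber.

    Assumption: HMAC-SHA256 is used as the keyed hash. The proposal
    only says "the borrowernumber and a key", not which algorithm.
    """
    message = str(borrowernumber).encode("utf-8")
    return hmac.new(key, message, hashlib.sha256).hexdigest()


# Hypothetical key; in the proposal the real key would live in a file
# on the server whose path is given in koha-conf.
example_key = b"not-a-real-key"

# The same patron always maps to the same hash_id, so rows in
# borrowers_anonymized and transaction_anonymized can still be joined.
patron_hash = make_hash_id(1234, example_key)
transaction_hash = make_hash_id(1234, example_key)
same_patron = patron_hash == transaction_hash  # True
```

Because the hash is keyed, someone holding only the anonymized tables cannot recompute a hash_id from a known borrowernumber without the key. Note that this property says nothing about extra columns such as a zipcode: any added demographic field would still need its own assessment for re-identification risk.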

