Thank you for your reply and for sharing your perspective. I would like to clarify one point, because I may not have expressed myself clearly.
My concern is not about having AI "read" or analyze personal data as such. I fully understand that this can itself create additional GDPR and ethical risks. The point I was trying to raise comes more from an organizational angle: given that there are currently no dedicated people in a GDPR-focused role, my worry is that privacy-related work may end up being purely reactive, with someone having to act as a "firefighter" on top of their main responsibilities. I was wondering whether there could be more proactive approaches to data minimization, so that fewer problematic records exist in the first place.

I am not claiming that my idea is the right solution, nor that Debian should use AI for this. I only wanted to raise a concern about privacy, which I consider a very important value in Debian, and to share a possible angle for discussion.

I also noticed that there is a debian-ai mailing list, and since I am new to Debian mailing lists, it is possible that this was not the most appropriate list for this idea. If so, I apologize for the noise and appreciate the guidance.

Thank you for taking the time to reply.

Best regards,
pipo

On Wed, Jan 7, 2026 at 14:11, Bart Martens (<[email protected]>) wrote:
> On Wed, Jan 07, 2026 at 01:33:55AM -0300, pedro vezzosi wrote:
> > Hello,
> >
> > I would like to share a conceptual idea for discussion, not a concrete
> > implementation proposal.
> >
> > One of the current challenges for large and long-lived projects like
> > Debian is the accumulation of historical logs, archives, and public
> > records that may contain personal data (IPs, emails, names), especially
> > for oldstable and EOL releases.
> >
> > My idea is a layered approach to data minimization:
> >
> > 1. Strict retention periods for raw logs (for example 30–90 days).
> > 2. Automatic sanitization and anonymization of historical public records.
> > 3. Use of an AI-assisted classification step (human-in-the-loop), where:
>
> I would rather make that: "protect personal data from artificial
> intelligence", so the opposite of AI-assisted classification of personal
> data. Frankly, we should start erasing personal data before we no longer
> can.
>
> >    - Clear personal data is anonymized automatically.
> >    - Ambiguous cases are isolated for human review.
> > 4. Preservation of technical knowledge via summarized, signed incident
> >    records, instead of keeping large volumes of raw personal data.
> >
> > The goal would be to reduce GDPR exposure while keeping technical value,
> > without rewriting history or removing useful information.
> >
> > I am not proposing to implement this myself, only offering an idea that
> > could be discussed or explored in the future.
> >
> > Thank you for your time.
> >
> > Best regards,
> > pipo
> --

