On Wed, Jan 07, 2026 at 01:33:55AM -0300, pedro vezzosi wrote:
> Hello,
>
> I would like to share a conceptual idea for discussion, not a concrete
> implementation proposal.
>
> One of the current challenges for large and long-lived projects like Debian
> is the accumulation of historical logs, archives, and public records that
> may contain personal data (IPs, emails, names), especially for oldstable
> and EOL releases.
>
> My idea is a layered approach to data minimization:
>
> 1. Strict retention periods for raw logs (for example 30–90 days).
> 2. Automatic sanitization and anonymization of historical public records.
> 3. Use of an AI-assisted classification step (human-in-the-loop), where:

I would rather make that "protect personal data from artificial
intelligence", i.e. the opposite of AI-assisted classification of personal
data. Frankly, we should start erasing personal data before we no longer can.

>    - Clear personal data is anonymized automatically.
>    - Ambiguous cases are isolated for human review.
> 4. Preservation of technical knowledge via summarized, signed incident
>    records, instead of keeping large volumes of raw personal data.
>
> The goal would be to reduce GDPR exposure while keeping technical value,
> without rewriting history or removing useful information.
>
> I am not proposing to implement this myself, only offering an idea that
> could be discussed or explored in the future.
>
> Thank you for your time.
>
> Best regards,
> pipo

--

