Thank you for your reply and for sharing your perspective. I would like to clarify one point, because I may not have expressed myself clearly.
My concern is not about having AI "read" or analyze personal data as such. I fully understand that this can itself create additional GDPR and ethical risks. The point I was trying to raise comes more from an organizational angle: given that there are currently no dedicated people in a GDPR-focused role, my worry is that privacy-related work may end up being purely reactive, with someone having to act as a "firefighter" on top of their main responsibilities. I was wondering whether there could be more proactive approaches to data minimization, so that fewer problematic records exist in the first place.

I am not claiming that my idea is the right solution, nor that Debian should use AI for this. I only wanted to raise a concern about privacy, which I consider a very important value in Debian, and to share a possible angle for discussion.

I also noticed that there is a debian-ai mailing list, and since I am new to Debian mailing lists, it is possible that this was not the most appropriate list for this idea. If so, I apologize for the noise and appreciate the guidance.

Thank you for taking the time to reply.

Best regards,
pipo

On Wed, Jan 7, 2026 at 14:11, Bart Martens (<[email protected]>) wrote:
> On Wed, Jan 07, 2026 at 01:33:55AM -0300, pedro vezzosi wrote:
> > Hello,
> >
> > I would like to share a conceptual idea for discussion, not a concrete
> > implementation proposal.
> >
> > One of the current challenges for large and long-lived projects like
> > Debian is the accumulation of historical logs, archives, and public
> > records that may contain personal data (IPs, emails, names), especially
> > for oldstable and EOL releases.
> >
> > My idea is a layered approach to data minimization:
> >
> > 1. Strict retention periods for raw logs (for example 30–90 days).
> > 2. Automatic sanitization and anonymization of historical public records.
> > 3. Use of an AI-assisted classification step (human-in-the-loop), where:
>
> I would rather make that: "protect personal data from artificial
> intelligence", so the opposite of AI-assisted classification of personal
> data. Frankly, we should start erasing personal data before we no longer
> can.
>
> >    - Clear personal data is anonymized automatically.
> >    - Ambiguous cases are isolated for human review.
> > 4. Preservation of technical knowledge via summarized, signed incident
> >    records, instead of keeping large volumes of raw personal data.
> >
> > The goal would be to reduce GDPR exposure while keeping technical value,
> > without rewriting history or removing useful information.
> >
> > I am not proposing to implement this myself, only offering an idea that
> > could be discussed or explored in the future.
> >
> > Thank you for your time.
> >
> > Best regards,
> > pipo
> --

