Re: [agi] Token Coloring - Simple architectural defense against prompt injection?

stefan.reich.maker.of.eye via AGI Fri, 02 Jan 2026 07:47:26 -0800

On Friday, 2 January 2026, at 1:23 AM, Matt Mahoney wrote:
> What happens if you tell the LLM to ignore the token flags? How could you 
> test an LLM to make sure this can't happen? Have you done any actual tests?
I haven't tested anything yet, just chatted with Claude about the idea...


> What happens if you tell the LLM to ignore the token flags?

Within the "green" tokens (data)? Nothing should happen. That's trained as part 
of the adversarial examples in the command following dataset.
------------------------------------------
Artificial General Intelligence List: AGI
Permalink: 
https://agi.topicbox.com/groups/agi/T2faee51273b20a92-M8a8756cc8483648160df98ca
Delivery options: https://agi.topicbox.com/groups/agi/subscription

Re: [agi] Token Coloring - Simple architectural defense against prompt injection?

Reply via email to