Due to the inherent nature of AI models, such citations are fundamentally
impossible (at the very least, every four bytes of useful AI model weights
would need tends of kilobytes of attributional metadata, good luck figuring
out how to properly cite anything this way) and thus the ONLY sensible
thing is to shut them out entirely for anything where citations matter.

An AI model does not (internally) use a database, tagged or otherwise, I
don't know where this myth keeps coming from. The data is converted into
vectors and translated into intensities (weights) and this is a lossy
process.

I'd strongly recommend reading up on their operational mechanisms; it's
certainly interesting.

Here's a high level crash course:

There's no chat or active work session. Everything going into the AI is
basically a single flat text file that's dictionary-compressed (and not
further processed, i.e. tokenized), using a precomputed (token) dictionary.
There's an array of vectors and the dictionary index for each token serves
as the array index for the corresponding vector. This vector is then
slightly modified based on the input token's position within the whole
input context window. A bunch of matrix multiplication is applied to that.
A dot product operation called "softmax" is used, resulting in an output
array that's as wide as the token dictionary. This array is full of various
floating-point values called "logits." Then, some sampling algorithm either
grabs the one with the highest number and uses its index as the dictionary
index for the next bit of text to be used as the next "prediction". Then,
that newly predicted output token is added to the end of the input tokens
for the next processing round, over and over, until an "end of sequence"
token is emitted or the output limit is reached (which usually involves the
model getting cut off mid-sentence, like The Sopranos.)

And there you have it: explain an LLM badly. It's a big statistical engine
that works word-by-word. Unfortunately, it cannot provide citations for
where it gets each word due to how the matrix multiplication/softmax stuff
is exploited.

Incidentally, this means it doesn't "know" what it's "doing", or that it's
"doing" anything at all: Tool calls are just output tagged with specific
output tokens. The illusion of chat, the tool calls, and it all is a
metadata language called ChatML. That's really all the LLM does, reads a
ChatML file and adds a new stanza at the end.

--
Kirn Gill II
Mobile: +1 813-300-2330 <+18133002330>
VoIP: +1 813-704-0420 <+18137040420>
Email: [email protected]
LinkedIn: http://www.linkedin.com/pub/kirn-gill/32/49a/9a6


On Wed, May 27, 2026 at 4:18 PM Louis Santillan via Freedos-devel <
[email protected]> wrote:

> I believe AI needs to be treated like a tool like any other search
> engine or automation or compiler.  If a contribution includes work
> performed with AI, it needs to cite its sources academically (like
> IEEE, ACM, or CSE style), and legally (for compliance with GPL, LGPL,
> BSD, MIT, et al).  Additionally, the AI contribution needs to attempt
> to be reproducible and/or its production needs to be documented
> (Model/version used, chat log, prompts, input artifacts, etc.) and
> these records should be accompanied with the contribution as part of
> the 'source'.
>
> I think, otherwise, we risk worthwhile contributions and opportunities
> for contribution.  This would be akin to "requiring" every that
> produces a contribution for FreeDOS must use 'OpenWatcom C' or 'hand
> write binary machine code' or some other similarly silly requirement.
> No one questions whether you use Borland or MS or OpenWatcom or GNU
> assemblers or compilers today.  Because they're tools that make
> producing the work possible.
>
>
> _______________________________________________
> Freedos-devel mailing list
> [email protected]
> https://lists.sourceforge.net/lists/listinfo/freedos-devel
>
_______________________________________________
Freedos-devel mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/freedos-devel

Reply via email to