Due to the inherent nature of AI models, such citations are fundamentally impossible (at the very least, every four bytes of useful AI model weights would need tens of kilobytes of attributional metadata; good luck figuring out how to properly cite anything this way), and thus the ONLY sensible thing is to shut them out entirely for anything where citations matter.
An AI model does not (internally) use a database, tagged or otherwise; I don't know where this myth keeps coming from. The data is converted into vectors and translated into intensities (weights), and this is a lossy process. I'd strongly recommend reading up on their operational mechanisms; it's certainly interesting.

Here's a high-level crash course:

There's no chat or active work session. Everything going into the AI is basically a single flat text file that's dictionary-compressed (i.e. tokenized, and not further processed) using a precomputed token dictionary. There's an array of vectors, and the dictionary index for each token serves as the array index for the corresponding vector. This vector is then slightly modified based on the input token's position within the whole input context window. A bunch of matrix multiplication is applied to that, resulting in an output array that's as wide as the token dictionary. This array is full of floating-point values called "logits." A normalization operation called "softmax" turns those logits into probabilities. Then, some sampling algorithm either grabs the entry with the highest value or picks one at random, weighted by those probabilities, and uses its index as the dictionary index for the next bit of text: the next "prediction". Then, that newly predicted output token is added to the end of the input tokens for the next processing round, over and over, until an "end of sequence" token is emitted or the output limit is reached (which usually involves the model getting cut off mid-sentence, like The Sopranos).

And there you have it: "explain an LLM badly." It's a big statistical engine that works word-by-word (strictly, token-by-token). Unfortunately, it cannot provide citations for where it gets each word, because nothing in the matrix multiplication/softmax stuff keeps track of which training data shaped which weight. Incidentally, this means it doesn't "know" what it's "doing", or that it's "doing" anything at all: tool calls are just output tagged with specific output tokens.
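For the curious, the crash course above can be sketched as a toy in plain Python. Everything in it is made up for illustration: the four-entry vocabulary, the one-hot embeddings, the hand-picked W_OUT weights and the names next_token and generate are all hypothetical, and a single matrix multiply on the last token stands in for the many layers (and the attention over the whole context) that a real model uses. It only shows the shape of the loop: look up a vector, nudge it by position, multiply, softmax, pick an index, append, repeat.

```python
import math

# Hypothetical toy token dictionary; a real one has tens of thousands of entries.
VOCAB = ["<eos>", "the", "cat", "sat"]
EOS = 0
DIM = 4  # embedding width, arbitrary for this sketch

# One vector per token: the token's dictionary index is the array index.
# One-hot vectors keep the arithmetic readable; real embeddings are learned.
EMBEDDINGS = [[1.0 if i == j else 0.0 for j in range(DIM)]
              for i in range(len(VOCAB))]

# Hand-picked "weights" (an assumption, not trained): row i scores each
# candidate next token given that the last token was i.
W_OUT = [
    [5.0, 0.0, 0.0, 0.0],  # after <eos> -> <eos>
    [0.0, 0.0, 5.0, 0.0],  # after "the" -> "cat"
    [0.0, 0.0, 0.0, 5.0],  # after "cat" -> "sat"
    [5.0, 0.0, 0.0, 0.0],  # after "sat" -> <eos>
]


def softmax(logits):
    """Turn raw logits into probabilities that sum to 1."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def next_token(tokens):
    """One processing round: returns the dictionary index of the next token."""
    pos = len(tokens) - 1
    last = tokens[pos]
    # Embedding lookup, slightly modified by the token's position.
    vec = [v + 0.01 * math.sin(pos + j) for j, v in enumerate(EMBEDDINGS[last])]
    # Matrix multiplication: one logit per entry in the token dictionary.
    logits = [sum(vec[j] * W_OUT[j][k] for j in range(DIM))
              for k in range(len(VOCAB))]
    probs = softmax(logits)
    # Greedy sampling: take the index with the highest probability.
    return max(range(len(probs)), key=probs.__getitem__)


def generate(prompt_tokens, max_new_tokens=8):
    """Append predictions until <eos> is emitted or the output limit is hit."""
    tokens = list(prompt_tokens)
    for _ in range(max_new_tokens):
        tok = next_token(tokens)
        tokens.append(tok)
        if tok == EOS:
            break
    return tokens


print(" ".join(VOCAB[t] for t in generate([1])))  # prints: the cat sat <eos>
```

Note that nothing in there carries any record of where a weight came from: by the time a token is picked, all that exists is a list of floats and an index.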
The illusion of chat, the tool calls, and all the rest come from a metadata language called ChatML. That's really all the LLM does: it reads a ChatML file and adds a new stanza at the end.

-- 
Kirn Gill II
Mobile: +1 813-300-2330
VoIP: +1 813-704-0420
Email: [email protected]
LinkedIn: http://www.linkedin.com/pub/kirn-gill/32/49a/9a6

On Wed, May 27, 2026 at 4:18 PM Louis Santillan via Freedos-devel <[email protected]> wrote:

> I believe AI needs to be treated like a tool like any other search
> engine or automation or compiler. If a contribution includes work
> performed with AI, it needs to cite its sources academically (like
> IEEE, ACM, or CSE style), and legally (for compliance with GPL, LGPL,
> BSD, MIT, et al). Additionally, the AI contribution needs to attempt
> to be reproducible and/or its production needs to be documented
> (Model/version used, chat log, prompts, input artifacts, etc.) and
> these records should be accompanied with the contribution as part of
> the 'source'.
>
> I think, otherwise, we risk worthwhile contributions and opportunities
> for contribution. This would be akin to "requiring" every that
> produces a contribution for FreeDOS must use 'OpenWatcom C' or 'hand
> write binary machine code' or some other similarly silly requirement.
> No one questions whether you use Borland or MS or OpenWatcom or GNU
> assemblers or compilers today. Because they're tools that make
> producing the work possible.
>
>
> _______________________________________________
> Freedos-devel mailing list
> [email protected]
> https://lists.sourceforge.net/lists/listinfo/freedos-devel
