The “large” refers to the number of parameters. A smaller large language model (a deep neural net) starts at about 3 billion parameters, while larger ones like Claude 2 (the latest large language model from the company that wrote the paper Steve mentioned) have more than 130 billion. Amazingly, it is possible, using (rooms of) GPUs and other accelerators, to optimize in a space of this size. The billions of parameters come from the vocabulary size (the number of tokens that need to be discriminated), the many layers of transformers needed to capture the complexity of human and non-human languages (like DNA), and the context window size (how many paragraphs or pages the model is trained on at a time). A small language model might be suitable for understanding the geometries of chemicals, say.
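To make the "billions of parameters" concrete, here is a rough back-of-envelope sketch of where a transformer's parameter count comes from. The architecture and sizes below (50k-token vocabulary, 4096-wide layers, 32 layers, the standard 4x feed-forward expansion) are illustrative assumptions, not the actual configuration of Claude 2 or any published model:

```python
def estimate_params(vocab_size, d_model, n_layers):
    """Crude parameter estimate for a decoder-only transformer (sketch only)."""
    embedding = vocab_size * d_model       # token embedding table
    attention = 4 * d_model * d_model      # Q, K, V, and output projections
    ffn = 2 * d_model * (4 * d_model)      # two feed-forward matrices, 4x expansion
    per_layer = attention + ffn
    return embedding + n_layers * per_layer

# Hypothetical mid-size configuration: ~6.6 billion parameters
print(estimate_params(vocab_size=50_000, d_model=4_096, n_layers=32))
```

Even this modest hypothetical configuration lands in the billions; widening the layers or stacking more of them pushes the count toward the tens or hundreds of billions mentioned above.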
From: Friam <friam-boun...@redfish.com> On Behalf Of Tom Johnson
Sent: Saturday, October 7, 2023 2:38 PM
To: The Friday Morning Applied Complexity Coffee Group <friam@redfish.com>
Subject: Re: [FRIAM] Language Model Understanding

Thanks for passing this along, Steve. I wish, however, that the authors of this short piece had included a definition of, in their usage, "Large Language Models" and "Small Language Models." Perhaps I can find those in the larger paper.

Tom

On Sat, Oct 7, 2023 at 12:34 PM Steve Smith <sasm...@swcp.com> wrote:

This popular-press article came through my Google News feed recently, which I thought might be useful to the Journalists/English-Majors on the list to help understand how LLMs work, etc. When I read it in detail (forwarded from my TS (TinyScreenPhone) on my LS (Large Screen Laptop)) I found it a bit more detailed and technical than I'd expected, but nevertheless rewarding and possibly offering some traction to Journalism/English majors as well as those with a larger investment in the CS/Math implied.

Decomposing Language Models into Understandable Components <https://www.anthropic.com/index/decomposing-language-models-into-understandable-components>

and the (more) technical paper behind the article:
https://transformer-circuits.pub/2023/monosemantic-features/index.html

Despite having sent a few dogs into vaguely similar scuffles in my careen(r):

Faceted Ontologies for Pre Incident Indicator Analysis <https://apps.dtic.mil/sti/tr/pdf/ADA588086.pdf>
SpindleViz <https://www.ehu.eus/ccwintco/uploads/c/c6/HAIS2010_925.pdf>

... I admit to finding this both intriguing and well over my head on casual inspection... the (metaphorical?)
keywords that drew me in most strongly included Superposition and Thought Vectors, though they are (nod to Glen) probably riddled (heaped, overflowing, bursting, bloated ...) with excess meaning.

https://gabgoh.github.io/ThoughtVectors/

This leads me (surprise!) to an open-ended discursive series of thoughts probably better left for a separate posting (probably rendered in a semasiographic language like Heptapod B <https://en.wikipedia.org/wiki/Heptapod_languages#Orthography>).

<must... stop... now...>

- Steve

-. --- - / ...- .- .-.. .. -.. / -- --- .-. ... . / -.-. --- -.. .
FRIAM Applied Complexity Group listserv
Fridays 9a-12p Friday St. Johns Cafe / Thursdays 9a-12p Zoom https://bit.ly/virtualfriam
to (un)subscribe http://redfish.com/mailman/listinfo/friam_redfish.com
FRIAM-COMIC http://friam-comic.blogspot.com/
archives: 5/2017 thru present https://redfish.com/pipermail/friam_redfish.com/
1/2003 thru 6/2021 http://friam.383.s1.nabble.com/