Re: [FRIAM] Language Model Understanding

2023-10-07 Thread Steve Smith

Tom -

This is probably a little less relevant to the business of journalism 
than you deserve, but this is a pretty good (IMO) fleshing-out of some of 
the technical details of (L)LMs which might fill in some blanks:


   https://developers.google.com/machine-learning/resources/intro-llms

There must be (by now) some (yet) more relevant resources for 
Journalists (for example), not only to de-Jargonize a little but also to 
make it (yet more) relevant to your domain.


Here, straight from the /Horse's Mouth/ (GPT 3.5) itself (I leave it to 
you to tell us how well these answers work for you). It is probably 
worth engaging with ChatGPT yourself and asking more and more focused 
questions... a good self-study in "prompt engineering"!


   http://chat.openai.com

***

   Introduction to Modern Language Models for Veteran Journalists

   In today's digital age, the field of journalism is undergoing a
   significant transformation, largely driven by advancements in
   technology. One key innovation that has had a profound impact on the
   industry is the development of modern language models, often
   abbreviated as LLMs. While you may not be a tech expert,
   understanding the basics of LLMs can be invaluable for journalists.
   So, let's break it down in simple terms.

   **What Are Language Models?**

   At its core, a language model is like a super-smart digital
   writer. It's a computer program trained to understand and generate
   human language. Think of it as a tool that helps computers
   understand and use words, sentences, and paragraphs just like humans
   do.

   **The Power of Large Language Models (LLMs)**

   Now, let's talk about Large Language Models (LLMs). These are the
   big players in the world of language models. They're like language
   supercomputers because they're trained on massive amounts of text
   from the internet. This means they've learned a lot about how people
   write and talk.

   **How LLMs Help Journalists**

   1. **Content Generation**: LLMs can generate written content, from
   news articles to summaries and even creative pieces. Journalists can
   use them to draft stories faster and get ideas flowing.

   2. **Fact-Checking**: They're handy for fact-checking, helping you
   quickly verify information by scanning a vast database of knowledge.

   3. **Summarization**: LLMs can take long, complex documents and
   create concise summaries, saving you time on research.

   4. **Translation**: They can also translate text between
   languages, which can be incredibly useful for international reporting.

   5. **Personalization**: Some news websites use LLMs to tailor
   content to individual readers, providing a more personalized news
   experience.

   6. **Social Media Analysis**: LLMs can sift through social media
   data, helping journalists understand public sentiment and trends.

   **Potential Concerns**

   While LLMs are incredibly powerful, there are some concerns to
   keep in mind. They might unintentionally spread misinformation if
   not used carefully, and they can't replace human judgment or
   critical thinking.

   **Conclusion**

   In a nutshell, modern language models (LLMs) are like digital
   writing assistants that can help journalists write better, research
   faster, and adapt to the changing landscape of digital journalism.
   While they're powerful tools, it's important to use them
   responsibly, keeping in mind their strengths and limitations, as
   well as the ethical considerations of automated content creation in
   journalism.

Can you elaborate on some of the technical jargon?

   Certainly! Here are some common technical jargon terms related to Large
   Language Models (LLMs) explained in journalist-friendly terms:

   1. **Pretraining**: This is the initial phase where the LLM learns
   from a massive amount of text data on the internet. Think of it as
   the model absorbing a vast library of information to understand
   language better.

   2. **Fine-tuning**: After pretraining, LLMs undergo fine-tuning to
   specialize them for specific tasks. It's like giving the model extra
   training in a particular subject, like finance or healthcare, to
   make it more useful for specialized reporting.

   3. **Token**: In LLMs, text is broken down into smaller units
   called tokens. A token can be as short as one character or as long
   as one word. Knowing the number of tokens is important because it
   affects how much text the model can handle at once.

   4. **Inference**: This term refers to the process of using a
   trained LLM to generate text or make predictions. Journalists use
   inference to create content, answer questions, or analyze data.

   5. **Prompt**: A prompt is the input you provide to an LLM to get
   a response. It's 
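To make the "token" definition above a little more concrete, here is a toy splitter of my own, purely illustrative: real LLM tokenizers (e.g. byte-pair encodings) learn subword units from data, so common words become one token while rare words split into several pieces.

```python
import re

def rough_tokenize(text):
    # Crude illustration only: split on runs of word characters
    # and on individual punctuation marks. A real tokenizer is
    # learned from data, not hand-written like this.
    return re.findall(r"\w+|[^\w\s]", text)

tokens = rough_tokenize("Journalists fact-check quickly.")
# -> ['Journalists', 'fact', '-', 'check', 'quickly', '.']
```

Counting `tokens` this way hints at why a model's context limit is quoted in tokens rather than words: hyphenated or unusual words cost more than one token.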

Re: [FRIAM] Language Model Understanding

2023-10-07 Thread Tom Johnson
Thanks. Those are impressive numbers.
Tom

On Sat, Oct 7, 2023 at 4:11 PM Marcus Daniels  wrote:

> The “large” refers to the number of parameters used.  A smaller large
> language model -- a deep neural net -- starts at about 3 billion parameters,
> but larger ones like Claude 2 (the latest large language model of the company
> that wrote the paper Steve mentioned) have more than 130 billion
> parameters.   Amazingly, it is possible, using (rooms of) GPUs and other
> accelerators, to optimize in a space of this size.   The billions of
> parameters come from the vocabulary size -- the number of tokens that need
> to be discriminated -- the many layers of transformers that are needed to
> capture the complexity of human and non-human languages (like DNA), and the
> context window size -- how many paragraphs or pages the model is trained on
> at a time.   A small language model might be suitable for understanding the
> geometries of chemicals, say.
>
>
>
> *From:* Friam  *On Behalf Of *Tom Johnson
> *Sent:* Saturday, October 7, 2023 2:38 PM
> *To:* The Friday Morning Applied Complexity Coffee Group <
> friam@redfish.com>
> *Subject:* Re: [FRIAM] Language Model Understanding
>
>
>
> Thanks for passing this along, Steve. I wish, however, the authors of this
> short piece would have included a definition of, in their usage, "Large
> Language Models" and "Small Language Models."  Perhaps I can find those in
> the larger paper.
>
> Tom
>
>
>
> On Sat, Oct 7, 2023 at 12:34 PM Steve Smith  wrote:
>
> This popular-press article came through my Google News feed recently which
> I thought might be useful to the Journalists/English-Majors on the list to
> help understand how LLMs work, etc.   When I read it in detail (forwarded
> from my TS (TinyScreenPhone) on my LS (Large Screen Laptop)) I found it a
> bit more detailed and technical than I'd expected, but nevertheless
> rewarding and possibly offering some traction to Journalism/English majors
> as well as those with a larger investment in the CS/Math implied.
>
> Decomposing Language Models into Understandable Components
>
> 
>
> and the (more) technical paper behind the article
>
> https://transformer-circuits.pub/2023/monosemantic-features/index.html
>
> Despite having sent a few dogs into vaguely similar scuffles in my
> careen(r):
>
> Faceted Ontologies for Pre Incident Indicator Analysis
> 
> SpindleViz 
> ...
>
> ... I admit to finding this both intriguing and well over my head on
> casual inspection...  the (metaphorical?) keywords that drew me in  most
> strongly included *Superposition* and *Thought Vectors*, though they are
> (nod to Glen) probably riddled (heaped, overflowing, bursting, bloated ...
> )  with excess meaning.
>
> https://gabgoh.github.io/ThoughtVectors/
>
> This leads me (surprise!) to an open ended discursive series of thoughts
> probably better left for a separate posting (probably rendered in a
> semasiographic language like Heptapod B.
>
> 
>
> - Steve
>
> -. --- - / ...- .- .-.. .. -.. / -- --- .-. ... . / -.-. --- -.. .
> FRIAM Applied Complexity Group listserv
> Fridays 9a-12p Friday St. Johns Cafe   /   Thursdays 9a-12p Zoom
> https://bit.ly/virtualfriam
> to (un)subscribe http://redfish.com/mailman/listinfo/friam_redfish.com
> FRIAM-COMIC http://friam-comic.blogspot.com/
> archives:  5/2017 thru present
> https://redfish.com/pipermail/friam_redfish.com/
>   1/2003 thru 6/2021  http://friam.383.s1.nabble.com/


Re: [FRIAM] Language Model Understanding

2023-10-07 Thread Marcus Daniels
The “large” refers to the number of parameters used.  A smaller large language 
model -- a deep neural net -- starts at about 3 billion parameters, but larger 
ones like Claude 2 (the latest large language model of the company that wrote 
the paper Steve mentioned) have more than 130 billion parameters.   Amazingly, 
it is possible, using (rooms of) GPUs and other accelerators, to optimize in a 
space of this size.   The billions of parameters come from the vocabulary size 
-- the number of tokens that need to be discriminated -- the many layers of 
transformers that are needed to capture the complexity of human and non-human 
languages (like DNA), and the context window size -- how many paragraphs or 
pages the model is trained on at a time.   A small language model might be 
suitable for understanding the geometries of chemicals, say.
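The arithmetic behind such parameter counts can be sketched with a back-of-envelope estimator. The shapes below are illustrative assumptions (vocabulary size, model width, layer count), and the formula ignores smaller contributions like biases, layer norms, and positional parameters:

```python
def transformer_params(vocab_size, d_model, n_layers, d_ff=None):
    """Rough parameter count for a decoder-only transformer."""
    if d_ff is None:
        d_ff = 4 * d_model                # common feed-forward width
    embedding = vocab_size * d_model      # token embedding table
    attention = 4 * d_model * d_model     # Q, K, V, and output projections
    feed_forward = 2 * d_model * d_ff     # two linear layers per block
    per_layer = attention + feed_forward
    return embedding + n_layers * per_layer

# Hypothetical GPT-3-like shape: 50k vocabulary, width 12288, 96 layers
print(f"{transformer_params(50_000, 12_288, 96) / 1e9:.0f}B parameters")
# -> 175B parameters
```

The estimate shows how the terms Marcus lists each enter the total: vocabulary size drives the embedding table, while width and depth drive the per-layer attention and feed-forward matrices.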

From: Friam  On Behalf Of Tom Johnson
Sent: Saturday, October 7, 2023 2:38 PM
To: The Friday Morning Applied Complexity Coffee Group 
Subject: Re: [FRIAM] Language Model Understanding

Thanks for passing this along, Steve. I wish, however, the authors of this 
short piece would have included a definition of, in their usage, "Large 
Language Models" and "Small Language Models."  Perhaps I can find those in the 
larger paper.
Tom

On Sat, Oct 7, 2023 at 12:34 PM Steve Smith wrote:

This popular-press article came through my Google News feed recently which I 
thought might be useful to the Journalists/English-Majors on the list to help 
understand how LLMs work, etc.   When I read it in detail (forwarded from my TS 
(TinyScreenPhone) on my LS (Large Screen Laptop)) I found it a bit more 
detailed and technical than I'd expected, but nevertheless rewarding and 
possibly offering some traction to Journalism/English majors as well as those 
with a larger investment in the CS/Math implied.

Decomposing Language Models into Understandable Components


and the (more) technical paper behind the article

https://transformer-circuits.pub/2023/monosemantic-features/index.html
Despite having sent a few dogs into vaguely similar scuffles in my careen(r):
Faceted Ontologies for Pre Incident Indicator Analysis 

SpindleViz
...

... I admit to finding this both intriguing and well over my head on casual 
inspection...  the (metaphorical?) keywords that drew me in  most strongly 
included Superposition and Thought Vectors, though they are (nod to Glen) 
probably riddled (heaped, overflowing, bursting, bloated ... )  with excess 
meaning.

https://gabgoh.github.io/ThoughtVectors/

This leads me (surprise!) to an open ended discursive series of thoughts 
probably better left for a separate posting (probably rendered in a 
semasiographic language like Heptapod B.



- Steve


Re: [FRIAM] Language Model Understanding

2023-10-07 Thread Tom Johnson
Thanks for passing this along, Steve. I wish, however, the authors of this
short piece would have included a definition of, in their usage, "Large
Language Models" and "Small Language Models."  Perhaps I can find those in
the larger paper.
Tom

On Sat, Oct 7, 2023 at 12:34 PM Steve Smith  wrote:

> This popular-press article came through my Google News feed recently which
> I thought might be useful to the Journalists/English-Majors on the list to
> help understand how LLMs work, etc.   When I read it in detail (forwarded
> from my TS (TinyScreenPhone) on my LS (Large Screen Laptop)) I found it a
> bit more detailed and technical than I'd expected, but nevertheless
> rewarding and possibly offering some traction to Journalism/English majors
> as well as those with a larger investment in the CS/Math implied.
>
> Decomposing Language Models into Understandable Components
>
> 
>
> and the (more) technical paper behind the article
>
> https://transformer-circuits.pub/2023/monosemantic-features/index.html
>
> Despite having sent a few dogs into vaguely similar scuffles in my
> careen(r):
>
> Faceted Ontologies for Pre Incident Indicator Analysis
> 
> SpindleViz 
> ...
>
> ... I admit to finding this both intriguing and well over my head on
> casual inspection...  the (metaphorical?) keywords that drew me in  most
> strongly included *Superposition* and *Thought Vectors*, though they are
> (nod to Glen) probably riddled (heaped, overflowing, bursting, bloated ...
> )  with excess meaning.
>
> https://gabgoh.github.io/ThoughtVectors/
>
> This leads me (surprise!) to an open ended discursive series of thoughts
> probably better left for a separate posting (probably rendered in a
> semasiographic language like Heptapod B.
>
> 
>
> - Steve


[FRIAM] Language Model Understanding

2023-10-07 Thread Steve Smith
This popular-press article came through my Google News feed recently 
which I thought might be useful to the Journalists/English-Majors on the 
list to help understand how LLMs work, etc.   When I read it in detail 
(forwarded from my TS (TinyScreenPhone) on my LS (Large Screen Laptop)) 
I found it a bit more detailed and technical than I'd expected, but 
nevertheless rewarding and possibly offering some traction to 
Journalism/English majors as well as those with a larger investment in 
the CS/Math implied.


   Decomposing Language Models into Understandable Components
   


and the (more) technical paper behind the article

   https://transformer-circuits.pub/2023/monosemantic-features/index.html
   

Despite having sent a few dogs into vaguely similar scuffles in my 
careen(r):


   Faceted Ontologies for Pre Incident Indicator Analysis
   
   SpindleViz 
   ...

... I admit to finding this both intriguing and well over my head on 
casual inspection...  the (metaphorical?) keywords that drew me in  most 
strongly included /Superposition/ and /Thought Vectors/, though they are 
(nod to Glen) probably riddled (heaped, overflowing, bursting, bloated 
... )  with excess meaning.


https://gabgoh.github.io/ThoughtVectors/

This leads me (surprise!) to an open ended discursive series of thoughts 
probably better left for a separate posting (probably rendered in a 
semasiographic language like Heptapod B.




- Steve