My point was that token boundaries are fuzzy. This causes problems because LLMs predict tokens, not characters or bits. There was a thread on Reddit about ChatGPT being unable to count the R's in "strawberry". The problem is that it sees the word as tokens but not the letters that spell it. https://www.reddit.com/r/ChatGPT/s/xYBVddV6jw
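A toy sketch of the tokens-versus-letters point. It assumes a made-up vocabulary (`TOY_VOCAB`) and a greedy longest-match tokenizer; both are hypothetical, and real LLM tokenizers (for example byte-level BPE) split words differently. A model that only sees token IDs sees "strawberry" as the sequence `[0, 1]`, which contains no R's at all. The three R's only reappear if you look up how each token is spelled.

```python
# Toy illustration of the token-vs-letter mismatch.
# TOY_VOCAB and greedy_tokenize are hypothetical; real LLM tokenizers differ.

TOY_VOCAB = ["straw", "berry", "st", "raw", "b", "e", "r", "s", "t", "a", "w", "y"]


def greedy_tokenize(text: str, vocab: list[str]) -> list[str]:
    """Split text by repeatedly taking the longest vocabulary entry that matches."""
    by_length = sorted(vocab, key=len, reverse=True)
    tokens = []
    i = 0
    while i < len(text):
        for piece in by_length:
            if text.startswith(piece, i):
                tokens.append(piece)
                i += len(piece)
                break
        else:
            raise ValueError(f"no vocabulary entry matches at position {i}")
    return tokens


def to_ids(tokens: list[str], vocab: list[str]) -> list[int]:
    """Map each token string to its index in the vocabulary."""
    index = {piece: n for n, piece in enumerate(vocab)}
    return [index[t] for t in tokens]


def count_letter_from_ids(ids: list[int], vocab: list[str], letter: str) -> int:
    """Counting letters from IDs only works with an explicit spelling table (vocab)."""
    return sum(vocab[i].count(letter) for i in ids)


word = "strawberry"
tokens = greedy_tokenize(word, TOY_VOCAB)
ids = to_ids(tokens, TOY_VOCAB)
print(tokens)  # ['straw', 'berry']
print(ids)     # [0, 1]
print(word.count("r"))                             # 3, counted over characters
print(count_letter_from_ids(ids, TOY_VOCAB, "r"))  # 3, but only via the spelling table
```

The ID sequence itself carries no spelling; the letter count has to come from somewhere outside the token stream.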
Text compressors solve this problem by modeling both words and letters and combining the predictions.

On Fri, Jun 14, 2024, 3:44 PM James Bowery <[email protected]> wrote:

> On Wed, May 29, 2024 at 11:24 AM Matt Mahoney <[email protected]>
> wrote:
>
>> Natural language is ambiguous at every level including tokens. Is
>> "someone" one word or two?
>
> Tom Etter <https://en.wikipedia.org/wiki/Dartmouth_workshop#Participants>'s
> tragically unfinished final paper "Membership and Identity
> <https://groups.io/g/lawsofform/files/Boundary%20Institute/Tom%20Etter%20Papers/Membership_and_Identity.pdf>"
> has this quite insightful passage:
>
>> Thing (n., singular): anything that can be distinguished from something
>> else.
>> ...
>> ...the word "thing" is a broken-off fragment of the more
>> fundamental compound words "anything" and "something". That these words are
>> fundamental is hardly debatable, since they are two of the four fundamental
>> words of symbolic logic, where they are written as ∀ and ∃. With this in
>> mind, let's reexamine the above definition of a *thing* as anything that
>> can be distinguished from something else...
>
> *Artificial General Intelligence List <https://agi.topicbox.com/latest>*
> / AGI / see discussions <https://agi.topicbox.com/groups/agi> +
> participants <https://agi.topicbox.com/groups/agi/members> +
> delivery options <https://agi.topicbox.com/groups/agi/subscription>
> Permalink
> <https://agi.topicbox.com/groups/agi/T682a307a763c1ced-M85f7e0507c5c4a130f91f15b>

------------------------------------------
Permalink: https://agi.topicbox.com/groups/agi/T682a307a763c1ced-M66075f51488aa63fe906ccfd
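The remark above about text compressors modeling both words and letters and combining the predictions can be illustrated with a small sketch. Everything in it is a hypothetical assumption: a letter model conditioned on the previous two characters, a word model conditioned on the current partial word, both built from additively smoothed counts, and a sequential Bayesian mixture that multiplies each model's weight by the probability it gave the actual next character. This is not how any particular compressor does it; it only shows the combining step. Costs are the ideal -log2 p code lengths that an arithmetic coder would approach.

```python
import math
from collections import defaultdict


class CountModel:
    """Adaptive next-character model: additive-smoothed counts keyed by a context.

    HYPOTHETICAL illustration; not the model used by any particular compressor.
    """

    def __init__(self, alphabet, context_fn, alpha=0.5):
        self.alphabet = list(alphabet)
        self.context_fn = context_fn  # maps the text seen so far to a context key
        self.alpha = alpha            # additive smoothing keeps every probability > 0
        self.counts = defaultdict(lambda: defaultdict(int))

    def prob(self, history, symbol):
        ctx = self.counts[self.context_fn(history)]
        total = sum(ctx.values()) + self.alpha * len(self.alphabet)
        return (ctx[symbol] + self.alpha) / total

    def update(self, history, symbol):
        self.counts[self.context_fn(history)][symbol] += 1


def letter_context(history):
    """Letter-level context: the last two characters."""
    return history[-2:]


def word_context(history):
    """Word-level context: the characters of the current (partial) word."""
    return history.rsplit(" ", 1)[-1]


def code_lengths(text, alphabet):
    """Return the ideal code length in bits for each model and for their mixture."""
    models = {"letters": CountModel(alphabet, letter_context),
              "words": CountModel(alphabet, word_context)}
    weights = {name: 1 / len(models) for name in models}  # uniform prior
    bits = {name: 0.0 for name in models}
    bits["mixed"] = 0.0
    for i, symbol in enumerate(text):
        history = text[:i]
        p = {name: m.prob(history, symbol) for name, m in models.items()}
        p_mix = sum(weights[name] * p[name] for name in models)
        for name in models:
            bits[name] -= math.log2(p[name])
        bits["mixed"] -= math.log2(p_mix)
        # Bayesian update: rescale each weight by how well its model predicted.
        weights = {name: weights[name] * p[name] / p_mix for name in models}
        for m in models.values():
            m.update(history, symbol)
    return bits


text = "the strawberry and the raspberry and the blueberry"
print(code_lengths(text, sorted(set(text))))
```

One property of this particular mixture: with a uniform prior over two models, it assigns the whole text probability (P_letters + P_words) / 2, so its total code length is never more than one bit worse than the better of the two models.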
