My point was that token boundaries are fuzzy. This causes problems because
LLMs predict tokens, not characters or bits. There was a thread on Reddit
about ChatGPT not being able to count the number of R's in "strawberry".
The problem is that it sees the word as tokens, not as individual letters.
https://www.reddit.com/r/ChatGPT/s/xYBVddV6jw
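A small illustration of the difference. The subword split ["str", "aw", "berry"] and the vocabulary IDs below are hypothetical; a real tokenizer may split the word differently. The point is that the model receives only the integer IDs. Counting letters from those IDs requires a separate mapping from each ID back to its spelling, which an LLM has to learn implicitly.

```python
# Hypothetical subword split of "strawberry" (an assumption for illustration;
# real tokenizers may split it differently).
tokens = ["str", "aw", "berry"]

# Hypothetical vocabulary: the model only ever sees these integer IDs.
vocab = {tok: i for i, tok in enumerate(["berry", "aw", "str"])}
id_to_token = {i: tok for tok, i in vocab.items()}
ids = [vocab[t] for t in tokens]


def count_letter_in_chars(word: str, letter: str) -> int:
    """Character-level view: every letter is directly visible."""
    return word.lower().count(letter.lower())


def count_letter_from_ids(token_ids: list[int], letter: str,
                          spelling: dict[int, str]) -> int:
    """Token-level view: the IDs carry no letters, so counting needs an
    extra lookup from each ID to its spelling."""
    return sum(spelling[i].count(letter) for i in token_ids)


print(ids)                                            # [2, 1, 0]
print(count_letter_in_chars("strawberry", "r"))       # 3
print(count_letter_from_ids(ids, "r", id_to_token))   # 3
```

Nothing in [2, 1, 0] contains an "r"; the count of 3 only comes out because the code has the spelling table.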

Text compressors solve this problem by modeling both words and letters and
combining the predictions.
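One common way to combine predictions is context mixing as done in the PAQ family of compressors. Each model outputs a probability that the next bit is 1. A mixer combines these in the logistic ("stretched") domain and adjusts its weights online to reduce coding cost. This is a minimal sketch, not code from any particular compressor. The two inputs stand in for hypothetical word-level and letter-level models, and the initial weight 0.3 and learning rate 0.02 are arbitrary choices.

```python
import math


def stretch(p: float) -> float:
    """Map a probability to the logistic domain: ln(p / (1 - p))."""
    return math.log(p / (1.0 - p))


def squash(x: float) -> float:
    """Inverse of stretch: map back to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


class LogisticMixer:
    """Combine bit predictions from several models and learn the weights online."""

    def __init__(self, n_inputs: int, learning_rate: float = 0.02):
        self.w = [0.3] * n_inputs   # arbitrary initial weights
        self.lr = learning_rate
        self.x = [0.0] * n_inputs   # stretched inputs from the last prediction
        self.p = 0.5                # last mixed prediction

    def predict(self, probs: list[float]) -> float:
        # Clamp so stretch() never sees exactly 0 or 1.
        self.x = [stretch(min(max(p, 1e-6), 1.0 - 1e-6)) for p in probs]
        self.p = squash(sum(w * x for w, x in zip(self.w, self.x)))
        return self.p

    def update(self, bit: int) -> None:
        # Gradient step on coding cost: weight += lr * error * input.
        err = bit - self.p
        self.w = [w + self.lr * err * x for w, x in zip(self.w, self.x)]


mixer = LogisticMixer(n_inputs=2)
word_model_p, letter_model_p = 0.9, 0.6   # hypothetical model outputs
p1 = mixer.predict([word_model_p, letter_model_p])
mixer.update(1)                           # the actual bit was 1
p2 = mixer.predict([word_model_p, letter_model_p])
print(p1 < p2)                            # True: weights moved toward the models that were right
```

In a real compressor the mixed probability would drive an arithmetic coder, and the models would predict from their own contexts (the current word, the preceding letters) rather than from fixed numbers.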

On Fri, Jun 14, 2024, 3:44 PM James Bowery <[email protected]> wrote:

>
>
> On Wed, May 29, 2024 at 11:24 AM Matt Mahoney <[email protected]>
> wrote:
>
>> Natural language is ambiguous at every level including tokens. Is
>> "someone" one word or two?
>>
>
> Tom Etter <https://en.wikipedia.org/wiki/Dartmouth_workshop#Participants>'s
> tragically unfinished final paper "Membership and Identity
> <https://groups.io/g/lawsofform/files/Boundary%20Institute/Tom%20Etter%20Papers/Membership_and_Identity.pdf>"
> has this quite insightful passage:
>
>> Thing (n., singular): anything that can be distinguished from something
>> else.
>> ...
>> ...the word "thing" is a broken-off fragment of the more
>> fundamental compound words "anything" and "something". That these words are
>> fundamental is hardly debatable, since they are two of the four fundamental
>> words of symbolic logic, where they are written as ∀ and ∃. With this in
>> mind, let's reexamine the above definition of a *thing* as anything that
>> can be distinguished from something else...
>
>
> *Artificial General Intelligence List <https://agi.topicbox.com/latest>*
> / AGI / see discussions <https://agi.topicbox.com/groups/agi> +
> participants <https://agi.topicbox.com/groups/agi/members> +
> delivery options <https://agi.topicbox.com/groups/agi/subscription>
> Permalink
> <https://agi.topicbox.com/groups/agi/T682a307a763c1ced-M85f7e0507c5c4a130f91f15b>
>

