Re: [c-prog] character - token

Armand Postma Fri, 14 Sep 2007 13:26:42 -0700

That depends on how you tokenize it (what you use to split it). I guess
you could tokenize a line like that by just reading every character and
skipping whitespace,
which would give you the following tokens:
(
2
+
4
)
+
(
2
+
6
)


The "problem" with something like that is that there is no real
splitting character (like whitespace or a ;).
This is not necessarily a problem, since you could just read in
single characters, although one could argue about
whether that is tokenizing or just character reading. That method
also has a real problem, however. Consider a sum like
(2+4)+(12+26): if we tokenize this by just reading characters, we
end up with two separate tokens whenever a
number has more than one digit. In the case of 26, for example, we would
get the tokens "2" and "6", which isn't what we want.
The solution here is to split whenever a character is found that
is not a digit, and otherwise keep adding to the current token.

A simple implementation of something like this would be, for example:

#include <cctype>
#include <iostream>
#include <string>

void Tokenize()
{
  std::string MyString = "(2+4)+(12+26)";
  std::string TokenBuffer = "";

  for ( std::string::size_type d = 0; d != MyString.length(); d++ )
  {
    // std::isdigit and std::isspace expect a value representable as unsigned char
    unsigned char c = static_cast<unsigned char>( MyString[d] );

    if ( std::isdigit( c ) )
    {
      // If the character is a digit then add it to our token
      TokenBuffer += MyString[d];
    }
    else
    {
      // If not, then display our token (if there is something to display)
      if ( TokenBuffer.length() > 0 )
        std::cout << "Token: " << TokenBuffer << std::endl;

      // Display the non-number token if it's not whitespace
      // We assume that everything except whitespace is a token...
      if ( !std::isspace( c ) )
        std::cout << "Token: " << MyString[d] << std::endl;

      // In case of a calculator you would of course add the token
      // to a list, probably after checking first that the token is valid...

      // Clear the token buffer, so it's ready to receive the next token
      TokenBuffer = "";
    }
  }

  // There might still be a token we haven't handled yet; if there is,
  // then handle it!
  if ( TokenBuffer.length() > 0 )
    std::cout << "Token: " << TokenBuffer << std::endl;
}

This will display the following, with each line prefixed by "Token: ":
(
2
+
4
)
+
(
12
+
26
)

As I noted in the code comments, you would probably want to check that
the found token is something you expect
and then add it to a list... :)
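
To illustrate that last step, here is a minimal sketch that collects the
tokens into a list instead of printing them and then checks each one. The
names TokenizeToList and IsValidToken, and the set of accepted operator
characters, are my own assumptions for this example and not part of any
library; a real calculator would likely need more (negative numbers,
decimals, and so on).

```cpp
#include <cctype>
#include <string>
#include <vector>

// Hypothetical helper: same splitting rule as Tokenize(), but the tokens
// are stored in a vector instead of being printed.
std::vector<std::string> TokenizeToList(const std::string& input)
{
  std::vector<std::string> tokens;
  std::string buffer;

  for (char ch : input)
  {
    unsigned char c = static_cast<unsigned char>(ch);

    if (std::isdigit(c))
    {
      // Digits accumulate into the current number token
      buffer += ch;
      continue;
    }

    // A non-digit ends the current number token (if there is one)
    if (!buffer.empty())
    {
      tokens.push_back(buffer);
      buffer.clear();
    }

    // Every non-whitespace character is a token of its own
    if (!std::isspace(c))
      tokens.push_back(std::string(1, ch));
  }

  // Handle a number token that runs to the end of the input
  if (!buffer.empty())
    tokens.push_back(buffer);

  return tokens;
}

// Hypothetical validity check: accept numbers and the characters ( ) + - * /
// (this operator set is an assumption made for the example).
bool IsValidToken(const std::string& token)
{
  if (token.empty())
    return false;

  // Number tokens contain only digits by construction, so checking the
  // first character is enough for tokens produced by TokenizeToList()
  if (std::isdigit(static_cast<unsigned char>(token[0])))
    return true;

  return token.size() == 1 &&
         std::string("()+-*/").find(token[0]) != std::string::npos;
}
```

Calling TokenizeToList("(2+4)+(12+26)") gives a vector of 11 strings, with
"12" and "26" each kept as a single token; you would then run IsValidToken
on every element before doing any calculation with the list.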

I hope this makes it clearer :)

Sincerely,

Armand Postma



Robert Ryan wrote:
>
> So (2+4)+(2+6) is 11 tokens, and excluding whitespace, (2 + 4) + (2 + 6)
> is still 11 tokens... Still trying to understand. I should
> have done this first.
> A token can be each element (word) of a string when talking about
> characters, as you said below.
>
> Armand Postma <[EMAIL PROTECTED]> wrote:
>
> Hello,
>
> A token can be a character, but a character is not necessarily a token...
>
> Basically a token is a section of a string, for example:
> "I am walking home from school"
> If we split (tokenize) this line on space-characters we would get the
> following 6 tokens:
> 1. I
> 2. am
> 3. walking
> 4. home
> 5. from
> 6. school
>
> As you can see, "I" is just 1 character, so a character can be a token, but
> one could consider that more of a coincidence. Again, it depends on your
> specific needs.
> If you only need characters, you could for example have something like:
> A;B;C;D;E;F;G;
> If we tokenize that line using the ; as a splitting character, we
> only end up with single characters...
>
> Tokenizers (and therefore tokens) are quite commonly used in several
> applications. Your C++ compiler, for example,
> uses a tokenizer to detect and interpret the commands you put in your
> source code; a network application might use one
> to tokenize packets (assuming the packets are text rather than
> binary). Using a tokenizer makes it possible for
> text commands (tokens) to have variable lengths without the need to
> state the length anywhere (besides marking the
> end with a splitting character).
>
> A side note about the whole character thing: yes, a token can be a
> character, but usually a tokenizer (a string-splitting function)
> returns strings. So even if the character "I" is returned by
> that routine, and even though languages like English consider "I"
> to be a
> single character, a programming language will in most cases still
> consider "I" a string regardless of its length. (In C++ that will
> be the case if you use std::string, for example.)
>
> Hope this helps! :)
>
> Yours Sincerely,
>
> Armand Postma
>
> Robert Ryan wrote:
> >
> >
> > what is the difference between characters and tokens
> > can a character be a token, but a token not a character
> > thanks


