>>> Kai Grossjohann <[EMAIL PROTECTED]> seems to think that:
>"Eric M. Ludlam" <[EMAIL PROTECTED]> writes:
>
>>   On the make it easy side, Emacs Lisp is just not a great language
>> for making an easy-to-read lexical analyzer.  The macros let you write
>> and mix individual analyzers in a convenient high-level way.
>
>My understanding was that Common Lisp has a configurable reader that
>is flexible enough so that one does not need to use lexers.  Is this
>true?

This I don't know, but it sounds like something CL would do.

>I wonder if it would be a workable approach to augment Emacs to have a
>better reader, then to use that as the lexer.

I have thought about this often.  What I had in mind was something like
'parse-partial-sexp'.  The Emacs syntax table really is exactly the
right foundation for building a nice lexical analyzer.
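
For instance (just a toy sketch, not what Semantic actually does), the
syntax table already classifies every character, so `skip-syntax-forward'
alone can carve a buffer into tokens with almost no mode-specific code:

(defun toy-syntax-table-lex ()
  "Return a list of (CLASS START . END) tokens for the current buffer.
A toy lexer driven entirely by the syntax table."
  (let ((tokens nil))
    (save-excursion
      (goto-char (point-min))
      (while (not (eobp))
        (let ((start (point)))
          (cond
           ;; Whitespace: skip it, produce no token.
           ((> (skip-syntax-forward " ") 0))
           ;; Word and symbol constituents become one `symbol' token.
           ((> (skip-syntax-forward "w_") 0)
            (push (cons 'symbol (cons start (point))) tokens))
           ;; Punctuation characters become `punctuation' tokens.
           ((> (skip-syntax-forward ".") 0)
            (push (cons 'punctuation (cons start (point))) tokens))
           ;; Anything else: move one char so we always make progress.
           (t (forward-char 1))))))
    (nreverse tokens)))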

>I don't have practical experience with building parsers.
>Theoretically, one wouldn't need a lexer, just something that returns
>the next character from the input would be sufficient and the rest
>could be done from the grammar.  (I mean that one doesn't need lex and
>could do everything in yacc instead.)  But that makes parsers
>difficult to write, and also probably slow.  So one does need lexers.
>But the theory seems to imply that the lexers don't need to be
>all-powerful: if the lexer is too stupid, then one can still do it
>from the grammar.
  [ ... ]

Lexical analysis is nice because matching characters is really easy to
do compared to actual syntax parsing.

Regexp matching is nice because you can have fairly complex lexical
tokens such as "#include".  Mixed-mode code, such as C preprocessor
directives embedded in C code, is simpler to handle this way.

Semantic's lexers are pretty simple.  Syntax-class regexps such as
\s. or \s_ are used so that generic, mode-independent analyzers can be
shared by most lexers.
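
For example (using the same macro as the keyword example below; the
analyzer name and token class here are just made up for illustration),
a symbol analyzer written with syntax classes works in any mode whose
syntax table is set up correctly:

(define-lex-simple-regexp-analyzer toy-symbol-analyzer
  "Match a run of word or symbol-constituent characters."
  "\\(\\sw\\|\\s_\\)+"
  'symbol)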

Anyway, the problem is how to debug a lexer.  I can imagine this
being even more difficult if it were a built-in. ;)

If you were to look at a lex file sometime, its syntax and content
are usually really simple, along the lines of (from the man page):

if|then|begin|end|procedure|function {  <some action here> }

which basically says that if you see the characters "i" "f", it makes
an `if' token.  Your action then does whatever you need.

Semantic lex analyzers are the same, and could be:

(define-lex-simple-regexp-analyzer my-analyzer
  "obligatory docstirng"
  "if\\|then\\|begin\\|end\\|procedure\\|function"
  'keyword
  0  ; index into the regexp
  (other code here))

so it is very similar.  What is different is that you then have to
combine your named analyzers via the define-lex call.
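
For instance (names here are just for illustration; the semantic-lex-*
entries are the stock generic analyzers that ship with Semantic), the
combining step might look something like:

(define-lex toy-lexer
  "A toy lexer combining stock analyzers with the keyword analyzer above."
  semantic-lex-ignore-whitespace
  semantic-lex-ignore-newline
  my-analyzer
  semantic-lex-symbol-or-keyword
  semantic-lex-paren-or-list
  semantic-lex-punctuation
  semantic-lex-default-action)

The order matters: the analyzers are tried top to bottom, so the
catch-all semantic-lex-default-action goes last.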

Another big difference is that the semantic lexer can skip over lists
like { a method body } as a single token.  The process of identifying
those characters is very fast (it uses an Emacs built-in), and when
tagging a file those characters are often not needed anyway.  This
shortcut is what makes the C and Java semantic parsers fast enough to
be usable while editing without getting in the way too much.
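
The core of that trick (again just a toy illustration, not Semantic's
actual analyzer) is the built-in sexp scanner, which jumps over a whole
balanced block in one step:

(defun toy-skip-list-token ()
  "If point is at an open paren or brace, return a (semantic-list START . END)
token covering the whole balanced block, and move point past it."
  (when (looking-at "\\s(")
    (let ((start (point)))
      ;; `forward-sexp' is the built-in that makes this cheap.
      (forward-sexp 1)
      (cons 'semantic-list (cons start (point))))))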

Have fun
Eric

-- 
          Eric Ludlam:                 [EMAIL PROTECTED], [EMAIL PROTECTED]
   Home: http://www.ludlam.net            Siege: www.siege-engine.com
Emacs: http://cedet.sourceforge.net               GNU: www.gnu.org
