Julia Lawall <julia <at> diku.dk> writes:
> On Fri, 17 Dec 2010, Michael T wrote:
> > Thanks!  I will test this next week if that is OK.  I think that the rules
> > on the use of [<>] should be enough to keep those cases separate.
[...]
> OK, if you feel inspired, you can probably play around with the regular 
> expression in lexer.mll.  In the patch you can see the two lines that are 
> relevant; you shouldn't have to change anything else.

I did try playing around with the regex (please find a slightly modified version
of your patch below) and got a result that did a lot better for me without
changing the output of "make test" as compared to an unpatched build, timestamps
excepted.  (You might want to double-check that if you do apply the patch.)

It also predictably brought out a number of other C++ parsing problems.  I will
summarise them briefly and can also bake up some C++ source that illustrates
their usage if you like.  I can also put them on a wiki as you suggested if you
can point to a place.  So:

1) In C++, destructors (and constructors for that matter) are declared without a
return type (they return nothing, but unlike an ordinary function they are not
written with void either).  Example of a destructor definition for class MyClass:

MyClass::~MyClass() { ... }

2) When objects are constructed they can take a parameter list.  Example of a
local (object) variable of type MyClass:

MyClass myinstance(param1, param2);

3) In C++, objects are created and destroyed on the heap using the operators new
and delete instead of the functions malloc() and free().  The previous example
rewritten to use dynamic creation and destruction:

MyClass *pmyinstance = new MyClass(param1, param2);
delete pmyinstance;

4) The nastiest so far.  Constructor definitions can take initialisation lists
which give the initial values of the class members.  If the members are
themselves classes then each initialiser takes a parameter list as in 2).
Example of a constructor for MyClass:

MyClass::MyClass(type1 param1, type2 param2) : member1(value1), member2(value2),
    member3(param3-1, param3-2)
{ ... }

5) C++ has references as well as pointers.  Think of them as almost the same
thing but written differently ('&' instead of '*').  Example in a function
declaration:

int &myfunc(int &param);
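
To make the five cases above easier to try out, here is a small self-contained
sketch that uses all of them in one file.  The Member class, the member names,
the "live" counter and the exercise() function are made up for illustration
only and do not come from any real code base.

```cpp
// Hypothetical member type, so that one initialiser takes a parameter list.
class Member {
public:
    Member(int a, int b) : sum_(a + b) {}
    int sum() const { return sum_; }
private:
    int sum_;
};

class MyClass {
public:
    MyClass(int param1, int param2);   // constructor: no return type
    ~MyClass();                        // destructor: no return type
    int total() const { return member1_ + member2_ + member3_.sum(); }
    static int live;                   // objects currently alive (illustration only)
private:
    int member1_;
    int member2_;
    Member member3_;
};

int MyClass::live = 0;

// 4) constructor definition with an initialisation list
MyClass::MyClass(int param1, int param2)
    : member1_(param1), member2_(param2),
      member3_(param1 - 1, param2 - 2)
{
    ++live;
}

// 1) destructor definition
MyClass::~MyClass() { --live; }

// 5) reference parameter and reference return type
int &myfunc(int &param)
{
    ++param;
    return param;
}

int exercise()
{
    MyClass myinstance(1, 2);                    // 2) local object with parameters
    MyClass *pmyinstance = new MyClass(3, 4);    // 3) dynamic creation ...
    int result = myinstance.total() + pmyinstance->total();
    delete pmyinstance;                          //    ... and destruction
    return result;
}
```

Calling exercise() goes through cases 1) to 4), and myfunc() covers case 5), so
the file should touch each of the constructs listed.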

Hoping that I haven't made you regret looking at this in the first place (as I
mentioned before, I think this is a case of "the more the merrier", not "all or
nothing"), here is the patch.

Regards,

Michael

--- a/parsing_c/lexer_c.mll
+++ b/parsing_c/lexer_c.mll
@@ -229,6 +229,7 @@ let ulong = (UnSigned,CLong)
 
 (*****************************************************************************)
 let letter = ['A'-'Z' 'a'-'z' '_']
+let extended_letter = ['A'-'Z' 'a'-'z' '_' ':' '<' '>' '~'] (* for c++ *)
 let digit  = ['0'-'9']
 
 (* not used for the moment *)
@@ -643,7 +644,7 @@ rule token = parse
    * truncate to 31 when compare and truncate to 6 and even lowerise
    * in the external linkage phase
    *)
-  | letter (letter | digit) *
+  | ( letter (extended_letter | digit) * ':' '~' ?) ? letter (letter | digit) *
       { let info = tokinfo lexbuf in
         let s = tok lexbuf in
         Common.profile_code "C parsing.lex_ident" (fun () ->
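
For anyone who wants to experiment with the new identifier rule outside of
ocamllex, the following is a rough sketch that transliterates it into a
std::regex.  The function name is_patched_ident is made up, the translation is
my own assumption rather than anything generated from lexer_c.mll, and it only
tests whether a whole string is accepted as one identifier (it does not model
the lexer's longest-match behaviour).

```cpp
#include <regex>
#include <string>

// Transliteration (an assumption, not the ocamllex rule itself) of
//   ( letter (extended_letter | digit)* ':' '~'? )? letter (letter | digit)*
// letter          = [A-Za-z_]
// extended_letter = letter plus ':' '<' '>' '~'
bool is_patched_ident(const std::string &s)
{
    static const std::regex ident(
        "(?:[A-Za-z_][A-Za-z_:<>~0-9]*:~?)?[A-Za-z_][A-Za-z_0-9]*");
    // regex_match succeeds only if the entire string is matched.
    return std::regex_match(s, ident);
}
```

With this sketch, "MyClass::~MyClass" and "std::vector<int>::size" are accepted
as single identifiers, while "a<b" is not.  If the transliteration is faithful,
it also suggests that a single colon is enough for the prefix, so "a:b" would
be accepted as one identifier as well, which may be worth checking against
code that uses the ?: operator without spaces.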

