Re: [Cocci] help w spatch on perl codebase

Julia Lawall Fri, 11 Mar 2011 14:56:04 -0800

On Fri, 11 Mar 2011, Jim Cromie wrote:

> Im trying to use spatch in the Perl codebase
> git://perl5.git.perl.org/perl.git
> to convert a struct->field ref to a macro, and am encountering difficulties.
> 
> I think theyre due to Perl's heavy use of macros, but Ive had a few
> surprises,
> which leaves me uncertain of my understanding.
> 
> My cocci file is pretty simple, but misses stuff.
> It finds 1 change in toke.c, but misses everything in op.c,
> which is where the bulk of the field-refs happen.
> 
> jimc@chumly:~/projects/perl-core/perl$ spatch -sp_file ../op_sibl.cocci
> -verbose_parsing -debug -show_cocci -all_includes toke.c
> init_defs_builtins: /usr/share/coccinelle/standard.h
> -----------------------------------------------------------------------
> processing semantic patch file: ../op_sibl.cocci
> with isos from: /usr/share/coccinelle/standard.iso
> -----------------------------------------------------------------------
> 
> 
> @ sibling_LHS@
> expression OP;
> expression OP2;
> @@
> 
> - OP->op_sibling = OP2
> + setOpSibling(OP, OP2)
> 
> @sibling_RHS@
> expression OP;
> @@
> 
> - OP->op_sibling
> + OpSibling(OP)
> 
> @@
> type TOP;
> @@
> 
> - TOP->op_sibling
> + OpSibling(TOP)
> 
> 
> 189 191
> Fatal error: exception Failure("minus: parse error:
>  = File "../op_sibl.cocci", line 22, column 5,  charpos = 189
>      around = '->', whole content = - TOP->op_sibling
> ")
> 
> 
> I added the TOP rule in attempt to catch simple vars, but given the error I
> pulled it out.


A type metavariable represents a type like int.  Perhaps you wanted to put 
identifier?  But the case of an identifier would already have been 
transformed by the case for expression.

> Once I yanked it, it found 1 transformation:
> 
> HANDLING: toke.c
> diff =
> --- toke.c 2011-03-02 09:09:57.392038433 -0500
> +++ /tmp/cocci-output-31503-f60bf3-toke.c 2011-03-02 13:02:00.717080475
> -0500
> @@ -2370,7 +2370,7 @@ S_sublex_start(pTHX)
>      }
>      else if (op_type == OP_BACKTICK && PL_lex_op) {
>   /* readpipe() vas overriden */
> - cSVOPx(cLISTOPx(cUNOPx(PL_lex_op)->op_first)->op_first->op_sibling)->op_sv
> = tokeq(PL_lex_stuff);
> + cSVOPx(OpSibling(cLISTOPx(cUNOPx(PL_lex_op)->op_first)->op_first))->op_sv
> = tokeq(PL_lex_stuff);
>   pl_yylval.opval = PL_lex_op;
>   PL_lex_op = NULL;
>   PL_lex_stuff = NULL;
> Note: processing took   113.1s: toke.c

That is a lot of time for such a simple transformation.  I think it must 
be having a very hard time parsing your code.  It tries several strategies 
for each function, as long as it is unable to parse the function.

Did you try running spatch -parse_c file.c?  Then it will give you all 
the parse errors, which are likely to be too numerous and verbose to be 
really useful.  But at the end of the output it tells you the 10 tokens it 
had the most trouble with.  Usually you can define macro definitions for 
those, and very much improve the rate of parsed code.  You should have a 
file called standard.h installed on your system (at the root of the source 
tree, if you have the sources).  There you can see some examples of how to 
define macros.  Then you can run spatch with the argument -macro_file the 
name of a file containing your macros.  Or if you think you always want 
those definitions, you could replace standard.h, but I guess intsalling a 
new version will cause your definitions to be overwritten.

> the patch clearly finds and changes a single expression, but :
> 
> - it doesnt see it as the LHS of an assignment, why not ?

I'm not sure what you mean here.  Coccinelle considers both the left and 
right sides of = to be expressions.  So if you write a pattern that has 
the form of an expression it will potentially match either of them.  If 
you want a special treatment for the left hand side of an assignment, you 
should put a rule for that before the one for other kinds of expressions.

> I gather order is not a good way to construct cocci rules, but Im unclear
> on
> how to spec that dependence.

Order is the proper way to do things. Coccinelle applies the first rule to 
each top-level definition in the file, then the second rule to each 
resulting top-level definition in the file, etc.  You can also give the 
rules names and use depends on between the initial @@ to express 
dependencies on the success or failure of other rules.  For example:

@r exists@
identifier f;
expression E;
@@

f (...) { <+... return E; ...+> }

@null exists@
identifier r.f;
@@

f (...) { <+... return NULL; ...+> }

@depends on !null@
identifier r.f;
@@

- f(...)
+ g()

This replaces all calls to functions that never explicitly return NULL by 
g(),

>  OP's type here is a typedef, ie one of:
> perl.h:typedef struct op OP;
> perl.h:typedef struct cop COP;
> perl.h:typedef struct unop UNOP;
> perl.h:typedef struct binop BINOP;
> perl.h:typedef struct listop LISTOP;
> perl.h:typedef struct logop LOGOP;
> where .+OPs have OP fields in common, but are not *inherited* (ie strictly
> C)
> All these perl ideosyncracies leave me puzzled on how to apply SmPL to them.

I'm not sure this information is needed.  Perhaps the field name is good 
enough?  Do you think there are other structure that have a field with the 
name op_sibling?  If you want to be more careful, you can specify a set of 
possible types for an expression:

{struct op, struct cop, struct unop, ...} E;

(... is not a SmPL "...")
It should see through the typedefs if it has access to the .h file that 
contains their definition.  You may need to give the option -all_includes 
and -I to specify the directory containing the header files.

> - would a proper binding rule also help to reduce processing time ?

I'm not sure what you mean by a proper binding rule.  But I suspect that 
the processing time is entirely due to parsing.  There is also an option 
-profile that lets you see where it is taking the most time.

> - it does see and transform a complex expression.
> The latter is the reason I tried the TOP rule above, thinking that perhaps
> expression was only
> expression (not a simple term)  This doesnt seem to be the case, and wasnt
> what I expected apriori - Im just saying it all here in case Ive missed some
> other aspect.
> 
> 
> So, on to the macro-ish problems...
> parsing toke.c is much more successful than op.c, so I start there.

OK, I see you have already tried parse_c :)

> maybe 10 most problematic tokens
> -----------------------------------------------------------------------
> pTHX_: present in 80 parsing errors
> example:
> 
>        void
>        Perl_munge_qwlist_to_paren_list(pTHX_ OP *qwlist)
>        {

Add #define pTHX_ to your .h file to be given with the argument 
-macro_file.

> s: present in 51 parsing errors
> example:
> 
>        char *
>        Perl_scan_vstring(pTHX_ const char *s, const char *const e, SV *sv)
>        {
> 
> 
> I suspect that both these problems, and many others, are due to the defn of
> pTHX_

Yes.

> which is a set of layered macros that introduce a maybe-term,
> either blank-whitespace, or a "type var," as 1st item in a function
> parameter-list,
> (Read the trailing _ as a comma when the expansion isnt empty)
> Ive pasted in some of the macros for perusal, skip if obvious/uninteresting.

Coccinelle doesn't care (much) about the actual definition of the macro.

> #ifdef PERL_IMPLICIT_CONTEXT
> #  define tTHX  PerlInterpreter*
> #  define pTHX  register tTHX my_perl PERL_UNUSED_DECL
> #  define aTHX  my_perl
> #  ifdef PERL_GLOBAL_STRUCT
> #    define dTHXa(a)    dVAR; pTHX = (tTHX)a
> #  else
> #    define dTHXa(a)    pTHX = (tTHX)a
> #  endif
> #  ifdef PERL_GLOBAL_STRUCT
> #    define dTHX                dVAR; pTHX = PERL_GET_THX
> #  else
> #    define dTHX                pTHX = PERL_GET_THX
> #  endif
> #  define pTHX_         pTHX,
> #  define aTHX_         aTHX,
> 
> #ifndef pTHX
> /* Don't bother defining tTHX and sTHX; using them outside
>  * code guarded by PERL_IMPLICIT_CONTEXT is an error.
>  */
> #  define pTHX          void
> #  define pTHX_
> 
> building perl for threads indirectly turns on PERL_IMPLICIT_CONTEXT,
> which plugs the interpreter pointer into the 1st argument of *many* api
> functions.
> 
> 
> 
> Parsing op.c shows many BAD lines:
> 
> jimc@chumly:~/projects/perl-core/build-threads$ grep BAD ../spatch-res-op
> |wc
>     189    1088    9487
> 
> jimc@chumly:~/projects/perl-core/build-threads$ grep BAD ../spatch-res-op
> BAD:!!!!! #define CALL_OPFREEHOOK(o) if (PL_opfreehook) PL_opfreehook(aTHX_
> o)
> BAD:!!!!! Perl_Slab_Alloc(pTHX_ size_t sz)
> BAD:!!!!!     I32 **const slabs = PL_slabs;
> BAD:!!!!! S_Slab_to_rw(pTHX_ void *op)
> BAD:!!!!! Perl_op_refcnt_inc(pTHX_ OP *o)
> BAD:!!!!! Perl_op_refcnt_dec(pTHX_ OP *o)
> BAD:!!!!! Perl_Slab_Free(pTHX_ void *op)
> BAD:!!!!!      : PL_check[type](aTHX_ (OP*)o))
> BAD:!!!!!     STMT_START { \
> 
> many of these appear to be the pTHX, as mentioned above,
> CALL_OPFREEHOOK may also be problematic:
> which expands to:
>     if ((my_perl->Iopfreehook)) (my_perl->Iopfreehook)(my_perl, o);
> 
> What exactly does BAD !!! mean anyway ?

This is the line where it gave up on the current function.

> Is it more or less as described in /usr/share/coccinelle/standard.h ?
> or is it that the dynamic expansion is insufficient ?
> 
> 
> Overall, whats the right strategy for a project to help coccinelle to
> support it ?
> - a project specific  -macro_file           <file>
> Do you keep links to such project files that would help in understanding
> whats needed ?

I don't have any, but it would indeed be a good idea to collect them.

> lastly, though these are rather unimportant ...
> 
> jimc@chumly:~/projects/perl-core/perl$ spatch -sp_file ../op_sibl.cocci
> -verbose_parsing -debug -show_cocci -all_includes -parse_ch op.c 2>
> ../spatch-res-op
> NB total files = 1; perfect = 0; pbs = 1; timeout = 0; =========> 0%
> nb good = 296,  nb passed = 70 =========> 23.648649% passed
> nb good = 296,  nb bad = 9862 =========> 2.913959% good
> jimc@chumly:~/projects/perl-core/perl$ wc ../spatch-res-op
>  10790  46067 333773 ../spatch-res-op
> 
> what is NB and nb ?   Noto Bono or Number of Blocks ?

"number"

> what does passed mean ?

Sometimes when there is a macro that takes arguments, it considers that it 
is not able to properly parse the arguments, and the arguments are passed.  
No transformation occurs on them.  This happens for macros that are 
defined in standard.h or your macro file.  It has to do with the fact that 
there are two views of the arguments, as they are and as the macro file 
defines them to be.

> why does nb good show up 2x ?

I think it is just to make each line self-contained, but I have never 
thought about it (I didn't write the C parser).

> %good = nb-good / (nb-bad + nb-good)
> 

julia
_______________________________________________
Cocci mailing list
[email protected]
http://lists.diku.dk/mailman/listinfo/cocci
(Web access from inside DIKUs LAN only)

Re: [Cocci] help w spatch on perl codebase

Reply via email to