Hi All,
I am trying to understand how match.pd works. I'm writing a simple matching
rule but have run into some issues, and there is very little documentation
on match.pd.
Short version:
1) Why is there a difference in expressiveness between the GIMPLE and the
GENERIC versions of match.pd? In particular, why does GENERIC have the
limitation that the replacement must be an internal function, while GIMPLE is
happy to just do the substitution?
2) Why doesn't the GIMPLE pass match on the GIMPLE code produced by the Fortran
version? I have created an example where the GIMPLE trees of the two match
exactly, modulo some attributes on the expressions.
3) It seems to be either/or: I can match in GENERIC or in GIMPLE, but not in
both. Is this correct? Right now, if some other optimization later produces
x*cos(x), will it never be matched again, since the GIMPLE passes don't seem
to like the internal builtin?
4) There seems to be a bug when matching in the GENERIC pass while type
conversions are being done. I am not sure which side is wrong, though. In the
tree the type has been correctly cast, but the result is a segfault.
Thanks,
Tamar
----
For the long version and to see my current understanding of match.pd and what I
have done:
Let’s say I want to replace occurrences of `x*cos(x)` with a super-duper
optimized builtin `foo`.
So `x*cos(x) => foo(x)`.
I start off at the obvious place, `match.pd`.
Since I need a new builtin I have to first define it in `builtins.def`. Since
I’m replacing a math function I want optimized versions for `float`, `double`,
etc.
DEF_GCC_BUILTIN (BUILT_IN_FOO, "foo", BT_FN_DOUBLE_DOUBLE_DOUBLE,
                 ATTR_CONST_NOTHROW_LEAF_LIST)
DEF_GCC_BUILTIN (BUILT_IN_FOOF, "foof", BT_FN_FLOAT_FLOAT_FLOAT,
                 ATTR_CONST_NOTHROW_LEAF_LIST)
DEF_GCC_BUILTIN (BUILT_IN_FOOL, "fool",
                 BT_FN_LONGDOUBLE_LONGDOUBLE_LONGDOUBLE,
                 ATTR_CONST_NOTHROW_LEAF_LIST)
#define FOO_TYPE(F) BT_FN_##F##_##F##_##F
DEF_GCC_FLOATN_NX_BUILTINS (BUILT_IN_FOO, "foo", FOO_TYPE,
                            ATTR_CONST_NOTHROW_LEAF_LIST)
#undef FOO_TYPE
So now I have the builtins defined (I need to also define the proper handling
code in builtins.c but that’s unrelated to the question) and can define the
match rule:
/* x*cos(x) -> foo(x). */
(for coss (COS)
 (simplify
  (mult:c (coss:s @0) @1)
  ...
 )
)
And the question is what goes into the … block.
I have builtins defined for each of the functions, so one implementation could
be:
/* x*cos(x) -> foo(x). */
(for coss (COS)
 (simplify
  (mult:c (coss:s @0) @1)
  (switch
   (if (types_match (type, float_type_node))
    (BUILT_IN_FOOF @1 @0))
   (if (types_match (type, double_type_node))
    (BUILT_IN_FOO @1 @0))
   (if (types_match (type, long_double_type_node))
    (BUILT_IN_FOOL @1 @0))
  )
 )
)
But one of the artifacts of defining the builtin in `builtins.def` is that an
iterator is created for you for use in `match.pd` by `gencfn-macros.c`. In this
case I have something like:
(define_operator_list FOO
BUILT_IN_FOOF
BUILT_IN_FOO
BUILT_IN_FOOL
null)
The last `null` is important but I'll get to that later. These are generated
and placed in `cfn-operators.pd`, which is included by `match.pd`.
The other artifact is a macro for use in case statements in e.g. `builtins.c`,
e.g. `CASE_CFN_FOO`,
which expands to:
case CFN_BUILT_IN_FOOF:
case CFN_BUILT_IN_FOO:
case CFN_BUILT_IN_FOOL:
case CFN_FOO:
These are handy, because it means we can easily write match statements to the
correct type. The iterator items are generated in increasing order of width of
the type. E.g. float, double, long double.
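To make the case-label macro concrete, here is a small stand-alone C sketch of the same idiom. It is only an illustration: the enum `toy_combined_fn` and the function `is_any_foo` are hypothetical stand-ins, not GCC's real `combined_fn` type or anything generated by `gencfn-macros.c`.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for GCC's combined_fn codes; not the real enum.  */
enum toy_combined_fn {
  CFN_BUILT_IN_FOOF,
  CFN_BUILT_IN_FOO,
  CFN_BUILT_IN_FOOL,
  CFN_FOO,
  CFN_BUILT_IN_COS
};

/* One macro standing for all four case labels.  The final colon is
   supplied at the use site, i.e. "CASE_CFN_FOO:".  */
#define CASE_CFN_FOO \
  case CFN_BUILT_IN_FOOF: \
  case CFN_BUILT_IN_FOO: \
  case CFN_BUILT_IN_FOOL: \
  case CFN_FOO

/* Returns true for any flavour of foo, whatever its type.  */
bool is_any_foo (enum toy_combined_fn fn)
{
  switch (fn)
    {
    CASE_CFN_FOO:
      return true;
    default:
      return false;
    }
}
```

With this, `is_any_foo (CFN_BUILT_IN_FOOF)` and `is_any_foo (CFN_FOO)` both take the same branch, while `is_any_foo (CFN_BUILT_IN_COS)` falls through to the default.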
So by iterating over both `COS` and the `FOO` iterator at the same time, we can
simplify the match rule:
/* x*cos(x) -> foo(x). */
(for coss (COS)
     foos (FOO)
 (simplify
  (mult:c (coss:s @0) @1)
  (foos @0)
 )
)
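The lockstep iteration can be pictured as a lookup by position: whichever entry of the first list matched, the replacement is the entry at the same index in the second list. The C sketch below is a toy model of that idea only. The arrays and `paired_replacement` are hypothetical, hold plain strings rather than operators, and leave out the fourth slot (`null` or an internal function).

```c
#include <stddef.h>
#include <string.h>

/* Toy model only: names mirror the operator lists in the text, ordered
   float, double, long double.  This is not code from genmatch.  */
static const char *const cos_list[] = {
  "BUILT_IN_COSF", "BUILT_IN_COS", "BUILT_IN_COSL"
};
static const char *const foo_list[] = {
  "BUILT_IN_FOOF", "BUILT_IN_FOO", "BUILT_IN_FOOL"
};

/* Return the replacement paired with MATCHED: the entry at the same index
   in the second list, or NULL if MATCHED is not in the first list.  */
const char *paired_replacement (const char *matched)
{
  size_t n = sizeof cos_list / sizeof cos_list[0];
  for (size_t i = 0; i < n; i++)
    if (strcmp (cos_list[i], matched) == 0)
      return foo_list[i];
  return NULL;
}
```

Here `paired_replacement ("BUILT_IN_COSF")` yields `"BUILT_IN_FOOF"`, which is why the order of the entries in both lists matters.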
This rule gets compiled into two files, `generic-match.c` and `gimple-match.c`.
The first works on GENERIC trees and the second, of course, on GIMPLE.
Given that GIMPLE and GENERIC have slightly different semantics, the code in
these two files is fairly different. The placement of the rule in `match.pd`
seems to be important: the rules are applied in the order in which they are
specified in `match.pd`.
The GIMPLE matcher will generate statements such as
switch (gimple_call_combined_fn (def))
  {
  case CFN_BUILT_IN_COSF:
    {
      tree o20 = gimple_call_arg (def, 0);
      if ((o20 = do_valueize (valueize, o20)))
        {
          {
            ...
            *res_code = CFN_BUILT_IN_FOOF;
            res_ops[0] = captures[1];
            gimple_resimplify1 (lseq, res_code, type, res_ops, valueize);
            return true;
          }
        }
      break;
    }
  case CFN_BUILT_IN_COS:
    {
      tree o20 = gimple_call_arg (def, 0);
      if ((o20 = do_valueize (valueize, o20)))
        {
          {
            ...
            *res_code = CFN_BUILT_IN_FOO;
            res_ops[0] = captures[1];
            gimple_resimplify1 (lseq, res_code, type, res_ops, valueize);
            return true;
          }
        }
      break;
    }
  ...
  default:;
  }
}
Because of the order of the iterations, the correct function is matched, e.g.
COSF -> FOOF, COS -> FOO, etc.
The GENERIC version has the same global structure but differs vastly in how
replacements are done:
case CALL_EXPR:
  switch (get_call_combined_fn (op0))
    {
    case CFN_BUILT_IN_COSF:
      {
        tree o20 = CALL_EXPR_ARG (op0, 0);
        {
          ...
          res = maybe_build_call_expr_loc (loc, CFN_BUILT_IN_FOOF, type, 1,
                                           res_op0);
          if (!res)
            return NULL_TREE;
          if (TREE_SIDE_EFFECTS (captures[2]))
            res = build2_loc (loc, COMPOUND_EXPR, type,
                              fold_ignored_result (captures[2]), res);
          return res;
        }
        break;
      }
    case CFN_BUILT_IN_COS:
      ...
The important function here is `maybe_build_call_expr_loc`, again, I'll get to
that later.
Testing it out, given a C input file:
#include <math.h>
float do_cool_stuff(float x)
{
return x*cos(x);
}
we get:
do_cool_stuff (float x)
{
float D.4154;
_1 = (double) x;
_2 = __builtin_foo (_1);
D.4154 = (float) _2;
return D.4154;
}
The rule seems to be matching early in the GIMPLE phase and not in GENERIC,
fair enough as long as it works.
It also correctly adds the type casts and calls the double version of `foo`.
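The casts in the dump come from the C language itself: `cos` takes and returns `double`, so the `float` operand is converted and the whole product is computed in `double`, whereas `cosf` keeps everything in `float`. A minimal stand-alone check of that, using only standard C (the helper names `size_with_cos` and `size_with_cosf` are made up for this illustration):

```c
#include <math.h>
#include <stddef.h>

/* sizeof does not evaluate its operand, so no libm call is made here;
   only the type of the expression matters.  */
size_t size_with_cos (void)
{
  float x = 1.0f;
  return sizeof (x * cos (x));   /* cos takes and returns double */
}

size_t size_with_cosf (void)
{
  float x = 1.0f;
  return sizeof (x * cosf (x));  /* cosf takes and returns float */
}
```

The first function returns `sizeof (double)` and the second `sizeof (float)`, which is the difference between the `cos` and `cosf` variants of the test case.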
Fortran however disagrees:
For my Fortran function
function func()
real :: j ! output
j = j * cos(j)
end function func
I get
func ()
{
real(kind=4) j;
_1 = __builtin_cosf (j);
j = j * _1;
}
So Fortran is not applying the rule on either GENERIC or GIMPLE.
I am not sure why the GIMPLE one does not apply, but the GENERIC one applies to
neither Fortran nor C because of an implicit constraint in `generic-match.c`
which comes from `maybe_build_call_expr_loc`.
This function will only allow the substitution if the replacement function is
an internal builtin, presumably because it wants to avoid introducing a
function that it cannot fold or lower later, which could cause a link error if
the call cannot be removed.
Fair enough. The way to tell it that it can get rid of the function is by
adding it to `internal-fn.def`.
A comment in the file states that:
DEF_INTERNAL_FLT_FN is like DEF_INTERNAL_OPTAB_FN, but in addition,
the function implements the computational part of a built-in math
function BUILT_IN_<NAME>{F,,L}.
Which is exactly what I need.
So to `internal-fn.def` I add a new entry
DEF_INTERNAL_FLT_FN (FOO, ECF_CONST, foo, unary)
For this to work, the optab `foo` must now be defined, so in `optabs.def`
I add:
OPTAB_D (foo_optab, "foo$F$a2")
As the comment suggests, I expected a mapping to be done between the typed
functions, i.e. I expected there to be three new internal functions: one each
for float, double and long double.
However, only one new internal function is generated, `IFN_FOO`, and it is
added to the iterator for FOO in place of the `null` we saw earlier:
(define_operator_list FOO
BUILT_IN_FOOF
BUILT_IN_FOO
BUILT_IN_FOOL
IFN_FOO)
Re-running the Fortran example, it still does not match. This is because the
`cos` is expanded into `cosf`, which is then paired with `BUILT_IN_FOOF` and
not with the internal function, so `generic-match.c` still refuses to make the
replacement.
The only way I have found to make it match is to change the matching rule to:
/* x*cos(x) -> foo(x). */
(for coss (COS)
 (simplify
  (mult:c (coss:s @0) @1)
  (IFN_FOO @0)
 )
)
i.e. map all the rules to the type-generic `IFN_FOO`.
Running the example now still produces nothing. If I understood correctly,
this is because the rewrite is only done *if* an optab for the required type
actually exists. So I create one in the backend.
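If that understanding is right, the decision can be pictured as a per-mode table lookup: the rewrite is offered only for modes where the backend provides a pattern. The C sketch below is a toy model of that reading and nothing more. `toy_mode`, `foo_pattern_exists` and `toy_try_rewrite` are hypothetical names and do not correspond to GCC's real optab machinery.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins; GCC's real machine modes and optab tables
   are different.  */
enum toy_mode { TOY_SFmode, TOY_DFmode, TOY_TFmode, TOY_NUM_MODES };

/* Which modes the imaginary backend provides a "foo<mode>2" pattern for.  */
static const bool foo_pattern_exists[TOY_NUM_MODES] = {
  [TOY_SFmode] = true,
  [TOY_DFmode] = true,
  [TOY_TFmode] = false,
};

/* Return the name of the replacement, or NULL to leave x*cos(x) alone.  */
const char *toy_try_rewrite (enum toy_mode mode)
{
  if (!foo_pattern_exists[mode])
    return NULL;   /* no pattern for this mode: decline the rewrite */
  return "FOO";    /* the type-generic internal function */
}
```

In this model a float or double operand gets rewritten, while a long double operand is left untouched because no pattern was registered for it.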
I create an optab for both double and float types (doing so in the aarch64
backend):
(define_insn "foo<mode>2"
[(set (match_operand:GPF_F16 0 "register_operand" "=w")
(abs:GPF_F16 (match_operand:GPF_F16 1 "register_operand" "w")))]
"TARGET_FLOAT"
"fabs\\t%<s>0, %<s>1"
[(set_attr "type" "ffarith<stype>")]
)
The operation is nonsense but that doesn't matter; we just care about the
GIMPLE:
func ()
{
real(kind=4) j;
j = FOO (j);
}
So the Fortran code has correctly replaced `x*cos(x)` with `FOO`.
C however now crashes with a segfault:
cos.c: In function ‘do_cool_stuff’:
cos.c:5:4: internal compiler error: Segmentation fault
return x*cos(x);
^~~~~~
0xb55ddf crash_signal
/work/tamchr01/gnu-work/src/gcc/gcc/toplev.c:333
0x6f227c builtin_mathfn_code(tree_node const*)
/work/tamchr01/gnu-work/src/gcc/gcc/builtins.c:7532
0x76070e convert_to_real_1
This happens
if (TREE_CODE (t) != CALL_EXPR
|| TREE_CODE (CALL_EXPR_FN (t)) != ADDR_EXPR)
here, because `CALL_EXPR_FN (t)` returns NULL. This is because one of the
operators in the operator list is null, and TREE_CODE just blindly
dereferences it. This seems to be a bug, but I am not sure whether it is in
`builtin_mathfn_code` or in the code that generated the tree in the first
place. Note that this does not happen if no type casts are being done.
If I change
float do_cool_stuff(float x)
{
return x*cos(x);
}
to
float do_cool_stuff(float x)
{
return x*cosf(x);
}
then no conversion is needed and the crash does not occur.
The crash in the `cos` version can be avoided by adding an extra check:
if (TREE_CODE (t) != CALL_EXPR
|| (CALL_EXPR_FN (t) && TREE_CODE (CALL_EXPR_FN (t)) != ADDR_EXPR))
to `builtin_mathfn_code` and this seems to work fine and I think is correct, as
this function is allowed to say NO.
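The general shape of such a guard, reduced to a stand-alone toy: test the callee pointer before reading its code, and return the "no" value when it is absent. Everything here (`toy_node`, `toy_code`, `TOY_END_BUILTINS`, `toy_mathfn_code`) is a hypothetical miniature rather than GCC's tree API, and the toy returns early on a null callee, which is a slightly different shape from the condition above but has the same aim.

```c
#include <stddef.h>

/* Hypothetical miniature of a tree node; GCC's real tree is far richer.  */
enum toy_code { TOY_CALL_EXPR, TOY_ADDR_EXPR, TOY_OTHER };
enum { TOY_END_BUILTINS = -1 };

struct toy_node {
  enum toy_code code;
  const struct toy_node *fn;  /* callee; NULL when no callee is present */
  int builtin_id;             /* only meaningful on the callee node */
};

/* Return the builtin id of a call, or TOY_END_BUILTINS to say "no".  */
int toy_mathfn_code (const struct toy_node *t)
{
  if (t->code != TOY_CALL_EXPR)
    return TOY_END_BUILTINS;
  if (t->fn == NULL)          /* test the pointer before reading its code */
    return TOY_END_BUILTINS;
  if (t->fn->code != TOY_ADDR_EXPR)
    return TOY_END_BUILTINS;
  return t->fn->builtin_id;
}
```

A call node whose `fn` is NULL simply gets `TOY_END_BUILTINS` back instead of triggering a null dereference.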