On Wed, Apr 4, 2012 at 5:50 PM, Andrew MacLeod <amacl...@redhat.com> wrote:
> On 04/04/2012 10:33 AM, Richard Guenther wrote:
>> On Wed, Apr 4, 2012 at 3:28 PM, Andrew MacLeod<amacl...@redhat.com>
>>  wrote:
>> This is a WIP... that fntype fields is there for simplicity..   and no...
>> you can do a 1 byte atomic operation on a full word object if you want by
>> Oh, so you rather need a size or a mode specified, not a "fntype"?
> yes, poorly named perhaps as I created things... its just a type node at the
> moment that indicates the size being operated on that I collected from the
> builtin in function.

Ok.  Remember that you should use non-tree things if you can in GIMPLE
land.  This probably means that both the size and the memmodel "operands"
should be

+ struct GTY(()) gimple_statement_atomic
+ {
+   /* [ WORD 1-8 ]  */
+   struct gimple_statement_with_memory_ops_base membase;
+   /* [ WORD 9 ] */
+   enum gimple_atomic_kind kind;
     enum gimple_atomic_memmodel memmodel;

     unsigned size;

and not be trees in the ops array.  Even more, both kind and memmodel
should probably go in membase.base.subcode

>> In the example you only ever use address operands (not memory operands)
>> to the GIMPLE_ATOMIC - is that true in all cases?  Is the result always
>> non-memory?
> The atomic address can be any arbitrary memory location... I haven't gotten
> to that yet.  its commonly just an address so I'm working with that first as
> proof of concept. When it gets something else it'll trap and I'll know :-)

Ok.  Please try to avoid memory operands and stick to address operands ;)
You can make tree-ssa-operands.c not consider ADDR_EXPRs in the
address operands, similar to the ADDR_EXPRs in MEM_REF operand zero.

> Results are always non-memory, other than the side effects of the atomic
> contents changing and having to up date the second parameter to the
> compare_exchange routine.  The generic routines for arbitary structures (not
> added in yet), actually just work with blocks of memory, but they are all
> handled by addresses and the functions themselves are typically void.  I was
> planning on folding them right into the existing atomic_kinds as well... I
> can recognize from the type that it wont map to a integral type.  I needed
> separate builtins in 4.7  for them since the parameter list was different.
>> I suppose the GIMPLE_ATOMICs are still optimization barriers for all
>> memory, not just that possibly referenced by them?
> yes, depending on the memory model used.  It can force synchronization with
> other CPUs/threads which will have the appearence of changing any shared
> memory location.  Various guarantees are made about whether those changes
> are visible to this thread after an atomic operation so we can't reuse
> shared values in those cases.  Various guarantees are made about what
> changes this thread has made are visible to other CPUs/threads at an atomic
> call as well, so that precludes moving stores downward in some models.
>>> and during expansion to RTL, can trivially see that cmpxchg.2_4 is not
>>> used,
>>> and generate the really efficient compare and swap pattern which only
>>> produces a boolean result.
>> I suppose gimple stmt folding could transform it as well?
> it could if I provided gimple statements for the 3 different forms of C&S. I
> was planning to just leave it this way since its the interface being forced
> by C++11 as well as C11... and then just emit the appropriate RTL for this
> one C&S type.  The RTL patterns are already defined for the 2 easy cases for
> the __sync routines. the third one was added for __atomic.  Its possible
> that the process of integrating the __sync routines with GIMPLE_ATOMIC will
> indicate its better to add those forms as atomic_kinds and then
> gimple_fold_stmt could take care of it as well.   Maybe that is just a good
> idea anyway...  I'll keep it in mind.
>>>   if only cmpxchg.2_4 were used, we can generate
>>> the C&S pattern which only returns the result.  Only if we see both are
>>> actually used do we have to fall back to the much uglier pattern we have
>>> that produces both results.  Currently we always generate this pattern.
>>> Next, we have the C11 atomic type qualifier which needs to be
>>> implemented.
>>>  Every reference to this variable is going to have to be expanded into
>>> one
>>> or more atomic operations of some sort.  Yes, I probably could do that by
>>> emitting built-in functions, but they are a bit more unwieldy, its far
>>> simpler to just create gimple_statements.
>> As I understand you first generate builtins anyway and then lower them?
>> Or are you planning on emitting those for GENERIC as well?  Remember
>> GENERIC is not GIMPLE, so you'd need new tree codes anyway ;)
>> Or do you plan to make __atomic integral part of GENERIC and thus
>> do this lowering during gimplification?
> I was actually thinking about doing it during gimplification... I hadnt
> gotten as far as figuring out what to do with the functions from the front
> end yet.  I dont know that code well, but I was in fact hoping there was a
> way to 'recognize' the function names easily and avoid built in functions
> completely...

Heh ... you'd still need a GENERIC representation then.  Possibly
a ATOMIC_OP tree may do.

> The C parser is going to have to understand the set of C11 routine names for
> all these anyway.. I figured there was something in there that could be
> done.
>>> I also hope that when done, I can also remove all the ugly built-in
>>> overload
>>> code that was created for __sync and continues to be used by __atomic.
>> But the builtins will stay for our users consumption and libstdc++ use,
>> no?
> well, the names must remain exposed and recognizable since they are 'out
> there'.  Maybe under the covers I can just leave them as normal calls and
> then during gimplification simply recognize the names and generate
> GIMPLE_ATOMIC statements directly from the CALL_EXPR.  That would be ideal.
>  That way there are no builtins any more.

I suppose my question was whether the frontends need to do sth about the
__atomic keyword or if that is simply translated to some type flag - or
is that keyword applying to operations, not to objects or types?

>>> So bottom line, a GIMPLE_ATOMIC statement is just an object that is much
>>> easier to work with.
>> Yes, I see that it is easier to work with for you.  All other statements
>> will
>> see GIMPLE_ATOMICs as blockers for their work though, even if they
>> already deal with calls just fine - that's why I most of the time suggest
>> to use builtins (or internal fns) to do things (I bet you forgot to update
>> enough predicates ...).  Can GIMPLE_ATOMICs throw with
>> -fnon-call-exceptions?
>> I suppose yes.  One thing you missed at least ;)
> Not that I am aware of, they are 'noexcept'.  But I'm sure I've missed more
> than a few things so far.  Im pretty early in the process :-)

What about compare-exchange on a pointer dereference? The pointer
dereference surely can trap, so it can throw with -fnon-call-exceptions.  No?

>>>  It cleans up both initial creation and rtl generation,
>>> as well as being easier to manipulate. It also encompasses an entire
>>> class
>>> of operations that are becoming more integral *if* we can make them
>>> efficient, and I hope to actually do some optimizations on them
>>> eventually.
>>>  I had a discussion last fall with Linus about what we needed to be able
>>> to
>>> do to them in order for the kernel to use __atomic instead of their
>>> home-rolled solutions.  Could I do everything with builtins? sure... its
>>> just more awkward and this approach seems cleaner to me.
>> Cleaner if you look at it in isolation - messy if you consider that not
>> only
>> things working with atomics need to (not) deal with these new stmt kind.
> They can affect shared memory in some ways like a call, but don't have many
> of the other attributes of call.  They are really more like an assignment or
> other operation with arbitrary shared memory side effects.  I do hope to be
> able to teach the optimizers the directionality of the memory model
> restrictions.  ie, ACQUIRE is only a barrier to hoisting shared memory
> code...  stores can be moved downward past this mode. RELEASE is only a
> barrier to sinking code.   RELAXED is no barrier at all to code motion.  In
> fact, a relaxed store is barely different than a real store... but there is
> a slight difference so we can't make it a normal store :-P.
> By teaching the other parts of the compiler about a GIMPLE_ ATOMIC, we could
> hopefully lessen their impact eventually.

Ok.  I suppose having a GIMPLE_ATOMIC is fine then.

+ /* In tree-atomic.c.  */
+ extern bool expand_gimple_atomic_load (gimple);

err, gimple-atomic.c please ;)

+ /* Return the expression field of atomic operation GS.  */
+ static inline tree
+ gimple_atomic_expr (const_gimple gs)
+ {
+   gcc_assert (gimple_atomic_has_expr (gs));
+   return gimple_op (gs, 2);
+ }

err - what's "expression" in this context?  I hope it's not an arbitrary
tcc_expression tree?!

+ static inline bool
+ gimple_atomic_has_fail_order (const_gimple gs)
+ {
+   return gimple_atomic_kind (gs) == GIMPLE_ATOMIC_COMPARE_EXCHANGE;
+ }

btw, these kind of predicates look superfluous to me - if they are true
exactly for one atomic kind then users should use the predicate to
test for that specific atomic kind, not for some random field presence.

+ /* Return the arithmetic operation tree code for atomic operation GS.  */
+ static inline enum tree_code
+ gimple_atomic_op_code (const_gimple gs)
+ {
+   gcc_assert (gimple_atomic_kind (gs) == GIMPLE_ATOMIC_FETCH_OP ||
+             gimple_atomic_kind (gs) == GIMPLE_ATOMIC_OP_FETCH);
+   return (enum tree_code) gs->gsbase.subcode;
+ }

now, what was it with this "expression" thing again?  Btw, ||s go to the
next line.  subcode should be the atomic kind - it seems that you glob
too much into GIMPLE_ATOMIC and that you maybe should sub-class
GIMPLE_ATOMIC properly via the subcode field.  You'd have an
gimple_atomic_base which fences could use for example.

All asserts in inline functions should be gcc_checking_asserts.

+ gimple
+ gimple_build_atomic_load (tree type, tree target, tree order)
+ {
+   gimple s = gimple_build_with_ops (GIMPLE_ATOMIC, ERROR_MARK, 3);
+   gimple_atomic_set_kind (s, GIMPLE_ATOMIC_LOAD);
+   gimple_atomic_set_order (s, order);
+   gimple_atomic_set_target (s, target);
+   gimple_atomic_set_type (s, type);
+   gimple_set_has_volatile_ops (s, true);

you should have a gimple_build_atomic_raw function that takes all
operands and the atomic kind, avoiding the need for all the repeated
calls of gimple_atomic_set_* as well as avoid all the repeated checking
this causes.

+   else if (is_gimple_atomic (stmt))
+     {
+       tree t;
+       if (visit_store)
+         {
+         for (i = 0; i < gimple_atomic_num_lhs (stmt); i++)
+           {
+             t = gimple_atomic_lhs (stmt, i);
+             if (t)
+               {
+                 t = get_base_loadstore (t);
+                 if (t)
+                   ret |= visit_store (stmt, t, data);
+               }

I thought results are always registers?  The walk_stmt_load_store_addr_ops
looks wrong to me.  As the loads/stores are implicit (you only have addresses)
you have to adjust all callers of walk_stmt_load_store_addr_ops to handle
atomics specially as they expect to come along all loads/stores that way.
Those also handle calls and asms (for "memory" clobber) specially.

+     case GIMPLE_ATOMIC:
+       /* Atomic operations are memory barriers in both directions for now.  */
+       add_virtual_operand (stmt, opf_def | opf_use);

It's surely not relevant that "in both directions for now" but instead that
"Atomic operations have side-effects on memory."

+       for (n = 0; n < gimple_atomic_num_lhs (stmt); n++)
+       get_expr_operands (stmt, gimple_atomic_lhs_ptr (stmt, n), opf_def);
+       for (n = 0; n < gimple_atomic_num_rhs (stmt); n++)
+       get_expr_operands (stmt, gimple_op_ptr (stmt, n), opf_use);
+       break;

Do address-takens in operands make the addresses escape?  If not
you should pass opf_non_addressable as well.

Index: tree-ssa-alias.c
*** tree-ssa-alias.c    (revision 186098)
--- tree-ssa-alias.c    (working copy)
*************** ref_maybe_used_by_stmt_p (gimple stmt, t
*** 1440,1445 ****
--- 1440,1447 ----
    else if (is_gimple_call (stmt))
      return ref_maybe_used_by_call_p (stmt, ref);
+   else if (is_gimple_atomic (stmt))
+     return true;

please add a comment before these atomic handlings that we assume
they are using/clobbering all refs because they are considered memory
optimization barriers.  Btw, does this apply to non-address-taken automatic
references?  I suppose not.  Consider:

int foo()
  struct s;
  s.a = 1;
  return s.a;

we still should be able to optimize this to return 1, no?  At least SRA will
happily do similar things in a non-flow-sensitive way.  Please add a FIXME
to the alias predicates at least, or even better fix this missed optimization.
There is no need to establish "backwards missed-optimization compatibility"
just like we do for asms.

*************** stmt_kills_ref_p_1 (gimple stmt, ao_ref
*** 1814,1819 ****
--- 1818,1825 ----

+   if (is_gimple_atomic (stmt))
+     return true;

that's for sure wrong ;)  It should return false.

Index: tree-ssa-sink.c
*** tree-ssa-sink.c     (revision 186098)
--- tree-ssa-sink.c     (working copy)
*************** is_hidden_global_store (gimple stmt)
*** 145,150 ****
--- 145,154 ----
        tree lhs;

+       /* Don't optimize across an atomic operation.  */
+       if (is_gimple_atomic (stmt))
+         return true;

that's bogus, too (really all uses of is_hidden_global_store should go away).
Please look into the few callers of this function and  handle atomics in
a correct way explicitely.

+         else if (is_gimple_atomic (stmt))
+           {
+             unsigned n;
+             /* We may be able to lessen this with more relaxed memory
+                models, but for now, its a full barrier.  */
+             mark_all_reaching_defs_necessary (stmt);
+             for (n = 0; n < gimple_atomic_num_rhs (stmt); n++)
+               {
+                 tree t = gimple_op (stmt, n);
+                 if (TREE_CODE (t) != SSA_NAME &&
+                     TREE_CODE (t) != INTEGER_CST &&
+                     !is_gimple_min_invariant (t) &&
+                     !ref_may_be_aliased (t))
+                   mark_aliased_reaching_defs_necessary (stmt, t);
+               }
+           }

for sure atomics do not make non-aliased automatic variable stores necessary.
At least I hope so.  As there are only addresses in the atomic ops the
code looks wrong to me (and &&s go to the next line ...).  As you are
marking everything needed anway you can just remove the bogus loop

+     case GIMPLE_ATOMIC:
+       /* Treat this like a call for now, it may expand into a call.  */
+       if (gimple_atomic_kind (stmt) != GIMPLE_ATOMIC_FENCE)
+       cost = gimple_num_ops (stmt) *
+              estimate_move_cost (TREE_TYPE (gimple_atomic_target (stmt)));
+       else
+         cost = 1;
+       break;

for sure this counts way too many "ops".  There is at most a single
memory operand as far as I understand.  A size/speed cost of 1 for
a FENCE is also too low.

I miss handling of GIMPLE_ATOMICs in tree-ssa-structalias.c.


> Andrew

Reply via email to