On Wed, Jun 14, 2017 at 9:53 AM, Bin.Cheng <amker.ch...@gmail.com> wrote: > On Tue, Jun 13, 2017 at 11:59 AM, Richard Biener > <richard.guent...@gmail.com> wrote: >> On Mon, Jun 12, 2017 at 7:02 PM, Bin Cheng <bin.ch...@arm.com> wrote: >>> Hi, >>> During the work I ran into a latent bug for distributing. For the moment >>> we sort statements >>> in dominance order, but that's not enough because basic blocks may be >>> sorted in reverse order >>> of execution flow. This results in wrong data dependence direction later. >>> This patch fixes >>> the issue by sorting in topological order. >>> >>> Bootstrap and test on x86_64 and AArch64. Is it OK? >> >> I suppose you are fixing >> >> static int >> pg_add_dependence_edges (struct graph *rdg, vec<loop_p> loops, int dir, >> vec<data_reference_p> drs1, >> vec<data_reference_p> drs2) >> { >> ... >> /* Re-shuffle data-refs to be in dominator order. */ >> if (rdg_vertex_for_stmt (rdg, DR_STMT (dr1)) >> > rdg_vertex_for_stmt (rdg, DR_STMT (dr2))) >> { >> std::swap (dr1, dr2); >> this_dir = -this_dir; >> } >> >> but then for stmts that are not "ordered" by RPO or DOM like >> >> if (flag) >> ... = dr1; >> else >> ... = dr2; >> >> this doesn't avoid spurious swaps? Also the code was basically > No, this is mainly for below case: > if (flag) > { > partition1: arr[i] = x; > } > partition2: arr[i] = y; > > function pg_add_dependence_edges is like: > /* Re-shuffle data-refs to be in dominator order. */ > if (rdg_vertex_for_stmt (rdg, DR_STMT (dr1)) > > rdg_vertex_for_stmt (rdg, DR_STMT (dr2))) > { > std::swap (dr1, dr2); > this_dir = -this_dir; > } > //... > else if (DDR_ARE_DEPENDENT (ddr) == NULL_TREE) > { > if (DDR_REVERSED_P (ddr)) > { > std::swap (dr1, dr2); > this_dir = -this_dir; > } > /* Known dependences can still be unordered througout the > iteration space, see gcc.dg/tree-ssa/ldist-16.c. */ > if (DDR_NUM_DIST_VECTS (ddr) != 1) > this_dir = 2; > /* If the overlap is exact preserve stmt order. */ > else if (lambda_vector_zerop (DDR_DIST_VECT (ddr, 0), 1)) > ; > else > { > /* Else as the distance vector is lexicographic positive > swap the dependence direction. */ > this_dir = -this_dir; > } > } > For non-ZERO distance vector dependence, the second part always > computes src->dest dependence info correctly, as well as edge > direction of PG. For ZERO distance vector dependence, we rely on the > swap part (thus topological order) to get correct dependence > direction. For mentioned case, the two data references are unordered > in dominance relation, but ordered in RPO. This is why DOM is not > enough. For if-then-else case, the order actually doesn't matter, and > the references are unordered in either dominance relation or RPO. > Specifically, src->dest is always computed correctly for non-ZERO > distance vector cases, no matter <dr1, dr2> or <dr2, dr1> is passed to > data dependence analyzer. As for ZERO distance vector (exact > overlap), the order doesn't matter either because they control > dependent on the same condition. We can simply assume an arbitrary > order for it.
Ok, if it only is an issue for zero-distance then yes, I agree. >> copied from tree-data-refs.c:find_data_references_in_loop which >> does iterate over get_loop_body_in_dom_order as well. So isn't the >> issue latent there as well? > In theory maybe. In practice, this is not a problem at all since loop > distribution is the only one handles control dependence so far. You mean the only one running into the bogus BB ordering case. I don't see how handling control dependences factor in here. >> >> That said, what about those "unordered" stmts? I suppose >> dependence analysis still happily computes a dependence >> distance but in reality we'd have to consider both execution >> orders? > As explained, there is no need to consider both orders. GCC doesn't > really support control dependence parallelization? I think autopar supports an arbitrary CFG inside the loops but as it will never split them it won't change stmt ordering for zero-distance. That said, if dependence info can be incorrect if applied to a loop we should fixup tree-data-refs.c as well. It might be useful to make get_loop_body_in_rpo_order available then (and eventually all _in_dom_order users can work with rpo order as well thus we can replace it entirely as a second step). Thanks, Richard. > Thanks, > bin >> >> Thanks, >> Richard. >> >> >>> >>> Thanks, >>> bin >>> 2017-06-07 Bin Cheng <bin.ch...@arm.com> >>> >>> * tree-loop-distribution.c (bb_top_order_index): New. >>> (bb_top_order_index_size, bb_top_order_cmp): New. >>> (stmts_from_loop): Use topological order. >>> (pass_loop_distribution::execute): Compute topological order for. >>> basic blocks.