protze.joachim added inline comments.

================
Comment at: openmp/runtime/src/kmp_taskdeps.cpp:344-346
+        // link node as successor of all nodes in the prev_set if any
+        npredecessors +=
+            __kmp_depnode_link_successor(gtid, thread, task, node, prev_set);
----------------
AndreyChurbanov wrote:
> protze.joachim wrote:
> > AndreyChurbanov wrote:
> > > protze.joachim wrote:
> > > > If I understand this code right, this has O(n^2) complexity for two 
> > > > sets of size n?
> > > > 
> > > > Consider:
> > > > ```
> > > > for (int i=0; i<n; i++) {
> > > > #pragma omp task depend(in:A)
> > > > work(A,i);
> > > > }
> > > > for (int i=0; i<n; i++) {
> > > > #pragma omp task depend(inoutset:A)
> > > > work(A,i);
> > > > }
> > > > ```
> > > > All n tasks in the first loop would be predecessor for each of the 
> > > > tasks in the second loop. This will also result in n^2 
> > > > releases/notifications.
> > > > 
> > > > 
> > > > I'd suggest to model the dependencies like:
> > > > ```
> > > > for (int i=0; i<n; i++) {
> > > > #pragma omp task depend(in:A)
> > > > work(A,i);
> > > > }
> > > > #pragma omp taskwait depend(inout:A) nowait
> > > > for (int i=0; i<n; i++) {
> > > > #pragma omp task depend(inoutset:A)
> > > > work(A,i);
> > > > }
> > > > ```
> > > > I.e., add a dummy dependency node between the two sets, which has n 
> > > > predecessors and n successors. You probably understand the depnode code 
> > > > better, than I do, but I think you would only need to add some code in 
> > > > line 357, where `last_set` is moved to `prev_set`.
> > > > This dummy node would reduce linking and releasing complexity to O(n).
> > > This could be a separate research project.  Because such change may cause 
> > > performance regressions on some real codes, so it needs thorough 
> > > investigation.  I mean insertion of an auxiliary dummy node into 
> > > dependency graph is definitely an overhead, which may or may not be 
> > > overridden by reduction in number of edges in the graph.  Or it can be 
> > > made optional optimization under an external control, if there is a 
> > > significant gain in some particular case.
> > I don't suggest to insert the auxiliary node in the general case. Just for 
> > the case that you see a transition of `in` -> `inoutset` or vice versa.
> > 
> > With current task dependencies, you always have 1 node notifying n nodes 
> > (`out` -> `in`) or n nodes notifying one node (`in` -> `out`). You can see 
> > this as an amortized O(1) operation per task in the graph.
> > 
> > Here you introduce a code pattern, where n nodes each will need to notify m 
> > other nodes. This leads to an O(n) operation per task. I'm really worried 
> > about the complexity of this pattern.
> > If there is no use case with large n, I don't understand, why `inoutset` 
> > was introduced in the first place.
> > I don't suggest to insert the auxiliary node in the general case. Just for 
> > the case that you see a transition of `in` -> `inoutset` or vice versa.
> > 
> > With current task dependencies, you always have 1 node notifying n nodes 
> > (`out` -> `in`) or n nodes notifying one node (`in` -> `out`). You can see 
> > this as an amortized O(1) operation per task in the graph.
> > 
> > Here you introduce a code pattern, where n nodes each will need to notify m 
> > other nodes. This leads to an O(n) operation per task. I'm really worried 
> > about the complexity of this pattern.
> No, I don't introduce the pattern, as it already worked for sets of tasks 
> with `in` dependency following set of tasks with `mutexinoutset` dependency 
> or vice versa.
> 
> > If there is no use case with large n, I don't understand, why `inoutset` 
> > was introduced in the first place.
> I didn't like that `inoutset` was introduced as the clone of "in" in order to 
> separate two (big) sets of mutually independent tasks, but this has already 
> been added to specification, so we have to implement it. Of cause your 
> suggestion can dramatically speed up dependency processing of some trivial 
> case with two huge sets of tasks one wit `in` dependency another with 
> `inoutset`; but it slightly changes the semantics of user's code, and 
> actually can slowdown other cases.  I would let users do such optimizations.  
> Or compiler can do this once it sees big sets of tasks those don't have 
> barrier-like "taskwait nowait".  For user this is one line of code, for 
> compiler this is dozen lines of code, for runtime library this is pretty big 
> change which does not worth efforts to implement, I think.  Especially given 
> that such "optimization" will definitely slowdown some cases.  E.g. small set 
> of tasks with `in` dependency can be followed single task with `inoutset` in 
> a loop. Why should we slowdown this case?  Library has no idea of the 
> structure of user's code.
> 
> In general, I don't like the idea when runtime library tries to optimize 
> user's code.  Especially along with other changes in the same patch.
> 
> No, I don't introduce the pattern, as it already worked for sets of tasks 
> with `in` dependency following set of tasks with `mutexinoutset` dependency 
> or vice versa.

Ok, thanks for the clarification. I missed that `mutexinoutset` has the same 
issue.

> In general, I don't like the idea when runtime library tries to optimize 
> user's code.  Especially along with other changes in the same patch.

I get your point. Ok, then.



CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D97085/new/

https://reviews.llvm.org/D97085

_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

Reply via email to