protze.joachim added inline comments.
================
Comment at: openmp/runtime/src/kmp_taskdeps.cpp:344-346
+    // link node as successor of all nodes in the prev_set if any
+    npredecessors +=
+        __kmp_depnode_link_successor(gtid, thread, task, node, prev_set);
----------------
AndreyChurbanov wrote:
> protze.joachim wrote:
> > AndreyChurbanov wrote:
> > > protze.joachim wrote:
> > > > If I understand this code right, this has O(n^2) complexity for two sets of size n?
> > > > 
> > > > Consider:
> > > > ```
> > > > for (int i=0; i<n; i++) {
> > > >   #pragma omp task depend(in:A)
> > > >   work(A,i);
> > > > }
> > > > for (int i=0; i<n; i++) {
> > > >   #pragma omp task depend(inoutset:A)
> > > >   work(A,i);
> > > > }
> > > > ```
> > > > All n tasks in the first loop would be predecessors of each of the tasks in the second loop. This will also result in n^2 releases/notifications.
> > > > 
> > > > I'd suggest modeling the dependencies like:
> > > > ```
> > > > for (int i=0; i<n; i++) {
> > > >   #pragma omp task depend(in:A)
> > > >   work(A,i);
> > > > }
> > > > #pragma omp taskwait depend(inout:A) nowait
> > > > for (int i=0; i<n; i++) {
> > > >   #pragma omp task depend(inoutset:A)
> > > >   work(A,i);
> > > > }
> > > > ```
> > > > I.e., add a dummy dependency node between the two sets, which has n predecessors and n successors. You probably understand the depnode code better than I do, but I think you would only need to add some code at line 357, where `last_set` is moved to `prev_set`.
> > > > This dummy node would reduce linking and releasing complexity to O(n).
> > > This could be a separate research project, because such a change may cause performance regressions on some real codes, so it needs thorough investigation. I mean, insertion of an auxiliary dummy node into the dependency graph is definitely an overhead, which may or may not be outweighed by the reduction in the number of edges in the graph. Or it can be made an optional optimization under external control, if there is a significant gain in some particular case.
> > I don't suggest inserting the auxiliary node in the general case, just for the case where you see a transition of `in` -> `inoutset` or vice versa.
> > 
> > With current task dependencies, you always have 1 node notifying n nodes (`out` -> `in`) or n nodes notifying one node (`in` -> `out`). You can see this as an amortized O(1) operation per task in the graph.
> > 
> > Here you introduce a code pattern where each of n nodes will need to notify m other nodes. This leads to an O(n) operation per task. I'm really worried about the complexity of this pattern.
> > If there is no use case with large n, I don't understand why `inoutset` was introduced in the first place.
> > I don't suggest inserting the auxiliary node in the general case, just for the case where you see a transition of `in` -> `inoutset` or vice versa.
> > 
> > With current task dependencies, you always have 1 node notifying n nodes (`out` -> `in`) or n nodes notifying one node (`in` -> `out`). You can see this as an amortized O(1) operation per task in the graph.
> > 
> > Here you introduce a code pattern where each of n nodes will need to notify m other nodes. This leads to an O(n) operation per task. I'm really worried about the complexity of this pattern.
> No, I don't introduce the pattern, as it already worked for sets of tasks with `in` dependency following a set of tasks with `mutexinoutset` dependency, or vice versa.
> 
> > If there is no use case with large n, I don't understand why `inoutset` was introduced in the first place.
> I didn't like that `inoutset` was introduced as a clone of `in` in order to separate two (big) sets of mutually independent tasks, but this has already been added to the specification, so we have to implement it. Of course your suggestion can dramatically speed up dependency processing of some trivial case with two huge sets of tasks, one with `in` dependency and another with `inoutset`; but it slightly changes the semantics of the user's code, and can actually slow down other cases. I would let users do such optimizations. Or the compiler can do this once it sees big sets of tasks that don't have a barrier-like "taskwait nowait". For the user this is one line of code; for the compiler it is a dozen lines of code; for the runtime library it is a pretty big change which is not worth the effort to implement, I think. Especially given that such an "optimization" will definitely slow down some cases. E.g. a small set of tasks with `in` dependency can be followed by a single task with `inoutset` in a loop. Why should we slow down this case? The library has no idea of the structure of the user's code.
> 
> In general, I don't like the idea of the runtime library trying to optimize the user's code. Especially along with other changes in the same patch.

> No, I don't introduce the pattern, as it already worked for sets of tasks with `in` dependency following a set of tasks with `mutexinoutset` dependency, or vice versa.
Ok, thanks for the clarification. I missed that `mutexinoutset` has the same issue.

> In general, I don't like the idea of the runtime library trying to optimize the user's code. Especially along with other changes in the same patch.
I get your point. Ok, then.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D97085/new/

https://reviews.llvm.org/D97085

_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits