>From Ali Alsuliman <[email protected]>: Attention is currently required from: Ali Alsuliman, Ian Maxon, Michael Blow.
Ali Alsuliman has posted comments on this change by Ian Maxon. ( https://asterix-gerrit.ics.uci.edu/c/asterixdb/+/20270?usp=email ) Change subject: [ASTERIXDB-3631][RT] Profile nested groupby clauses ...................................................................... Patch Set 13: (14 comments) File hyracks-fullstack/algebricks/algebricks-core/src/main/java/org/apache/hyracks/algebricks/core/jobgen/impl/JobBuilder.java: https://asterix-gerrit.ics.uci.edu/c/asterixdb/+/20270/comment/182642f1_d66e2abc?usp=email : PS13, Line 193: List<Integer> paths = new ArrayList<>(inputs.size()); I am having a hard time following the logic for `paths` and `Collections.max(paths)` and `prevOpIndex + 1`. Is this so that we match how the push runtime is constructing the ids? What if in `getExtendedOdidForNestedOp()`, we just pass the "baseId" to `getExtendedOdidForOperator()` as: ``` int j = 0; for (Mutable<ILogicalOperator> root : plan.getRoots()) { getExtendedOdidForOperator(baseId + "." + i, root.getValue(), log2phys, j); j++; } ``` And then in `getExtendedOdidForOperator()` just continue building: ``` for (int i = 0; i < inputs.size(); i++) { if (nextOp instanceof AbstractOperatorWithNestedPlans) { getExtendedOdidForNestedOp((AbstractOperatorWithNestedPlans) nextOp, log2phys, baseId + "." + input + "." + i); } } ``` `getExtendedOdidForNestedOp()` won't need to return int. https://asterix-gerrit.ics.uci.edu/c/asterixdb/+/20270/comment/27120f3e_b838e81c?usp=email : PS13, Line 199: paths.isEmpty() Can "paths.isEmpty()" ever be empty? seems like it should always have something. File hyracks-fullstack/algebricks/algebricks-runtime/src/main/java/org/apache/hyracks/algebricks/runtime/operators/meta/AlgebricksMetaOperatorDescriptor.java: https://asterix-gerrit.ics.uci.edu/c/asterixdb/+/20270/comment/7c172a12_e57be5b3?usp=email : PS13, Line 105: subIds.length() > 2 Is this reliable to determine it's Subplan? I didn't follow the logic, but it feels like we are assuming that there is only single digits between the "." Is that the case? https://asterix-gerrit.ics.uci.edu/c/asterixdb/+/20270/comment/eacd5f54_4f46d383?usp=email : PS13, Line 140: public Does `traverseFactories()` need to be `public` or is it just used here and can be made `private`? https://asterix-gerrit.ics.uci.edu/c/asterixdb/+/20270/comment/4e6eacfe_c8e46487?usp=email : PS13, Line 153: else Previously, we were always adding: `microOpStats.put(fact, makeStatForRuntimeFact`. Now, it's in the `else` block. What changed? just trying to understand. File hyracks-fullstack/algebricks/algebricks-runtime/src/main/java/org/apache/hyracks/algebricks/runtime/operators/meta/PipelineAssembler.java: https://asterix-gerrit.ics.uci.edu/c/asterixdb/+/20270/comment/243d43d4_ec7c3163?usp=email : PS13, Line 172: profiling(ctx) You already have `boolean profile` above. You can replace `ctx.getJobFlags().contains(JobFlag.PROFILE_RUNTIME)` with `profiling(ctx)` and re-use the boolean variable. File hyracks-fullstack/algebricks/algebricks-runtime/src/main/java/org/apache/hyracks/algebricks/runtime/operators/meta/SubplanRuntimeFactory.java: https://asterix-gerrit.ics.uci.edu/c/asterixdb/+/20270/comment/b3ba88c6_0ad8ec04?usp=email : PS13, Line 138: private final boolean profile; Seems like "profile" can be just a local variable instead of instance variable File hyracks-fullstack/algebricks/algebricks-runtime/src/main/java/org/apache/hyracks/algebricks/runtime/operators/std/NestedTupleSourceRuntimeFactory.java: https://asterix-gerrit.ics.uci.edu/c/asterixdb/+/20270/comment/aa4052d8_f120c424?usp=email : PS13, Line 77: INestedTupleSourceRuntime nestedWrapped; Make it `private final` if we can. File hyracks-fullstack/algebricks/algebricks-runtime/src/main/java/org/apache/hyracks/algebricks/runtime/operators/win/WindowAggregatorDescriptorFactory.java: https://asterix-gerrit.ics.uci.edu/c/asterixdb/+/20270/comment/090edb2c_597d2de2?usp=email : PS13, Line 78: .assemblePipeline(subplan, outputWriter, ctx, pipelineRuntimeMap, Collections.emptyMap()).getLeft() Seems like you are using `Collections.emptyMap()` to indicate non-profiling. We should probably have a separate method: `PipelineAssembler.assembleNonProfiledPipeline()`. We can do that later. File hyracks-fullstack/hyracks/hyracks-api/src/main/java/org/apache/hyracks/api/dataflow/ProfiledFrameWriter.java: https://asterix-gerrit.ics.uci.edu/c/asterixdb/+/20270/comment/54f95f06_218d6eb5?usp=email : PS13, Line 48: protected I haven't checked. Is this change to `protected` intended or unintended? File hyracks-fullstack/hyracks/hyracks-control/hyracks-control-common/src/main/java/org/apache/hyracks/control/common/job/profiling/StatsCollector.java: https://asterix-gerrit.ics.uci.edu/c/asterixdb/+/20270/comment/f6320113_dbf4182f?usp=email : PS13, Line 85: output.writeInt(operatorStatsMap.size()); : for (IOperatorStats stat : operatorStatsMap.values()) { : stat.writeFields(output); : } Remind me to come back to this and whether `operatorStatsMap` needs to be synchronized or not. File hyracks-fullstack/hyracks/hyracks-dataflow-std/src/main/java/org/apache/hyracks/dataflow/std/group/preclustered/PreclusteredGroupOperatorNodePushable.java: https://asterix-gerrit.ics.uci.edu/c/asterixdb/+/20270/comment/0455f2ed_b070f0d9?usp=email : PS13, Line 36: PreclusteredGroupOperatorNodePushable Just checking, when profiling, does this nodepushable become "ProfiledOperatorNodePushable"? how is its time captured and conveyed back given that it's also doing profiling for the aggregator. Probably the same question applies to `AlgebricksMetaOperatorDescriptor` File hyracks-fullstack/hyracks/hyracks-dataflow-std/src/main/java/org/apache/hyracks/dataflow/std/group/preclustered/PreclusteredGroupWriter.java: https://asterix-gerrit.ics.uci.edu/c/asterixdb/+/20270/comment/f9f4c75a_d53d0904?usp=email : PS13, Line 66: private Runnable timingHandle = Function::identity; We can just simplify and remove this instance variable. We can just do ((IProfiledAggregatorDescriptor) aggregator).computeTimings() File hyracks-fullstack/hyracks/hyracks-dataflow-std/src/main/java/org/apache/hyracks/dataflow/std/util/ProfilingUtils.java: https://asterix-gerrit.ics.uci.edu/c/asterixdb/+/20270/comment/dafeab66_df0bd141?usp=email : PS13, Line 27: ctx != null Can `ctx != null` ever be null? I don't think so. We can remove this check. -- To view, visit https://asterix-gerrit.ics.uci.edu/c/asterixdb/+/20270?usp=email To unsubscribe, or for help writing mail filters, visit https://asterix-gerrit.ics.uci.edu/settings?usp=email Gerrit-MessageType: comment Gerrit-Project: asterixdb Gerrit-Branch: master Gerrit-Change-Id: I516dc90b8da3a7086dc80b67946ac14f6ade0973 Gerrit-Change-Number: 20270 Gerrit-PatchSet: 13 Gerrit-Owner: Ian Maxon <[email protected]> Gerrit-Reviewer: Ali Alsuliman <[email protected]> Gerrit-Reviewer: Ali Alsuliman <[email protected]> Gerrit-Reviewer: Anon. E. Moose #1000171 Gerrit-Reviewer: Ian Maxon <[email protected]> Gerrit-Reviewer: Jenkins <[email protected]> Gerrit-CC: Michael Blow <[email protected]> Gerrit-Attention: Ian Maxon <[email protected]> Gerrit-Attention: Ali Alsuliman <[email protected]> Gerrit-Attention: Michael Blow <[email protected]> Gerrit-Comment-Date: Wed, 28 Jan 2026 18:23:57 +0000 Gerrit-HasComments: Yes Gerrit-Has-Labels: No
