github-actions[bot] commented on code in PR #64964:
URL: https://github.com/apache/doris/pull/64964#discussion_r3492405800
##########
fe/fe-core/src/main/java/org/apache/doris/qe/runtime/ThriftPlansBuilder.java:
##########
@@ -844,6 +833,20 @@ private static Set<Integer>
setParamsForRecursiveCteNode(List<PipelineDistribute
return fragmentToNotifyClose;
}
+ static List<TRecCTETarget> buildRecCTETargets(List<AssignedJob>
fragmentAssignedJobs,
+ RecursiveCteScanNode recursiveCteScanNode) {
+ List<TRecCTETarget> targets = new
ArrayList<>(fragmentAssignedJobs.size());
+ for (AssignedJob assignedJob : fragmentAssignedJobs) {
Review Comment:
This now includes every assigned job as a recursive CTE target, but that
over-targets serial-source/local-shuffle fragments. When
`ignore_storage_data_distribution` makes the recursive side use
`LocalShuffleAssignedJob`s, `instanceToThrift()` sets `parallel_instances=1`;
on BE the recursive scan is a serial operator, so
`_build_pipeline_tasks_for_instance()` only builds the one-task scan pipeline
for `instance_idx == 0`. `RecCTEScanLocalState` therefore registers only that
fragment instance id, and `send_block_to_cte_scan()` will return `RecCTEScan
not found` for the extra local-shuffle ids emitted here. The old address-dedup
behavior accidentally selected the one receiver per BE that exists in this
mode. Please keep the per-instance targets for normal parallel scans, but
filter/collapse local-shuffle groups to the instance that actually owns the
recursive scan task, and cover the `ignore_storage_data_distribution=true`
local-shuffle recursive-side path.
##########
regression-test/suites/recursive_cte/rec_cte_parallel_targets.groovy:
##########
@@ -0,0 +1,69 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements. See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership. The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License. You may obtain a copy of the License at
+//
+// http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied. See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+suite("rec_cte_parallel_targets", "rec_cte") {
+ sql """set enable_local_shuffle=true"""
+ sql """set enable_local_shuffle_planner=true"""
+ sql """set parallel_pipeline_task_num=4"""
+
+ sql """drop table if exists rec_cte_parallel_targets"""
+ sql """
+ create table rec_cte_parallel_targets (
+ id int,
+ parent_id int,
+ dept_name varchar(50),
+ budget decimal(18, 2)
+ )
+ duplicate key(id)
+ distributed by hash(id) buckets 4
+ properties ("replication_num" = "1")
+ """
+
+ sql """
+ insert into rec_cte_parallel_targets values
+ (1, null, 'headquarters', 10000.00),
+ (10, 1, 'r_and_d', 5000.00),
+ (11, 1, 'marketing', 4000.00),
+ (101, 10, 'backend', 2000.00),
+ (102, 10, 'frontend', 1500.00),
+ (111, 11, 'promotion', 2000.00)
+ """
+
+ order_qt_rec_cte_parallel_targets """
Review Comment:
This `order_qt_` block needs a checked-in expected-output file. In normal
regression mode, `order_qt_rec_cte_parallel_targets` becomes the tag
`rec_cte_parallel_targets`, and `SuiteContext` derives the expected file as
`regression-test/data/recursive_cte/rec_cte_parallel_targets.out`. Since this
PR only adds the `.groovy` suite and not that `.out` file, `Suite.groovy` will
throw `Missing outputFile`/`Missing output block` before the query can validate
the recursive CTE fix. Please add the generated `.out` file for this suite, or
use a test helper that does not require a qt output block.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]