Copilot commented on code in PR #12248:
URL: https://github.com/apache/gluten/pull/12248#discussion_r3360523030
##########
cpp/velox/compute/WholeStageResultIterator.cc:
##########
@@ -72,6 +72,19 @@ const std::string kHiveDefaultPartition =
"__HIVE_DEFAULT_PARTITION__";
} // namespace
+namespace {
+std::string getVeloxTaskId(const SparkTaskInfo& taskInfo) {
+ if (taskInfo.executionId != -1) {
+ return fmt::format("Gluten_Execution_{}",
std::to_string(taskInfo.executionId));
+ }
+ return fmt::format(
+ "Gluten_Stage_{}_TID_{}_VTID_{}",
+ std::to_string(taskInfo.stageId),
+ std::to_string(taskInfo.taskId),
+ std::to_string(taskInfo.vId));
+}
+} // namespace
Review Comment:
`getVeloxTaskId` returns only `executionId` when available, which can make
the Velox *task id* non-unique across multiple concurrent Spark tasks in the
same SQL execution. Velox task ids are expected to be unique (e.g.
CudfPlanValidator uses a VTID suffix), so this risks task collisions /
incorrect task reuse / failures.
Suggestion: keep the Velox *task id* always unique (stage/task/vId), and
introduce a separate query id (based on `executionId`) for `QueryCtx` only.
##########
cpp/velox/compute/WholeStageResultIterator.cc:
##########
@@ -233,11 +242,7 @@ std::shared_ptr<velox::core::QueryCtx>
WholeStageResultIterator::createNewVeloxQ
gluten::VeloxBackend::get()->getAsyncDataCache(),
memoryManager_->getAggregateMemoryPool(),
spillExecutor_,
- fmt::format(
- "Gluten_Stage_{}_TID_{}_VTID_{}",
- std::to_string(taskInfo_.stageId),
- std::to_string(taskInfo_.taskId),
- std::to_string(taskInfo_.vId)));
+ getVeloxTaskId(taskInfo_));
return ctx;
Review Comment:
After introducing a distinct query id based on `executionId`,
`QueryCtx::create` should use that query id (and not the unique task id).
Otherwise the query id will no longer reflect `spark.sql.execution.id`.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]