tibrewalpratik17 commented on code in PR #12960:
URL: https://github.com/apache/pinot/pull/12960#discussion_r1605423772
##########
pinot-controller/src/main/java/org/apache/pinot/controller/helix/core/minion/generator/BaseTaskGenerator.java:
##########
@@ -131,4 +140,44 @@ public void generateTasks(List<TableConfig> tableConfigs, List<PinotTaskConfig>
   public String getMinionInstanceTag(TableConfig tableConfig) {
     return TaskGeneratorUtils.extractMinionInstanceTag(tableConfig, getTaskType());
   }
+
+  @Override
+  public boolean isAllowDownloadFromServer(TableConfig tableConfig) {
+    return TaskGeneratorUtils.extractMinionAllowDownloadFromServer(tableConfig, getTaskType());
+  }
+
+  public List<URI> getSegmentServerURIs(TableConfig tableConfig, String segmentName) {
+    String peerDownloadScheme = tableConfig.getValidationConfig().getPeerSegmentDownloadScheme();
+    List<URI> segmentServerURIs = PeerServerSegmentFinder.getPeerServerURIs(
+        _clusterInfoAccessor.getPinotHelixResourceManager().getHelixZkManager(),
+        tableConfig.getTableName(), segmentName, peerDownloadScheme);
+    Collections.shuffle(segmentServerURIs);
+    return segmentServerURIs;
+  }
+
+  public Map<String, String> getBaseTaskConfigs(TableConfig tableConfig, List<String> segmentNames) {
+    Map<String, String> baseConfigs = new HashMap<>();
+    baseConfigs.put(MinionConstants.TABLE_NAME_KEY, tableConfig.getTableName());
+    baseConfigs.put(MinionConstants.SEGMENT_NAME_KEY, StringUtils.join(segmentNames,
+        MinionConstants.SEGMENT_NAME_SEPARATOR));
+    Map<String, List<String>> segmentServerUriMap = new HashMap<>();
+    if (isAllowDownloadFromServer(tableConfig)) {
+      segmentServerUriMap = segmentNames.stream()
+          .collect(Collectors.toMap(
+              segmentName -> segmentName,
+              segmentName -> getSegmentServerURIs(tableConfig, segmentName)
Review Comment:
This is a good catch!
Looking into this: if we overload `getPeerServerURIs`, we will also have to
update `getOnlineServersFromExternalView` to take a list of segments and
return the result per segment. But it seems we would lose segment-level
observability (most of the logs there help diagnose issues at the segment
level).
Since this would mainly be an optimisation for tasks that process multiple
segments in one task (e.g. MergeRollupTask), do you think we should extend
the scope of this PR to update `PeerServerSegmentFinder` as well?
Adding to this, the retry policy in `getPeerServerURIs` is also applied per
segment. If the method took a list of segments, the logic would become
convoluted: if the fetch fails for one segment, would we retry for all
segments again?
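To make the retry concern concrete, here is a rough sketch (not Pinot code) of how a batched lookup could retry only the segments that are still unresolved while keeping a log line per segment. It assumes a hypothetical `batchLookup` function standing in for a single external-view read that covers several segments. `lookupWithPerSegmentRetry`, `batchLookup`, and the example URIs are illustrative names, not existing APIs, and backoff between attempts is left out.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

public class BatchedPeerLookupSketch {

  /**
   * Hypothetical batched lookup. Each attempt calls batchLookup once with the
   * segments that are still unresolved, so a failure for one segment does not
   * force a retry of segments that already resolved.
   */
  public static Map<String, List<String>> lookupWithPerSegmentRetry(List<String> segmentNames,
      Function<List<String>, Map<String, List<String>>> batchLookup, int maxAttempts) {
    Map<String, List<String>> resolved = new HashMap<>();
    List<String> pending = new ArrayList<>(segmentNames);
    for (int attempt = 1; attempt <= maxAttempts && !pending.isEmpty(); attempt++) {
      // One (assumed) external-view read for all pending segments.
      Map<String, List<String>> found = batchLookup.apply(pending);
      List<String> stillPending = new ArrayList<>();
      for (String segmentName : pending) {
        List<String> uris = found.get(segmentName);
        if (uris == null || uris.isEmpty()) {
          // Per-segment log line, so segment-level observability is kept.
          System.out.println("attempt " + attempt + ": no servers found for segment " + segmentName);
          stillPending.add(segmentName);
        } else {
          resolved.put(segmentName, uris);
        }
      }
      pending = stillPending;
    }
    return resolved;
  }

  public static void main(String[] args) {
    int[] calls = {0};
    // Fake lookup: seg_2 is not visible on the first call, visible afterwards.
    Function<List<String>, Map<String, List<String>>> fakeLookup = pending -> {
      calls[0]++;
      Map<String, List<String>> out = new HashMap<>();
      for (String segmentName : pending) {
        if (segmentName.equals("seg_2") && calls[0] == 1) {
          continue;
        }
        // Made-up URI, for illustration only.
        out.put(segmentName, Collections.singletonList("http://server-1:8097/" + segmentName));
      }
      return out;
    };
    Map<String, List<String>> result =
        lookupWithPerSegmentRetry(Arrays.asList("seg_1", "seg_2"), fakeLookup, 3);
    System.out.println("resolved segments: " + result.keySet());
  }
}
```

In this shape the second attempt only asks for `seg_2`, so the retry stays at segment granularity even though the lookup itself is batched. Whether that is worth the extra complexity in `PeerServerSegmentFinder` is still the open question.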
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]