[ https://issues.apache.org/jira/browse/APEXCORE-393?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15207511#comment-15207511 ]
ASF GitHub Bot commented on APEXCORE-393:
-----------------------------------------
Github user tweise commented on a diff in the pull request:
https://github.com/apache/incubator-apex-core/pull/274#discussion_r57087518
--- Diff: engine/src/main/java/com/datatorrent/stram/StreamingAppMasterService.java ---
@@ -781,18 +794,19 @@ private void execute() throws YarnException, IOException
/* Remove nodes from blacklist after timeout */
long currentTime = System.currentTimeMillis();
List<String> blacklistRemovals = new ArrayList<String>();
- for (Iterator<Pair<Long, List<String>>> it = blacklistedNodesQueueWithTimeStamp.iterator(); it.hasNext();) {
- Pair<Long, List<String>> entry = it.next();
- Long timeDiff = currentTime - entry.getFirst();
- if (timeDiff > blacklistRemovalTime) {
- blacklistRemovals.addAll(entry.getSecond());
- it.remove();
- } else {
- break;
+ for (String hostname : failedBlackListedNodes) {
+ Long timeDiff = currentTime - failedContainerNodesMap.get(hostname).blackListAdditionTime;
+ if (timeDiff >= blacklistRemovalTime) {
+ blacklistRemovals.add(hostname);
+ failedContainerNodesMap.remove(hostname);
}
}
+
if (!blacklistRemovals.isEmpty()) {
amRmClient.updateBlacklist(null, blacklistRemovals);
+ LOG.info("Removing nodes {} from blacklist: time elapsed since last blacklisting due to failure is greater than specified timeout", blacklistRemovals.toString());
+
--- End diff --
Minor nit: a few extra blank lines were added...
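For readers following the diff, the timeout-based removal it introduces can be sketched roughly as below. This is only an illustration, not the code in the pull request: the `NodeFailureStats` and `BlacklistExpiry` classes, the `failureCount` field, and the `expire` method are hypothetical names; only `blackListAdditionTime` and the `>=` comparison against the removal timeout come from the diff. The sketch removes entries through the map's iterator, which stays safe even if the collection being walked is a view of the same map.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the per-node bookkeeping in the patch. Only
// blackListAdditionTime appears in the diff; failureCount is an assumption.
class NodeFailureStats {
  int failureCount;
  long blackListAdditionTime;

  NodeFailureStats(int failureCount, long blackListAdditionTime) {
    this.failureCount = failureCount;
    this.blackListAdditionTime = blackListAdditionTime;
  }
}

class BlacklistExpiry {
  /**
   * Removes every node whose blacklist entry is at least timeoutMillis old
   * and returns the removed hostnames. Dropping the map entry also discards
   * the node's failure count, so a later failure starts counting from zero.
   */
  static List<String> expire(Map<String, NodeFailureStats> failedNodes, long now, long timeoutMillis) {
    List<String> removals = new ArrayList<>();
    Iterator<Map.Entry<String, NodeFailureStats>> it = failedNodes.entrySet().iterator();
    while (it.hasNext()) {
      Map.Entry<String, NodeFailureStats> entry = it.next();
      long timeDiff = now - entry.getValue().blackListAdditionTime;
      if (timeDiff >= timeoutMillis) {
        removals.add(entry.getKey());
        it.remove(); // remove via the iterator while iterating
      }
    }
    return removals;
  }
}
```

In the patch, the returned list would correspond to `blacklistRemovals`, which is passed to `amRmClient.updateBlacklist(null, blacklistRemovals)` when it is not empty.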
> Reset failure count when consecutive failed node is removed from blacklist
> --------------------------------------------------------------------------
>
> Key: APEXCORE-393
> URL: https://issues.apache.org/jira/browse/APEXCORE-393
> Project: Apache Apex Core
> Issue Type: Bug
> Reporter: Isha Arkatkar
> Assignee: Isha Arkatkar
>