[ https://issues.apache.org/jira/browse/TEZ-4541?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
László Bodor reassigned TEZ-4541:
---------------------------------

    Assignee: László Bodor

> Remove or limit evergrowing DAG collections from DAGAppMaster
> -------------------------------------------------------------
>
>                 Key: TEZ-4541
>                 URL: https://issues.apache.org/jira/browse/TEZ-4541
>             Project: Apache Tez
>          Issue Type: Improvement
>            Reporter: László Bodor
>            Assignee: László Bodor
>            Priority: Major
>
> TEZ-1495 introduced a DAG id collection
> ([here|https://github.com/apache/tez/commit/4cf6472e39018d8809a945f4ccb39155d8c03220#diff-54ba4a2af15261379079ed5a9c1f9eea52da7bdd3f1109fe7b96d4abf07f6173R248])
> that tracks all DAG ids, apparently only so that DAGClientHandler can give a
> different response for a known, finished DAG than for an unknown one:
> https://github.com/apache/tez/blob/f8c2e11d0b469748ea95381e7021266e25e5ac89/tez-dag/src/main/java/org/apache/tez/dag/api/client/DAGClientHandler.java#L101-L111
> {code}
> if (!currentDAGIdStr.equals(dagIdStr)) {
>   if (getAllDagIDs().contains(dagIdStr)) {
>     LOG.debug("Looking for finished dagId {} current dag is {}",
>         dagIdStr, currentDAGIdStr);
>     throw new DAGNotRunningException("DAG " + dagIdStr + " Not running, current dag is " +
>         currentDAGIdStr);
>   } else {
>     LOG.warn("Current DAGID : " + currentDAGIdStr + ", Looking for string (not found): " +
>         dagIdStr + ", dagIdObj: " + dagId);
>     throw new TezException("Unknown dagId: " + dagIdStr);
>   }
> }
> {code}
> DAGNotRunningException is used by DAGClientImpl to handle edge cases, which is
> fine. So instead of removing this collection, we might want to limit its size,
> e.g. to 500 entries, so that DAGAppMaster keeps responding as expected for a
> certain amount of time (hence not breaking the current contract).
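
The size limit proposed in the description could be sketched as an insertion-ordered set that evicts its oldest entry once a cap is exceeded. This is only an illustration under stated assumptions, not the change made in Tez: the class name BoundedDagIdSet and the maxSize parameter are hypothetical, and the sketch assumes DAG ids are handled as strings, as in the getAllDagIDs().contains(dagIdStr) check quoted in the issue.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

/**
 * Hypothetical helper (not part of Tez): remembers at most maxSize DAG ids
 * and forgets the oldest one when a new id would exceed that cap.
 */
final class BoundedDagIdSet {
  private final Set<String> ids;

  BoundedDagIdSet(final int maxSize) {
    // Insertion-ordered map; LinkedHashMap calls removeEldestEntry after each
    // insert, so returning true here drops the oldest id once the cap is passed.
    Map<String, Boolean> backing = new LinkedHashMap<String, Boolean>() {
      @Override
      protected boolean removeEldestEntry(Map.Entry<String, Boolean> eldest) {
        return size() > maxSize;
      }
    };
    // Synchronized wrapper, assuming ids may be added and queried from
    // different threads.
    this.ids = Collections.synchronizedSet(Collections.newSetFromMap(backing));
  }

  void add(String dagId) {
    ids.add(dagId);
  }

  boolean contains(String dagId) {
    return ids.contains(dagId);
  }

  int size() {
    return ids.size();
  }
}
```

With a cap of 500, a lookup for one of the 500 most recently recorded DAG ids would still reach the DAGNotRunningException branch, while older ids would fall through to the "Unknown dagId" TezException. Whether that trade-off is acceptable for DAGClientImpl is the open question raised in the issue.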