[ https://issues.apache.org/jira/browse/TEZ-2192?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14357908#comment-14357908 ]
Jeff Zhang commented on TEZ-2192:
---------------------------------

Is it possible to handle this on the AM side, by not allowing this kind of container reuse? I noticed that the ContainerSignatureMatcher check compares the request against the first container signature. I think that if we update the container signature each time the container is reused, we can prevent this kind of container reuse when there is a local resource (lr) conflict.

{code}
if (containerSignatureMatcher.isSuperSet(heldContainer
    .getFirstContainerSignature(), cookieContainerRequest.getCookie()
    .getContainerSignature())) {
  if (LOG.isDebugEnabled()) {
    LOG.debug("Matched delayed container to task"
        + " containerId=" + heldContainer.container.getId());
  }
  return true;
}
{code}

> Relocalization does not check for source
> ----------------------------------------
>
>                 Key: TEZ-2192
>                 URL: https://issues.apache.org/jira/browse/TEZ-2192
>             Project: Apache Tez
>          Issue Type: Bug
>    Affects Versions: 0.6.0, 0.5.2
>            Reporter: Rohini Palaniswamy
>            Priority: Blocker
>
> PIG-4443 spills the input splits to disk if the serialized split size is greater
> than some threshold. It faces issues with relocalization when more than one
> vertex has a job.split file. If a job.split file is already there on container
> reuse, it is reused, causing wrong data to be read.
> We either need a way to turn off relocalization, or relocalization needs to check
> the source+timestamp and redownload the file.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
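
To illustrate the idea in the comment above, here is a minimal sketch of tracking what a held container has already localized and refusing reuse on a conflict. It is not Tez code: ContainerReuseSketch, ResourceId, HeldContainerState, canReuse and recordAssignment are hypothetical names, and the HDFS paths and timestamps are made up. It assumes a conflict means the same resource name with a different source or timestamp, as in the job.split case from the issue description.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch; none of these types are Tez classes.
final class ContainerReuseSketch {

    /** Identity of a localized file: where it came from and its modification time. */
    static final class ResourceId {
        final String sourceUrl;
        final long timestamp;

        ResourceId(String sourceUrl, long timestamp) {
            this.sourceUrl = sourceUrl;
            this.timestamp = timestamp;
        }

        boolean sameAs(ResourceId other) {
            return sourceUrl.equals(other.sourceUrl) && timestamp == other.timestamp;
        }
    }

    /** Tracks every resource localized into a held container so far, keyed by name. */
    static final class HeldContainerState {
        private final Map<String, ResourceId> localized = new HashMap<>();

        /** False if a requested name is already present with a different source or timestamp. */
        boolean canReuse(Map<String, ResourceId> requested) {
            for (Map.Entry<String, ResourceId> e : requested.entrySet()) {
                ResourceId existing = localized.get(e.getKey());
                if (existing != null && !existing.sameAs(e.getValue())) {
                    return false;
                }
            }
            return true;
        }

        /** Records a task's resources after assignment, so later matches see them. */
        void recordAssignment(Map<String, ResourceId> requested) {
            localized.putAll(requested);
        }
    }

    public static void main(String[] args) {
        HeldContainerState container = new HeldContainerState();

        Map<String, ResourceId> vertex1 = new HashMap<>();
        vertex1.put("job.split", new ResourceId("hdfs:///staging/vertex1/job.split", 1000L));
        Map<String, ResourceId> vertex2 = new HashMap<>();
        vertex2.put("job.split", new ResourceId("hdfs:///staging/vertex2/job.split", 2000L));

        System.out.println(container.canReuse(vertex1)); // true: nothing localized yet
        container.recordAssignment(vertex1);
        System.out.println(container.canReuse(vertex2)); // false: same name, different source
    }
}
```

The point of the sketch is that the check runs against everything the container has accumulated, not only what it started with, so a second vertex with its own job.split would not be matched to a container that already holds a different one.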