[ 
https://issues.apache.org/jira/browse/TEZ-2192?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14357908#comment-14357908
 ] 

Jeff Zhang commented on TEZ-2192:
---------------------------------

Is it possible to handle this on the AM side by not allowing this kind of container reuse?
I notice that ContainerSignatureMatcher compares the request only against the first 
container signature. I think if we update the container signature each time the container 
is reused, we can prevent container reuse that has a local resource (lr) conflict. 

{code}
// The request is compared against the held container's *first* signature.
if (containerSignatureMatcher.isSuperSet(
    heldContainer.getFirstContainerSignature(),
    cookieContainerRequest.getCookie().getContainerSignature())) {
  if (LOG.isDebugEnabled()) {
    LOG.debug("Matched delayed container to task"
        + " containerId=" + heldContainer.container.getId());
  }
  return true;
}
{code}
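
To make the suggestion concrete, here is a rough standalone sketch of what "update the container signature as the container is reused" could look like. It is not Tez code: ResourceId, HeldContainerSignature, canAccept and recordAssignment are hypothetical names made up for illustration, and the paths and timestamps are invented. The assumption is that a resource is identified by its link name plus source and timestamp, and that a request is rejected when it maps an already-localized name to a different source or timestamp.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Hypothetical sketch, not the Tez API: all class and method names are invented.
final class ContainerReuseSketch {

  /** Identity of a localized file: where it came from and its modification time. */
  static final class ResourceId {
    final String source;
    final long timestamp;

    ResourceId(String source, long timestamp) {
      this.source = source;
      this.timestamp = timestamp;
    }

    @Override
    public boolean equals(Object o) {
      if (!(o instanceof ResourceId)) {
        return false;
      }
      ResourceId other = (ResourceId) o;
      return timestamp == other.timestamp && source.equals(other.source);
    }

    @Override
    public int hashCode() {
      return Objects.hash(source, timestamp);
    }
  }

  /** Running signature of a held container, updated on every reuse. */
  static final class HeldContainerSignature {
    // Link name in the container's working directory -> what was localized there.
    private final Map<String, ResourceId> localized = new HashMap<>();

    /** False if the request maps an existing link name to a different source/timestamp. */
    boolean canAccept(Map<String, ResourceId> requested) {
      for (Map.Entry<String, ResourceId> e : requested.entrySet()) {
        ResourceId existing = localized.get(e.getKey());
        if (existing != null && !existing.equals(e.getValue())) {
          return false;
        }
      }
      return true;
    }

    /** Merge the resources of the task just assigned into the signature. */
    void recordAssignment(Map<String, ResourceId> requested) {
      localized.putAll(requested);
    }
  }

  public static void main(String[] args) {
    HeldContainerSignature held = new HeldContainerSignature();

    // Two vertices that both ship a file named job.split from different sources.
    Map<String, ResourceId> vertex1 = new HashMap<>();
    vertex1.put("job.split", new ResourceId("hdfs:///tmp/v1/job.split", 100L));
    Map<String, ResourceId> vertex2 = new HashMap<>();
    vertex2.put("job.split", new ResourceId("hdfs:///tmp/v2/job.split", 200L));

    System.out.println("vertex1 accepted: " + held.canAccept(vertex1));
    held.recordAssignment(vertex1);
    System.out.println("vertex2 accepted: " + held.canAccept(vertex2));
  }
}
```

With something along these lines, the matcher would consult the accumulated signature instead of only the first one, so a second vertex with its own job.split would not be scheduled onto a container that already holds a different job.split.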

> Relocalization does not check for source
> ----------------------------------------
>
>                 Key: TEZ-2192
>                 URL: https://issues.apache.org/jira/browse/TEZ-2192
>             Project: Apache Tez
>          Issue Type: Bug
>    Affects Versions: 0.6.0, 0.5.2
>            Reporter: Rohini Palaniswamy
>            Priority: Blocker
>
>  PIG-4443 spills the input splits to disk if the serialized split size is greater 
> than some threshold. It faces issues with relocalization when more than one 
> vertex has a job.split file. If a job.split file is already present on container 
> reuse, it is reused, causing wrong data to be read.
> We need either a way to turn off relocalization or a check of the source+timestamp 
> that redownloads the file during relocalization. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
