[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6480?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Kanter updated MAPREDUCE-6480:
-------------------------------------
    Description: 
MAPREDUCE-6415 added a tool to archive aggregated logs into HAR files.  It 
seeds the initial list of applications to process based on apps which have 
finished aggregating, according to the RM.  However, the RM doesn't remember 
completed applications forever (e.g. after a failover), so it's possible for 
the tool to miss applications if they're no longer in the RM.  

Instead, we should do the following:
# Seed the initial list of apps based on the aggregated log directories
# Make the RM not consider applications "complete" until their log aggregation 
has reached a terminal state (i.e. DISABLED, SUCCEEDED, FAILED, TIME_OUT).  

#2 will allow #1 to assume that any apps not found in the RM are done 
aggregating.  #1 on its own should cover most cases, though.

  was:
MAPREDUCE-6415 added a tool to archive aggregated logs into HAR files.  It 
seeds the initial list of applications to process based on apps which have 
finished aggregated, according to the RM.  However, the RM doesn't remember 
completed applications forever (e.g. failover), so it's possible for the tool 
to miss applications if they're no longer in the RM.  

Instead, we should do the following:
# Seed the initial list of apps based on the aggregated log directories
# Make the RM not consider applications "complete" until their log aggregation 
has reached a terminal state (i.e. DISABLED, SUCCEEDED, FAILED, TIME_OUT).  

#2 will allow #1 to assume that any apps not found in the RM are done 
aggregating.  #2 on it's own should cover most cases though


> archive-logs tool may miss applications
> ---------------------------------------
>
>                 Key: MAPREDUCE-6480
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6480
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>    Affects Versions: 2.8.0
>            Reporter: Robert Kanter
>            Assignee: Robert Kanter
>
> MAPREDUCE-6415 added a tool to archive aggregated logs into HAR files.  It 
> seeds the initial list of applications to process based on apps which have 
> finished aggregating, according to the RM.  However, the RM doesn't remember 
> completed applications forever (e.g. after a failover), so it's possible for 
> the tool to miss applications if they're no longer in the RM.  
> Instead, we should do the following:
> # Seed the initial list of apps based on the aggregated log directories
> # Make the RM not consider applications "complete" until their log 
> aggregation has reached a terminal state (i.e. DISABLED, SUCCEEDED, FAILED, 
> TIME_OUT).  
> #2 will allow #1 to assume that any apps not found in the RM are done 
> aggregating.  #1 on its own should cover most cases, though.
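
The two proposed steps can be sketched roughly as follows. This is only an illustration of the selection logic, not the actual archive-logs code: the class, the enum, and the eligibleApps method are hypothetical names, the list of application IDs stands in for whatever is found under the aggregated log directories, and the map stands in for whatever the RM reports. Only the four terminal states are taken from the description above; RUNNING is added as an example of a non-terminal state.

```java
import java.util.ArrayList;
import java.util.EnumSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class ArchiveLogsSeedSketch {

    // Hypothetical stand-in for the RM's log aggregation status. The four
    // terminal values come from the issue text; RUNNING is an assumed
    // non-terminal value used for contrast.
    enum AggregationStatus { DISABLED, RUNNING, SUCCEEDED, FAILED, TIME_OUT }

    // Terminal states as listed in the issue description.
    static final Set<AggregationStatus> TERMINAL = EnumSet.of(
            AggregationStatus.DISABLED, AggregationStatus.SUCCEEDED,
            AggregationStatus.FAILED, AggregationStatus.TIME_OUT);

    /**
     * Step 1: start from the app IDs found in the aggregated log directories.
     * Step 2 lets us assume that an app the RM no longer knows about is done
     * aggregating, so a missing entry counts as eligible.
     */
    static List<String> eligibleApps(List<String> appDirs,
                                     Map<String, AggregationStatus> rmStatus) {
        List<String> eligible = new ArrayList<>();
        for (String appId : appDirs) {
            AggregationStatus status = rmStatus.get(appId);
            // Not in the RM, or in the RM with a terminal aggregation state.
            if (status == null || TERMINAL.contains(status)) {
                eligible.add(appId);
            }
        }
        return eligible;
    }

    public static void main(String[] args) {
        // application_1 is no longer known to the RM (e.g. after a failover).
        List<String> appDirs =
                List.of("application_1", "application_2", "application_3");
        Map<String, AggregationStatus> rmStatus = Map.of(
                "application_2", AggregationStatus.RUNNING,
                "application_3", AggregationStatus.SUCCEEDED);
        System.out.println(eligibleApps(appDirs, rmStatus));
    }
}
```

Without step 2, treating a missing RM entry as "done" would be a guess rather than a guarantee, which is why the description pairs the two changes.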



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
