[ https://issues.apache.org/jira/browse/MAPREDUCE-6283?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Zhang Wei updated MAPREDUCE-6283:
---------------------------------
    Assignee: Varun Saxena  (was: Zhang Wei)

> MRHistoryServer log files management optimization
> -------------------------------------------------
>
>                 Key: MAPREDUCE-6283
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6283
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: jobhistoryserver
>            Reporter: Zhang Wei
>            Assignee: Varun Saxena
>            Priority: Minor
>   Original Estimate: 2,016h
>  Remaining Estimate: 2,016h
>
> In some compute-heavy clusters, users may continually submit large numbers
> of jobs; in our scenario, there are 240k jobs per day. On average, 5 nodes
> participate in running a job. All of these jobs' log files are aggregated
> on HDFS, which is a heavy load for the NameNode. The total number of log
> files generated in the default cleaning period (1 week) can be calculated
> as follows:
> AM logs per week: 7 days * 240,000 jobs/day * 2 files/job = 3,360,000 files
> App logs per week: 7 days * 240,000 jobs/day * 5 nodes/job * 1 file/node =
> 8,400,000 files
> More than 10 million log files are generated in one week. Even worse, some
> environments have to keep the logs longer in order to track potential
> issues. In general, these small log files occupy about 12G of NameNode heap
> and slow down the NameNode's responses.
> To optimize the log management of the history server, the main goals are:
> 1) Reduce the total count of files in HDFS.
> 2) Stay compatible with the existing history server operation.
> From the goals above, we can derive the detailed requirements as follows:
> 1) Merge log files into bigger ones in HDFS periodically.
> 2) The optimized design should inherit from the original architecture so
> that the merged logs can be browsed transparently.
> 3) Merged logs should be aged out periodically, just like ordinary logs.
> The whole life cycle of the AM logs:
> 1. Created by Application Master in intermediate-done-dir.
> 2. Moved to done-dir after the job is done.
> 3. Archived to archived-dir periodically.
> 4. Cleaned when all the logs in harball are expired.
> The whole life cycle of the App logs:
> 1. Created by Applications in local-dirs.
> 2. Aggregated to remote-app-log-dir after the job is done.
> 3. Archived to archived-dir periodically.
> 4. Cleaned when all the logs in harball are expired.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
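
As a quick check of the file-count estimate in the issue description, the arithmetic can be redone with a short sketch. The function name `weekly_log_files` and its parameter names are illustrative only and are not part of Hadoop; the default values are the figures quoted in the description (2 AM files per job, 5 nodes per job, 1 aggregated file per node, 7-day cleaning period).

```python
def weekly_log_files(jobs_per_day,
                     am_files_per_job=2,
                     nodes_per_job=5,
                     files_per_node=1,
                     retention_days=7):
    """Estimate how many log files accumulate in HDFS over one cleaning period.

    All parameter names are hypothetical; the defaults mirror the numbers
    quoted in the issue description, not any Hadoop configuration key.
    Returns a tuple (am_files, app_files, total_files).
    """
    # AM logs: a fixed number of files per job.
    am_files = retention_days * jobs_per_day * am_files_per_job
    # App logs: one aggregated file per participating node.
    app_files = retention_days * jobs_per_day * nodes_per_job * files_per_node
    return am_files, app_files, am_files + app_files


if __name__ == "__main__":
    am, app, total = weekly_log_files(240_000)
    print(f"AM logs per week:  {am:,} files")
    print(f"App logs per week: {app:,} files")
    print(f"Total per week:    {total:,} files")
```

Passing a larger `retention_days` gives a rough idea of how the file count grows in environments that keep logs beyond the default week.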