[ 
https://issues.apache.org/jira/browse/HADOOP-5022?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12664112#action_12664112
 ] 

Vinod K V commented on HADOOP-5022:
-----------------------------------

The patch looks good and seems to meet its requirements. I don't have a 
cluster set up with old logs, so I couldn't test it live.

Some minor code comments on logcondense.py:
 - Line +100 : the dictionary entry for the newly added option should be 
indented. It does not cause an error as written, but indenting it would 
improve readability
 - The option name can be something along the lines of "retain-masters-logs" 
instead of "cleanall". It should be true by default (the current behaviour). 
The description string can be something like "true if the logs of the 
masters (jobtracker, and namenode if dynamicdfs is set) have to be retained, 
false if everything has to be removed"
 - Line +191 : ret = 0 not needed
 - Line +135 : spurious blank line to be removed
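
To illustrate the suggested option, here is a minimal sketch. It is not the 
code from the patch: the option-table layout, the "-r" short flag, the dest 
name and the helper functions are all assumptions made for this example, and 
it uses the standard optparse module directly.

```python
from optparse import OptionParser

# Hypothetical option table. The keys and the "-r" short flag are
# illustrative assumptions, not copied from logcondense.py.
OPTIONS = {
    'retain-masters-logs': {
        'short': "-r",
        'long': "--retain-masters-logs",
        'type': "choice",
        'choices': ["true", "false"],
        'action': "store",
        'dest': "retain_masters_logs",
        'default': "true",  # true by default, matching current behaviour
        'help': "true if the logs of the masters (jobtracker, and "
                "namenode if dynamicdfs is set) have to be retained, "
                "false if everything has to be removed",
    },
}


def build_parser(options=OPTIONS):
    """Build an OptionParser from the option table above."""
    parser = OptionParser()
    for spec in options.values():
        spec = dict(spec)  # copy so the table itself is not modified
        short = spec.pop('short')
        long_name = spec.pop('long')
        parser.add_option(short, long_name, **spec)
    return parser


def retain_masters_logs(argv):
    """Return True if the masters' logs should be kept."""
    opts, _ = build_parser().parse_args(argv)
    return opts.retain_masters_logs == "true"
```

With no arguments the masters' logs are retained; passing 
"--retain-masters-logs false" requests removal of everything.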

Another point to note is that the code path covered by this patch could 
have been much simpler had the job-specific directories (present in the 
hod-logs parent directory) themselves had a timestamp in their names. If that 
were the case, both the output needed from dfs and the number of filenames 
processed would have been much smaller. But this needs changes in hod, so the 
fix in the provided patch should suffice for now.
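
As a rough sketch of that idea: if the directory names carried a timestamp, 
old directories could be picked out from the names alone. The 
"<name>-<YYYYMMDD>" naming scheme and the expired_dirs helper below are 
purely hypothetical; hod does not name its directories this way today.

```python
from datetime import datetime, timedelta


def expired_dirs(dir_names, now, retain_days):
    """Return the directory names whose embedded date is older than the cutoff.

    Assumes a hypothetical "<name>-<YYYYMMDD>" naming scheme.
    """
    cutoff = now - timedelta(days=retain_days)
    expired = []
    for name in dir_names:
        # Take the part after the last '-' as the candidate timestamp.
        stamp = name.rpartition('-')[2]
        try:
            created = datetime.strptime(stamp, "%Y%m%d")
        except ValueError:
            continue  # no parseable timestamp: leave the directory alone
        if created < cutoff:
            expired.append(name)
    return expired
```

Only a listing of the parent directory would be needed in that case, rather 
than a listing of the files under every job-specific directory.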

> [HOD] logcondense should delete all hod logs for a user, including jobtracker 
> logs
> ----------------------------------------------------------------------------------
>
>                 Key: HADOOP-5022
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5022
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: contrib/hod
>            Reporter: Hemanth Yamijala
>            Assignee: Peeyush Bishnoi
>            Priority: Blocker
>             Fix For: 0.18.3
>
>         Attachments: hadoop-5022.txt
>
>
> Currently, logcondense.py does not delete the jobtracker logs that it 
> uploads to the DFS when the HOD cluster is deallocated. As a result, the 
> hod-logs directory slowly accumulates jobtracker logs. Particularly for 
> users who run a lot of jobs, this could fill up the namespace. Further, 
> these directories cause the logcondense program to scan them repeatedly, 
> which stresses the namenode. So, logcondense.py should optionally also 
> delete the jobtracker logs.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
