Design/Implement a tool to support archival and analysis of logfiles.
---------------------------------------------------------------------

         Key: HADOOP-342
         URL: http://issues.apache.org/jira/browse/HADOOP-342
     Project: Hadoop
        Type: New Feature

    Reporter: Arun C Murthy


Requirements:

  a) Create a tool to support archival of logfiles (from diverse sources) in 
hadoop's dfs.
  b) The tool should also support analysis of the logfiles via grep/sort 
primitives. The tool should allow for fairly generic pattern 'grep's and let 
users 'sort' the matching lines (from grep) on 'columns' of their choice.

  E.g. from hadoop logs: find all log-lines containing 'FATAL' and sort them 
first on timestamp (column x) and then, for equal timestamps, on column y.


Design/Implementation:

  a) Log Archival

    Archival of logs from diverse sources can be accomplished using the 
*distcp* tool (HADOOP-341).
  
  b) Log analysis

    The idea is to enable users of the tool to perform analysis of logs via 
grep/sort primitives.

    This can be accomplished via a relatively simple Map-Reduce task where the 
map does the *grep* for the given pattern via RegexMapper, and the implicit 
*sort* (before the reduce) uses a custom Comparator which compares the matching 
lines on the user-specified columns.
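
    The grep-then-sort flow described above can be sketched in plain Java, 
without any Hadoop classes. This is only an in-memory illustration of the idea, 
not the proposed implementation: the class name LogGrepSort, the methods 
columnComparator and grepAndSort, the whitespace column delimiter and the 
sample log lines are all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical stand-in for the map (grep) and sort phases; not Hadoop API.
public class LogGrepSort {

    // Builds a comparator that splits each line on delimiterRegex and compares
    // the given 0-based columns in order; a missing column compares as "".
    static Comparator<String> columnComparator(String delimiterRegex, int... columns) {
        Pattern delimiter = Pattern.compile(delimiterRegex);
        return (a, b) -> {
            String[] fa = delimiter.split(a);
            String[] fb = delimiter.split(b);
            for (int c : columns) {
                String va = c < fa.length ? fa[c] : "";
                String vb = c < fb.length ? fb[c] : "";
                int cmp = va.compareTo(vb);
                if (cmp != 0) {
                    return cmp;
                }
            }
            return 0;
        };
    }

    // "grep": keep the lines in which grepRegex is found; "sort": order them.
    static List<String> grepAndSort(List<String> lines, String grepRegex,
                                    Comparator<String> order) {
        Pattern pattern = Pattern.compile(grepRegex);
        List<String> matches = new ArrayList<>();
        for (String line : lines) {
            if (pattern.matcher(line).find()) {
                matches.add(line);
            }
        }
        matches.sort(order);
        return matches;
    }

    public static void main(String[] args) {
        // Made-up log lines: date, time, level, component, message.
        List<String> logs = Arrays.asList(
            "2006-07-03 10:15:02 FATAL dfs.DataNode disk failure",
            "2006-07-03 10:14:59 INFO mapred.JobTracker job started",
            "2006-07-03 10:14:59 FATAL mapred.TaskTracker lost heartbeat",
            "2006-07-03 10:14:59 FATAL dfs.NameNode out of memory");
        // Sort FATAL lines on the time column (1), then the component column (3).
        Comparator<String> order = columnComparator("\\s+", 1, 3);
        for (String line : grepAndSort(logs, "FATAL", order)) {
            System.out.println(line);
        }
    }
}
```

    In an actual Map-Reduce job the filtering would happen in the map and the 
ordering would be applied by the framework when it sorts the map output, so the 
comparator would operate on whatever the map emits rather than on an in-memory 
list.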

    The sort/grep specs can be fairly powerful by letting the user of the tool 
use Java's built-in regex patterns (java.util.regex).
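
    One way a regex-based spec could define 'columns' is through capture 
groups in a java.util.regex pattern. The sketch below is an assumption about 
how such a spec might look, not something this issue specifies; the class 
RegexColumns, the method columns and the sample pattern are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: treats each capture group of a pattern as one column.
public class RegexColumns {

    // Returns the capture groups of the first match in line, in group order,
    // or an empty list if the pattern is not found in the line.
    static List<String> columns(Pattern spec, String line) {
        Matcher m = spec.matcher(line);
        List<String> cols = new ArrayList<>();
        if (m.find()) {
            for (int g = 1; g <= m.groupCount(); g++) {
                cols.add(m.group(g));
            }
        }
        return cols;
    }

    public static void main(String[] args) {
        // Assumed layout: "<date> <time> <level> <component> <message>".
        Pattern spec = Pattern.compile("^(\\S+ \\S+) (FATAL|ERROR) (\\S+)");
        String line = "2006-07-03 10:15:02 FATAL dfs.DataNode disk failure";
        // Columns here are the timestamp, the level and the component.
        System.out.println(columns(spec, line));
    }
}
```

    With a spec of this shape the same pattern serves as both the grep (lines 
that do not match yield no columns) and the column definition for the sort.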


