[ 
https://issues.apache.org/jira/browse/OOZIE-3249?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16807533#comment-16807533
 ] 

Andras Piros commented on OOZIE-3249:
-------------------------------------

Thanks for the review [~asalamon74]! Uploaded patch 002 that addresses your 
comments.

> [tools] Instrumentation log parser
> ----------------------------------
>
>                 Key: OOZIE-3249
>                 URL: https://issues.apache.org/jira/browse/OOZIE-3249
>             Project: Oozie
>          Issue Type: Improvement
>          Components: tools
>    Affects Versions: 5.0.0
>            Reporter: Andras Piros
>            Assignee: Andras Piros
>            Priority: Major
>         Attachments: OOZIE-3249.001.patch, OOZIE-3249.002.patch, 
> oozie-instrumentation-localhost.log.2018-05-09, 
> oozie-instrumentation-localhost.log.2018-05-09.out
>
>
> Oozie instrumentation logs contain a lot of information, but are difficult to 
> parse, because per instrumentation log entry there is always one header line 
> in plain text format (containing timestamp), and multiple other lines in JSON 
> format (not containing timestamp). Those lines of course belong together.
> {noformat}
> 2018-05-02 02:48:13,426  INFO oozieinstrumentation:520 - USER[-] GROUP[-] 
> TOKEN[-] APP[-] JOB[-] ACTION[-] 
> {
> ...
>   "counters" : {
> ...
>     "callablequeue.executed" : {
>       "count" : 5954144
>     },
> ...
>     "callablequeue.queued" : {
>       "count" : 10596129
>     },
> ...
>   },
> ...
> }
> {noformat}
> There should be a simple script in {{tools/bin}} that takes as parameters:
> * input file name ({{-i}}), e.g. {{-i /path/to/oozie-instrumentation.log}}
> * output file name ({{-o}}), e.g. {{-o 
> /path/to/oozie-instrumentation.log.out}}
> * parameters to extract ({{-p}}) in the format of 
> {{path/to/json/value1,path/to/json/value2}}, in this case {{-p 
> counters/callablequeue.executed/count,counters/callablequeue.queued/count}}
> The output file should contain in CSV format:
> * a header line containing column names for
> * one line per parsed input header / JSON lines, containing:
> ** first cell is the minutes part of the timestamp
> ** consecutive cells are parsed JSON values given each parameter to extract



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

Reply via email to