[
https://issues.apache.org/jira/browse/OOZIE-3249?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16470099#comment-16470099
]
Andras Piros edited comment on OOZIE-3249 at 5/10/18 9:47 AM:
--------------------------------------------------------------
How I call the script:
{noformat}
/usr/bin/python2.7 \
/path/to/oozie/tools/src/main/bin/instrumentation-log-parser.py \
-i /path/to/oozie-instrumentation-localhost.log.2018-05-09 \
-o /path/to/oozie-instrumentation-localhost.log.2018-05-09.out \
-p counters/callablequeue.executed/count,counters/callablequeue.queued/count
{noformat}
The logging output is:
{noformat}
Input file is: /path/to/oozie-instrumentation-localhost.log.2018-05-09
Output file is: /path/to/oozie-instrumentation-localhost.log.2018-05-09.out
Paremeters are: ['counters/callablequeue.executed/count', 'counters/callablequeue.queued/count']
[2018-05-10 10:51:06.869892] [INFO] Parsing instrumentation log. [inputfile=/path/to/oozie-instrumentation-localhost.log.2018-05-09;outputfile=/path/to/oozie-instrumentation-localhost.log.2018-05-09.out;parameters=['counters/callablequeue.executed/count', 'counters/callablequeue.queued/count']]
[2018-05-10 10:51:06.925840] [DEBUG] Input file has 685250 lines
[2018-05-10 10:51:06.928115] [WARN] Unparseable JSON input at line 1199
[2018-05-10 10:51:08.153024] [INFO] Parsing instrumentation log finished. In total 685250 input lines processed, 261 output lines written
{noformat}
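The call above passes three flags. As a rough illustration only, here is how such a command line could be parsed in Python 3 with `argparse`. The attached patch may well use a different mechanism; the function name `parse_args` is hypothetical, and the attribute names `inputfile`, `outputfile` and `parameters` are borrowed from the log line above rather than from the script's source.

```python
import argparse


def parse_args(argv):
    """Hypothetical sketch: parse -i, -o and -p as used in the call above."""
    parser = argparse.ArgumentParser(
        description="Sketch of an instrumentation log parser command line")
    parser.add_argument("-i", dest="inputfile", required=True,
                        help="instrumentation log to read")
    parser.add_argument("-o", dest="outputfile", required=True,
                        help="CSV file to write")
    # -p is a comma-separated list of slash-separated JSON paths.
    parser.add_argument("-p", dest="parameters", required=True,
                        type=lambda text: [p for p in text.split(",") if p],
                        help="JSON paths to extract")
    return parser.parse_args(argv)


args = parse_args([
    "-i", "/path/to/oozie-instrumentation-localhost.log.2018-05-09",
    "-o", "/path/to/oozie-instrumentation-localhost.log.2018-05-09.out",
    "-p", "counters/callablequeue.executed/count,"
          "counters/callablequeue.queued/count",
])
print(args.parameters)
```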
> [tools] Instrumentation log parser
> ----------------------------------
>
> Key: OOZIE-3249
> URL: https://issues.apache.org/jira/browse/OOZIE-3249
> Project: Oozie
> Issue Type: Improvement
> Components: tools
> Affects Versions: 5.0.0
> Reporter: Andras Piros
> Assignee: Andras Piros
> Priority: Major
> Attachments: OOZIE-3249.001.patch,
> oozie-instrumentation-localhost.log.2018-05-09,
> oozie-instrumentation-localhost.log.2018-05-09.out
>
>
> Oozie instrumentation logs contain a lot of information but are difficult to
> parse: each instrumentation log entry consists of one header line in plain
> text (containing the timestamp) followed by multiple lines of JSON (not
> containing the timestamp). The header and JSON lines of an entry belong
> together.
> {noformat}
> 2018-05-02 02:48:13,426 INFO oozieinstrumentation:520 - USER[-] GROUP[-] TOKEN[-] APP[-] JOB[-] ACTION[-]
> {
>   ...
>   "counters" : {
>     ...
>     "callablequeue.executed" : {
>       "count" : 5954144
>     },
>     ...
>     "callablequeue.queued" : {
>       "count" : 10596129
>     },
>     ...
>   },
>   ...
> }
> {noformat}
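A minimal Python 3 sketch of how a header line and the JSON lines that follow it could be grouped into one entry. Everything here is an assumption made for illustration: the `HEADER` regular expression is derived only from the sample header above, `split_entries` is a hypothetical name, the timestamp is truncated to the minute as one possible reading of "minutes part", and entries with invalid JSON are silently skipped (the real script reports them with a warning instead).

```python
import json
import re

# Assumed header shape, taken from the sample: "2018-05-02 02:48:13,426 INFO ..."
HEADER = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}):\d{2},\d{3}\s")


def split_entries(lines):
    """Yield (timestamp_to_the_minute, parsed_json) for each log entry."""
    stamp, body = None, []
    # The trailing None is a sentinel that flushes the last entry.
    for line in list(lines) + [None]:
        match = HEADER.match(line) if line is not None else None
        if line is None or match:
            if stamp is not None:
                try:
                    yield stamp, json.loads("\n".join(body))
                except ValueError:
                    pass  # unparseable JSON body: skip this entry
            if match:
                stamp, body = match.group(1), []
        elif stamp is not None:
            body.append(line)


sample_log = [
    "2018-05-02 02:48:13,426 INFO oozieinstrumentation:520 - USER[-] GROUP[-]",
    "{",
    '  "counters" : {',
    '    "callablequeue.executed" : { "count" : 5954144 }',
    "  }",
    "}",
    "2018-05-02 02:49:13,430 INFO oozieinstrumentation:520 - USER[-] GROUP[-]",
    "{ not valid json",
]
entries = list(split_entries(sample_log))
print(entries)
```

The second sample entry is dropped because its body is not valid JSON, so only the 02:48 entry is yielded.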
> There should be a simple script in {{tools/bin}} that takes as parameters:
> * input file name ({{-i}}), e.g. {{-i /path/to/oozie-instrumentation.log}}
> * output file name ({{-o}}), e.g. {{-o /path/to/oozie-instrumentation.log.out}}
> * parameters to extract ({{-p}}) in the format of {{path/to/json/value1,path/to/json/value2}}, in this case {{-p counters/callablequeue.executed/count,counters/callablequeue.queued/count}}
> The output file should be in CSV format and contain:
> * a header line containing the column names
> * one line per parsed log entry (header line plus its JSON lines), containing:
> ** the minutes part of the timestamp in the first cell
> ** the JSON values parsed for each parameter to extract in the subsequent cells
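The CSV layout described above could be produced roughly as follows in Python 3. This is a sketch under stated assumptions, not the attached patch: `value_at` and `write_csv` are hypothetical helpers, the first column name `time` is invented, the parameter paths are reused as the remaining column names, and a path that cannot be resolved yields an empty cell.

```python
import csv
import io
from functools import reduce


def value_at(document, path):
    """Resolve a slash-separated path in nested dicts; None if a key is missing."""
    return reduce(
        lambda node, key: node.get(key) if isinstance(node, dict) else None,
        path.split("/"),
        document,
    )


def write_csv(entries, parameters, out):
    """Write one header row, then one row per (timestamp, document) entry."""
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["time"] + list(parameters))  # "time" is an assumed name
    for stamp, document in entries:
        writer.writerow([stamp] + [value_at(document, p) for p in parameters])


entries = [(
    "2018-05-02 02:48",
    {"counters": {
        "callablequeue.executed": {"count": 5954144},
        "callablequeue.queued": {"count": 10596129},
    }},
)]
parameters = [
    "counters/callablequeue.executed/count",
    "counters/callablequeue.queued/count",
]
buffer = io.StringIO()
write_csv(entries, parameters, buffer)
csv_text = buffer.getvalue()
print(csv_text)
```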