[
https://issues.apache.org/jira/browse/HBASE-24380?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Andrew Kyle Purtell updated HBASE-24380:
----------------------------------------
Description:
When trying to reconstruct a timeline from the write of a recovered.edits file
back to the start of the WAL file split, with a bunch of unrelated activity in
the meantime, there isn't a consistent token that links split file write
messages (which include the store path, including the region hash) to the
beginning of WAL splitting activity. Sessionizing by host doesn't work because
work can bounce around through retries. Thread context names in the logs vary
and can be like [nds1-225-fra:60020-7] or [fb472085572ba72e96f1] (trailing
digits of region hash) or [splits-1589016325868].
We could have WALSplitter get the current time when starting the split of a WAL
file and have it log this timestamp in every line as a splitting session
identifier.
Related, we should track the time of split task execution end to end and export
a metric that captures it.
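As a rough illustration of the two ideas above, here is a minimal sketch of a
per-split session tag plus an end-to-end duration. SplitSessionSketch and its
methods are hypothetical names made up for this example, not existing HBase
code, and a start timestamp alone may not be unique if two splits begin in the
same millisecond.

```java
// Hypothetical helper, not part of HBase: tags every log line of one WAL
// split with an identifier derived from the time the split started.
final class SplitSessionSketch {
  private final String sessionId;
  private final long startMillis;

  SplitSessionSketch(long startMillis) {
    this.startMillis = startMillis;
    this.sessionId = "split-" + startMillis;
  }

  String sessionId() {
    return sessionId;
  }

  // Prefix a message with the session identifier so that all lines from the
  // same split can be grouped, regardless of host or thread name.
  String tag(String message) {
    return "[" + sessionId + "] " + message;
  }

  // End-to-end duration of the split so far; the value a metric could export.
  long elapsedMillis(long nowMillis) {
    return nowMillis - startMillis;
  }

  public static void main(String[] args) {
    SplitSessionSketch session = new SplitSessionSketch(System.currentTimeMillis());
    System.out.println(session.tag("Started splitting WAL"));
    System.out.println(session.tag("Wrote recovered.edits file"));
    System.out.println(session.tag("Finished split in "
        + session.elapsedMillis(System.currentTimeMillis()) + " ms"));
  }
}
```

With a tag like this in every line, grouping log lines by the bracketed token
would recover one split's timeline even when the work moves between hosts.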
It might also be worthwhile to wire up more of WAL splitting to TaskMonitor
status logging. If we do this we can also enable status journal logging, so
when a WAL split has completed, a line will appear in the log that has the list
of all status messages recorded during splitting and the time delta in
milliseconds between them.
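To make the intended journal line concrete, here is a minimal sketch of
recording status messages and printing them with the time delta between
consecutive entries. StatusJournalSketch is a hypothetical stand-in written for
this example; it does not use or reproduce the actual TaskMonitor API, and the
output format is an assumption.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a status journal, not the TaskMonitor API.
final class StatusJournalSketch {
  private final List<String> statuses = new ArrayList<>();
  private final List<Long> times = new ArrayList<>();

  // Record a status message together with the time it was set.
  void record(String status, long timeMillis) {
    statuses.add(status);
    times.add(timeMillis);
  }

  // Render all recorded statuses on one line, each followed by the number of
  // milliseconds since the previous status (0 for the first entry).
  String render() {
    StringBuilder sb = new StringBuilder();
    for (int i = 0; i < statuses.size(); i++) {
      long delta = (i == 0) ? 0L : times.get(i) - times.get(i - 1);
      if (i > 0) {
        sb.append("; ");
      }
      sb.append(statuses.get(i)).append(" (+").append(delta).append(" ms)");
    }
    return sb.toString();
  }

  public static void main(String[] args) {
    StatusJournalSketch journal = new StatusJournalSketch();
    journal.record("Opening WAL", System.currentTimeMillis());
    journal.record("Splitting edits", System.currentTimeMillis());
    journal.record("Closing writers", System.currentTimeMillis());
    System.out.println(journal.render());
  }
}
```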
was:
Looking to reconstruct a timeline from write of recovered.edits file back to
start of WAL file split, with a bunch of unrelated activity in the meantime,
there isn't a consistent token that links split file write messages (which
include store path including region hash) to beginning of WAL splitting
activity. Sessonizing by host doesn't work because work can bounce around
through retries. Thread context names in the logs vary and can be like
[nds1-225-fra:60020-7] or [fb472085572ba72e96f1] (trailing digits of region
hash) or [splits-1589016325868] .
We could have WALSplitter get the current time when starting the split of a WAL
file and have it log this timestamp in every line as a splitting session
identifier.
Related, we should track the time of split task execution end to end and export
a metric that captures it.
It might also be worthwhile to wire up more of WAL splitting to TaskMonitor
status logging. If we do this we can also enable status journal logging, so
when splitting is down, a line will appear in the log that has the list of all
status messages recorded during splitting and the time delta in milliseconds
between them.
> Improve WAL splitting log lines to enable sessionization
> --------------------------------------------------------
>
> Key: HBASE-24380
> URL: https://issues.apache.org/jira/browse/HBASE-24380
> Project: HBase
> Issue Type: Improvement
> Components: logging, Operability, wal
> Reporter: Andrew Kyle Purtell
> Priority: Minor
>
> When trying to reconstruct a timeline from the write of a recovered.edits
> file back to the start of the WAL file split, with a bunch of unrelated
> activity in the meantime, there isn't a consistent token that links split
> file write messages (which include the store path, including the region
> hash) to the beginning of WAL splitting activity. Sessionizing by host
> doesn't work because work can bounce around through retries. Thread context
> names in the logs vary and can be like [nds1-225-fra:60020-7] or
> [fb472085572ba72e96f1] (trailing digits of region hash) or
> [splits-1589016325868].
> We could have WALSplitter get the current time when starting the split of a
> WAL file and have it log this timestamp in every line as a splitting session
> identifier.
> Related, we should track the time of split task execution end to end and
> export a metric that captures it.
> It might also be worthwhile to wire up more of WAL splitting to TaskMonitor
> status logging. If we do this we can also enable status journal logging, so
> when a WAL split has completed, a line will appear in the log that has the
> list of all status messages recorded during splitting and the time delta in
> milliseconds between them.