[ https://issues.apache.org/jira/browse/HBASE-21505?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16758637#comment-16758637 ]
Hadoop QA commented on HBASE-21505:
-----------------------------------
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 15s{color} | {color:blue} Docker mode activated. {color} |
| {color:red}-1{color} | {color:red} patch {color} | {color:red} 0m 5s{color} | {color:red} HBASE-21505 does not apply to master. Rebase required? Wrong Branch? See https://yetus.apache.org/documentation/0.8.0/precommit-patchnames for help. {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hbase:b002b0b |
| JIRA Issue | HBASE-21505 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12957311/HBASE-21505-master.006.patch |
| Console output | https://builds.apache.org/job/PreCommit-HBASE-Build/15848/console |
| Powered by | Apache Yetus 0.8.0 http://yetus.apache.org |
This message was automatically generated.
> Several inconsistencies on information reported for Replication Sources by
> hbase shell status 'replication' command.
> --------------------------------------------------------------------------------------------------------------------
>
> Key: HBASE-21505
> URL: https://issues.apache.org/jira/browse/HBASE-21505
> Project: HBase
> Issue Type: Bug
> Reporter: Wellington Chevreuil
> Assignee: Wellington Chevreuil
> Priority: Major
> Attachments:
> 0001-HBASE-21505-initial-version-for-more-detailed-report.patch,
> HBASE-21505-master.001.patch, HBASE-21505-master.002.patch,
> HBASE-21505-master.003.patch, HBASE-21505-master.004.patch,
> HBASE-21505-master.005.patch, HBASE-21505-master.006.patch
>
>
> While reviewing the hbase shell status 'replication' command, I noticed the
> following issues related to the replication source section:
> 1) TimeStampsOfLastShippedOp keeps getting updated and increasing even when
> no new edits were added to source, so nothing was really shipped. Test steps
> performed:
> 1.1) Source cluster with only one table targeted to replication;
> 1.2) Added a new row, confirmed the row appeared in Target cluster;
> 1.3) Issued status 'replication' command in source, TimeStampsOfLastShippedOp
> shows current timestamp T1.
> 1.4) Waited 30 seconds, no new data added to source. Issued status
> 'replication' command, now shows timestamp T2.
> 2) When replication is stuck due to some connectivity issues or target
> unavailability, and new edits are added in source, the reported
> AgeOfLastShippedOp wrongly shows the same value as "Replication Lag". This is
> incorrect: AgeOfLastShippedOp should not change until another edit is
> actually shipped to target. Test steps performed:
> 2.1) Source cluster with only one table targeted to replication;
> 2.2) Stopped target cluster RS;
> 2.3) Put a new row on source. Running status 'replication' command does show
> lag increasing. TimeStampsOfLastShippedOp seems correct also, no further
> updates as described on bullet #1 above.
> 2.4) AgeOfLastShippedOp keeps increasing together with Replication Lag, even
> though there's no new edit shipped to target:
> {noformat}
> ...
> SOURCE: PeerID=1, AgeOfLastShippedOp=5581, SizeOfLogQueue=1,
> TimeStampsOfLastShippedOp=Wed Nov 21 02:50:23 GMT 2018, Replication Lag=5581
> ...
> ...
> SOURCE: PeerID=1, AgeOfLastShippedOp=8586, SizeOfLogQueue=1,
> TimeStampsOfLastShippedOp=Wed Nov 21 02:50:23 GMT 2018, Replication Lag=8586
> ...
> {noformat}
> 3) AgeOfLastShippedOp gets set to 0 even when a given edit had taken some
> time before it got finally shipped to target. Test steps performed:
> 3.1) Source cluster with only one table targeted to replication;
> 3.2) Stopped target cluster RS;
> 3.3) Put a new row on source.
> 3.4) AgeOfLastShippedOp keeps increasing together with Replication Lag, even
> though there's no new edit shipped to target:
> {noformat}
> T1:
> ...
> SOURCE: PeerID=1, AgeOfLastShippedOp=5581, SizeOfLogQueue=1,
> TimeStampsOfLastShippedOp=Wed Nov 21 02:50:23 GMT 2018, Replication Lag=5581
> ...
> T2:
> ...
> SOURCE: PeerID=1, AgeOfLastShippedOp=8586, SizeOfLogQueue=1,
> TimeStampsOfLastShippedOp=Wed Nov 21 02:50:23 GMT 2018, Replication Lag=8586
> ...
> {noformat}
> 3.5) Restarted target cluster RS and verified the new row appeared there. No
> new edit was added, but the status 'replication' command reports
> AgeOfLastShippedOp as 0, while it should be the diff between the time it
> concluded shipping at target and the time it was added in source:
> {noformat}
> SOURCE: PeerID=1, AgeOfLastShippedOp=0, SizeOfLogQueue=1,
> TimeStampsOfLastShippedOp=Wed Nov 21 02:50:23 GMT 2018, Replication Lag=0
> {noformat}
> 4) When replication is stuck due to some connectivity issues or target
> unavailability, if the RS is restarted, then once the recovered queue source
> is started, TimeStampsOfLastShippedOp is set to the initial Java date (Thu Jan
> 01 01:00:00 GMT 1970, for example), so "Replication Lag" also gives a
> completely inaccurate value.
> Tests performed:
> 4.1) Source cluster with only one table targeted to replication;
> 4.2) Stopped target cluster RS;
> 4.3) Put a new row on source, restarted RS on source, waited a few seconds
> for the recovery queue source to start up, then it gives:
> {noformat}
> SOURCE: PeerID=1, AgeOfLastShippedOp=0, SizeOfLogQueue=1,
> TimeStampsOfLastShippedOp=Thu Jan 01 01:00:00 GMT 1970, Replication
> Lag=9223372036854775807
> {noformat}
> Also, we should report status for all running sources. The current output
> format gives the impression there’s only one, even when there are recovery
> queues, for instance.
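
The behaviour expected in points 1) to 4) above can be sketched as a small model. This is only an illustration in Python, not HBase code: the class `SourceMetrics`, its fields and its methods are hypothetical names, and all times are plain millisecond integers. It assumes that the "last shipped" values change only when an edit is actually shipped, that the age of the last shipped op is the ship time minus the edit's write time, and that a source which has shipped nothing reports no timestamp instead of the epoch.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class SourceMetrics:
    """Hypothetical model of one replication source; all times in ms."""
    pending_write_times: List[int] = field(default_factory=list)
    # None means "nothing shipped yet" (instead of an epoch-0 timestamp).
    timestamp_of_last_shipped_op: Optional[int] = None
    age_of_last_shipped_op: int = 0

    def on_edit(self, write_time: int) -> None:
        # A new edit arrives on the source; "last shipped" values stay untouched.
        self.pending_write_times.append(write_time)

    def on_shipped(self, ship_time: int) -> None:
        # Ship the oldest pending edit. Only here do the "last shipped" values move.
        if not self.pending_write_times:
            return  # nothing to ship, so the timestamp must not advance
        write_time = self.pending_write_times.pop(0)
        self.timestamp_of_last_shipped_op = ship_time
        self.age_of_last_shipped_op = ship_time - write_time

    def replication_lag(self, now: int) -> int:
        # Lag is measured against the oldest edit still waiting to be shipped.
        if not self.pending_write_times:
            return 0
        return now - self.pending_write_times[0]
```

With an edit written at time 1000 and the target down, `replication_lag(6581)` gives 5581 while `age_of_last_shipped_op` stays at 0. Once `on_shipped(9586)` runs, the age becomes 8586 and the lag drops to 0.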
> Here is a list of ideas on how the command should report under different
> states of replication:
> a) Source started, target stopped, no edits arrived on source yet:
> Status replication should not show any lags, no edits shipped, no edits
> arrived;
> b) Source started, target stopped, add edit on source:
> Status replication should report following info -> lag, time of edit arrival
> on source, additional message saying no edits had been shipped to target;
> c) Source started, target stopped, edit added on source, restart source:
> Status replication should list two sources, one normal, other recovered.
> Normal source should show no lags, no edits shipped, no edits arrived.
> Recovered should show no edits shipped, but should have edits arrived in
> source and lag > 0;
> d) Source started, target stopped, add edit on source, restart source, add
> another edit on source:
> Status replication should list two sources, one normal, other recovered. Both
> sources should show no edits shipped, but should have edits arrived in source
> and lag > 0;
> e) Source started, target stopped, add edit on source, restart source, add
> another edit on source, start target:
> Status replication should list normal source only (after some short period),
> with proper times for last shipped, last arrived in source and no replication
> lag.
> f) Source started, target stopped, add edit on source, restart source,
> restart target:
> Status replication should list normal source only, with no shipped or
> arrived edits, and lag should be 0.
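
As a rough sketch of the per-source report proposed in scenarios a) to f), the Python below prints one line for every running source, normal or recovered. The names `SourceState`, `describe` and `status_replication`, and the wording of the output, are assumptions made for illustration; they are not the format of the attached patches or of the hbase shell.

```python
from typing import List, NamedTuple, Optional


class SourceState(NamedTuple):
    """Hypothetical snapshot of one replication source; times in ms."""
    name: str                      # e.g. "normal" or "recovered"
    last_arrived: Optional[int]    # write time of newest edit seen, None if none
    last_shipped: Optional[int]    # time of last successful ship, None if none
    oldest_pending: Optional[int]  # write time of oldest unshipped edit, None if none


def describe(state: SourceState, now: int) -> str:
    # Say explicitly when nothing has arrived or shipped, instead of printing 0.
    arrived = ("no edits arrived" if state.last_arrived is None
               else f"last edit arrived at {state.last_arrived}")
    shipped = ("no edits shipped" if state.last_shipped is None
               else f"last edit shipped at {state.last_shipped}")
    lag = 0 if state.oldest_pending is None else now - state.oldest_pending
    return f"{state.name}: {arrived}, {shipped}, lag={lag}"


def status_replication(sources: List[SourceState], now: int) -> List[str]:
    # One line per running source, so recovered queues are not hidden.
    return [describe(s, now) for s in sources]
```

For scenario c), passing a normal source with no activity and a recovered source holding one unshipped edit yields two lines: the first reports no arrivals, no shipments and a lag of 0, and the second reports an arrival, no shipments and a lag greater than 0.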
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)