[
https://issues.apache.org/jira/browse/HBASE-7006?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13638831#comment-13638831
]
stack commented on HBASE-7006:
------------------------------
[~jeffreyz] Nice numbers in posted doc.
What does below mean sir?
{code}
+ // make current mutation as a distributed log replay change
+ protected boolean isReplay = false;
{code}
Why do we have this isReplay in a Mutation? Because these edits get treated
differently over on the serverside?
Suggest calling the data member replay or logReplay or walReplay and then the
accessor is isLogReplay or isWALReplay. isReplay is the name of a method that
returns whether the data member replay is true or not.
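Something like the below is all I am suggesting (a sketch only; Mutation internals elided, and the names are just my proposal, not anything already in the patch):
{code}
// field named for what it holds...
protected boolean walReplay = false;

// ...accessor named like the question it answers
public boolean isWALReplay() {
  return this.walReplay;
}

public void setWALReplay(boolean walReplay) {
  this.walReplay = walReplay;
}
{code}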
Does this define belong in this patch?
{code}
+ /** Conf key that specifies region assignment timeout value */
+ public static final String REGION_ASSIGNMENT_TIME_OUT =
      "hbase.master.region.assignment.time.out";
{code}
Why are we timing out assignments in this patch?
Is it log splitting that is being referred to in the metric name below?
{code}
+ void updateMetaSplitTime(long time);
{code}
If so, should it be updateMetaWALSplitTime? And given what this patch is
about, should it be WALReplay?
Ditto for updateMetaSplitSize.
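i.e. something along these lines (just a sketch of the renames; the size-method signature is my guess since only the name shows above):
{code}
// suggested names, assuming these really are meta WAL-splitting timings
void updateMetaWALSplitTime(long time);
void updateMetaWALSplitSize(long size);
{code}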
Excuse me if I am not following what is going on w/ the above (because I see
later that you have replay metrics going on....)
Default is false?
{code}
+ distributedLogReplay =
this.conf.getBoolean(HConstants.DISTRIBUTED_LOG_REPLAY_KEY, false);
{code}
Should we turn it on in trunk and off in 0.95? (Should we turn it on in 0.95
so it gets a bit of testing?)
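On the testing question, I am only thinking of flipping the shipped default per branch, e.g. (sketch reusing the key from your snippet above):
{code}
// true in trunk so the feature gets exercised, leave it false in 0.95
distributedLogReplay =
    this.conf.getBoolean(HConstants.DISTRIBUTED_LOG_REPLAY_KEY, true);
{code}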
Something wrong w/ license in WALEditsReplaySink
Skimmed the patch. Let me come back w/ a decent review. Looks good J.
> [MTTR] Study distributed log splitting to see how we can make it faster
> -----------------------------------------------------------------------
>
> Key: HBASE-7006
> URL: https://issues.apache.org/jira/browse/HBASE-7006
> Project: HBase
> Issue Type: Bug
> Components: MTTR
> Reporter: stack
> Assignee: Jeffrey Zhong
> Priority: Critical
> Fix For: 0.95.1
>
> Attachments: hbase-7006-combined.patch, LogSplitting Comparison.pdf,
> ProposaltoimprovelogsplittingprocessregardingtoHBASE-7006-v2.pdf
>
>
> Just saw an interesting issue where a cluster went down hard and 30 nodes had
> 1700 WALs to replay. Replay took almost an hour. It looks like it could run
> faster; much of the time is spent zk'ing and nn'ing.
> Putting in 0.96 so it gets a look at least. Can always punt.