[
https://issues.apache.org/jira/browse/HBASE-12988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14568576#comment-14568576
]
Lars Hofhansl commented on HBASE-12988:
---------------------------------------
Now the question is: Can we do this without rewriting every single WALEdit?
The test patch I attached here simply splits the list of edits before shipping
it, and ships it in smaller parts to multiple servers in the sink. It does not
rearrange the edits by row or table, and hence the sink may apply edits out of
order, in particular deletes and puts for the same row.
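To make the reordering risk concrete, here is a minimal sketch of contiguous chunking as described above. It is not the attached patch: `Edit`, `NaiveSplitter`, `split` and `rowSpansChunks` are hypothetical names, and an edit is reduced to a (row, operation) pair rather than a real WALEdit.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified edit: just a row key and an operation name.
// This is NOT HBase's WALEdit; it only models what the chunking sees.
record Edit(String row, String op) {}

final class NaiveSplitter {
    // Split the batch into contiguous chunks of at most chunkSize edits,
    // one chunk per sink server. Chunks are applied concurrently, so no
    // ordering is guaranteed between edits that land in different chunks.
    static List<List<Edit>> split(List<Edit> batch, int chunkSize) {
        List<List<Edit>> chunks = new ArrayList<>();
        for (int i = 0; i < batch.size(); i += chunkSize) {
            int end = Math.min(i + chunkSize, batch.size());
            chunks.add(new ArrayList<>(batch.subList(i, end)));
        }
        return chunks;
    }

    // True if edits for the given row ended up in more than one chunk,
    // i.e. they may be applied in a different order than they were written.
    static boolean rowSpansChunks(List<List<Edit>> chunks, String row) {
        int chunksWithRow = 0;
        for (List<Edit> chunk : chunks) {
            for (Edit e : chunk) {
                if (e.row().equals(row)) {
                    chunksWithRow++;
                    break;
                }
            }
        }
        return chunksWithRow > 1;
    }
}
```

With a batch of Put r1, Put r2, Delete r1, Put r3 and a chunk size of 2, the Put and the Delete for r1 end up on different sink servers, while all edits for r2 stay together.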
Since each WALEdit may contain many cells, and each cell can in theory be for a
different row, we would have to disentangle the Cells from the WALEdit and write
them into new edits, while retaining the chain of clusterIds and the table name
from the log key.
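A minimal sketch of that regrouping, assuming hash-of-row partitioning across a fixed number of sink workers: every entry is broken up, each cell is routed by its row, and each resulting sub-entry reuses the original key so the table name and clusterId chain survive. `SimpleKey`, `SimpleCell`, `SimpleWalEntry` and `RowRegrouper` are hypothetical stand-ins, not HBase classes or the approach taken in the attached patch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.UUID;

// Hypothetical stand-ins for a log key, a cell and a WAL entry (not HBase classes).
record SimpleKey(String table, List<UUID> clusterIds) {}
record SimpleCell(String row, String op) {}
record SimpleWalEntry(SimpleKey key, List<SimpleCell> cells) {}

final class RowRegrouper {
    // Rewrites every entry into per-worker entries. All cells of one row go
    // to the same worker (row hash mod workers), so per-row order is kept
    // within that worker, while different rows can be applied in parallel.
    // Each new entry carries the original key (table name, clusterId chain).
    static List<List<SimpleWalEntry>> regroup(List<SimpleWalEntry> entries, int workers) {
        List<List<SimpleWalEntry>> out = new ArrayList<>();
        for (int w = 0; w < workers; w++) {
            out.add(new ArrayList<>());
        }
        for (SimpleWalEntry entry : entries) {
            // One bucket of cells per worker for this entry.
            List<List<SimpleCell>> buckets = new ArrayList<>();
            for (int w = 0; w < workers; w++) {
                buckets.add(new ArrayList<>());
            }
            for (SimpleCell cell : entry.cells()) {
                int w = Math.floorMod(cell.row().hashCode(), workers);
                buckets.get(w).add(cell);
            }
            // Emit a new entry per non-empty bucket, reusing the original key.
            for (int w = 0; w < workers; w++) {
                if (!buckets.get(w).isEmpty()) {
                    out.get(w).add(new SimpleWalEntry(entry.key(), buckets.get(w)));
                }
            }
        }
        return out;
    }
}
```

The cost the comment is worried about is visible here: every entry is copied into new objects, even when all its cells belong to a single row.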
In the case of region server failures at the source, edits can already arrive
at the sink out of order, but with this we'd make that the norm rather than an
exception under failure. HBase can already delay the removal of delete markers
in order to avoid most races.
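Why keeping delete markers around helps: visibility is decided by timestamps, not by arrival order, as long as the marker is still present when the late Put arrives. The toy model below assumes a column-level marker that masks every version at or below its timestamp; `ToyColumn` and its methods are hypothetical and only illustrate the race, not HBase's store implementation.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of a single column with versioned puts and one delete marker.
// Hypothetical illustration only; not HBase code.
final class ToyColumn {
    private final List<Long> putTimestamps = new ArrayList<>();
    private long deleteMarkerTs = Long.MIN_VALUE; // MIN_VALUE means "no marker"

    void put(long ts) {
        putTimestamps.add(ts);
    }

    // The marker masks every put with ts <= marker ts, regardless of
    // whether that put arrived before or after the marker.
    void delete(long ts) {
        deleteMarkerTs = Math.max(deleteMarkerTs, ts);
    }

    // Newest visible put timestamp, or -1 if nothing is visible.
    long visible() {
        long best = -1;
        for (long ts : putTimestamps) {
            if (ts > deleteMarkerTs && ts > best) {
                best = ts;
            }
        }
        return best;
    }

    // Simulates a compaction that drops the marker and the puts it masks.
    void purgeDeleteMarker() {
        final long marker = deleteMarkerTs;
        putTimestamps.removeIf(ts -> ts <= marker);
        deleteMarkerTs = Long.MIN_VALUE;
    }
}
```

A Put at ts=10 and a Delete at ts=20 yield nothing visible in either arrival order. Only if the marker is purged before the late Put arrives does the Put resurface, which is the race that delaying marker removal narrows.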
We can more easily group by table, since that can be done by just looking at
the HLogKey, but that will give far less parallelism. In fact, in Abhishek's
test above it will make no difference at all, since we're testing against a
single table only, and I expect that will be common.
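A minimal sketch of table-level grouping, assuming the table name is read from each entry's key so no edit has to be rewritten. `TableEntry` and `TableGrouper` are hypothetical names. It also shows the limitation raised above: a single-table workload collapses into one group, so nothing runs in parallel.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical simplified entry; in HBase the table name would come from the log key.
record TableEntry(String table, String row) {}

final class TableGrouper {
    // Groups entries by table name only. Each group could be shipped to a
    // different sink server; order within a table is preserved because
    // entries are appended in their original order.
    static Map<String, List<TableEntry>> byTable(List<TableEntry> entries) {
        Map<String, List<TableEntry>> groups = new LinkedHashMap<>();
        for (TableEntry e : entries) {
            groups.computeIfAbsent(e.table(), t -> new ArrayList<>()).add(e);
        }
        return groups;
    }
}
```

Three entries for table t1 produce a single group (no parallelism), while entries for t1 and t2 produce two groups.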
Grouping by region (which is also in HLogKey) is not safe, since rows can move
between regions (due to splits). Or is it, given that a split is preceded by a
flush?
> [Replication]Parallel apply edits on row-level
> ----------------------------------------------
>
> Key: HBASE-12988
> URL: https://issues.apache.org/jira/browse/HBASE-12988
> Project: HBase
> Issue Type: Improvement
> Components: Replication
> Reporter: hongyu bi
> Assignee: Lars Hofhansl
> Attachments: ParallelReplication-v2.txt
>
>
> we can apply edits to the slave cluster in parallel at table level to speed up
> replication.
> update: per the conversation below, it's better to apply edits in parallel at
> row level
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)