[jira] [Commented] (HBASE-12988) [Replication]Parallel apply edits on row-level

Lars Hofhansl (JIRA) Mon, 01 Jun 2015 22:53:20 -0700

    [ 
https://issues.apache.org/jira/browse/HBASE-12988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14568576#comment-14568576
 ]


Lars Hofhansl commented on HBASE-12988:
---------------------------------------

Now the question is: Can we do this without rewriting every single WALEdit?

The test patch I attached here simply splits the list of edits before shipping 
it, and ships it in smaller parts to multiple servers in the sink. It does not 
rearrange the edits by row or table, and hence may ship edits out of order - 
namely deletes and puts for the same row.

Since each WALEdit may contain many cells, and each cell can in theory be for a
different row, we would have to disentangle the Cells from the WALEdit and write
them into new edits, while retaining the chain of clusterIds and the table name
from the log key.
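A row-level split could look roughly like the following sketch. Each entry's cells are regrouped by row into new entries, and each new entry copies the table name and the clusterId chain from the original key. Each row is then hashed to one worker, so a Put and a later Delete for the same row are applied by the same worker in their original order. `Cell`, `LogKey`, `Entry` and `partitionByRow` are hypothetical simplifications. They are not the real HBase `Cell`, `WALEdit` or `HLogKey` classes.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class RowPartitioner {
    // Hypothetical, simplified stand-ins for the real HBase types (log key,
    // WALEdit, Cell); only the fields discussed above are modeled.
    record Cell(String row, String qualifier, String type) {}
    record LogKey(String table, List<String> clusterIds) {}
    record Entry(LogKey key, List<Cell> cells) {}

    // Split each entry into one new entry per row, copying the table name and
    // the clusterId chain from the original key, and route it to a worker
    // chosen by hashing the row. All cells of a given row go to the same
    // worker, in their original relative order.
    static List<List<Entry>> partitionByRow(List<Entry> entries, int workers) {
        List<List<Entry>> out = new ArrayList<>();
        for (int i = 0; i < workers; i++) out.add(new ArrayList<>());
        for (Entry e : entries) {
            // LinkedHashMap keeps rows in first-seen order within this entry.
            Map<String, List<Cell>> byRow = new LinkedHashMap<>();
            for (Cell c : e.cells()) {
                byRow.computeIfAbsent(c.row(), r -> new ArrayList<>()).add(c);
            }
            for (Map.Entry<String, List<Cell>> rowCells : byRow.entrySet()) {
                LogKey key = new LogKey(e.key().table(), List.copyOf(e.key().clusterIds()));
                int w = Math.floorMod(rowCells.getKey().hashCode(), workers);
                out.get(w).add(new Entry(key, rowCells.getValue()));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        LogKey k = new LogKey("t1", List.of("clusterA"));
        Entry e1 = new Entry(k, List.of(new Cell("row1", "f:a", "Put"), new Cell("row2", "f:a", "Put")));
        Entry e2 = new Entry(k, List.of(new Cell("row1", "f:a", "Delete")));
        List<List<Entry>> parts = partitionByRow(List.of(e1, e2), 4);
        for (int i = 0; i < parts.size(); i++) {
            System.out.println("worker " + i + ": " + parts.get(i));
        }
    }
}
```

Hashing only on the row keeps per-row order but can put unrelated rows from different tables on the same worker. Including the table name in the hash would spread load differently without affecting correctness.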

In the case of region server failures at the source, edits can already arrive
at the sink out of order, but with this we'd make that the norm rather than an
exception under failure. HBase can already delay the removal of delete markers
in order to avoid most races.
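Why retained delete markers help can be shown with a simplified model. In HBase a delete marker masks versions whose timestamp is at or below its own. As long as the marker has not been purged (for example by a major compaction) when an older put arrives, the visible result therefore depends on timestamps rather than on arrival order. The sketch below is a deliberately simplified single-column model of that rule. The `Mutation` record and `visible` method are hypothetical, not HBase API. The model ignores version limits, family and row deletes, and compaction itself.

```java
import java.util.List;
import java.util.Optional;

public class TimestampOrdering {
    // Hypothetical, simplified model of a single HBase column: every mutation
    // carries the timestamp assigned at the source, and a delete marker masks
    // all puts with a timestamp <= its own.
    record Mutation(boolean isDelete, long ts, String value) {}

    // Returns the value a reader would see after all mutations were applied,
    // in whatever order they arrived at the sink.
    static Optional<String> visible(List<Mutation> appliedInArrivalOrder) {
        long maxDeleteTs = Long.MIN_VALUE;
        Mutation newestPut = null;
        for (Mutation m : appliedInArrivalOrder) {
            if (m.isDelete()) {
                maxDeleteTs = Math.max(maxDeleteTs, m.ts());
            } else if (newestPut == null || m.ts() > newestPut.ts()) {
                newestPut = m;
            }
        }
        // If the newest put is masked, every older put is masked as well.
        if (newestPut == null || newestPut.ts() <= maxDeleteTs) return Optional.empty();
        return Optional.of(newestPut.value());
    }

    public static void main(String[] args) {
        Mutation oldPut = new Mutation(false, 1, "a");
        Mutation delete = new Mutation(true, 2, null);
        Mutation newPut = new Mutation(false, 3, "b");
        // The delete arrives before the put it masks; the outcome is unchanged.
        System.out.println(visible(List.of(delete, oldPut, newPut)));
        System.out.println(visible(List.of(oldPut, delete, newPut)));
    }
}
```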

We can more easily group by table, since that can be done by just looking at
the HLogKey, but that will be far less efficient. In fact, in Abhishek's test
above it will make no difference at all, since we're testing against a single
table only, and I expect that will be common.

Grouping by region (which is also in HLogKey) is not safe, since rows can move
between regions (due to splits). Or is it, given that a split is preceded by a
flush?

> [Replication]Parallel apply edits on row-level
> ----------------------------------------------
>
>                 Key: HBASE-12988
>                 URL: https://issues.apache.org/jira/browse/HBASE-12988
>             Project: HBase
>          Issue Type: Improvement
>          Components: Replication
>            Reporter: hongyu bi
>            Assignee: Lars Hofhansl
>         Attachments: ParallelReplication-v2.txt
>
>
> We can apply edits to the slave cluster in parallel at the table level to
> speed up replication.
> Update: per the conversation below, it's better to apply edits in parallel at
> the row level.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
