[
https://issues.apache.org/jira/browse/HBASE-26323?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Duo Zhang updated HBASE-26323:
------------------------------
Priority: Major (was: Minor)
> Introduce a SnapshotProcedure
> -----------------------------
>
> Key: HBASE-26323
> URL: https://issues.apache.org/jira/browse/HBASE-26323
> Project: HBase
> Issue Type: New Feature
> Components: proc-v2, snapshots
> Reporter: ruanhui
> Assignee: ruanhui
> Priority: Major
>
> Currently,snapshot in hbase uses zk as coordinator. It has some limitations,
> a. Snapshot maybe fails when there are region server crashes.
> b. Snapshot maybe failed when master restarts.
> c. Only one snapshot per table can be taken in a time.
> d. Snapshot verify will be handled by master, which may take long time when
> our table has a large number of regions, for example 10000.
>
> Since we have procedure v2 framework now, it is possible to solve the above
> problems. So here is a procedure2-based snapshot implementation. It has some
> goals,
> a. Snapshot can continue when there are region server crashes.
> b. Snapshot can continue when master restarts.
> c. More than one snapshot per table can be taken in a time.
> d. We can use region servers to verify snapshot to accelerate procedure.
>
> Here are some details about implementation.
> *SnapshotProcedure*
> SnapshotProcedure is used to take snapshot on a table. It acquires shared
> table lock on the snapshot table and hold the shared lock during suspend and
> yield.
> *SnapshotRegionProcedure*
> SnapshotRegionProcedure is used to take snapshot on a specific region of the
> snapshot table. It acquires exclusive region lock and releases lock during
> suspend and yield. Before dispatch remote snapshot operations to region
> server, it will check target region in RIT or not. If target region is in
> RIT, it will sleep some time and retry.
> *SnapshotVerifyProcedure*
> SnapshotVerifyProcedure is used to send snapshot verify request to region
> server. If snapshot is corrupted, it will notify parent snapshot to retry.
> When remote region server is crashed, it will choose another online server
> and retry.
>
> I would be very grateful for any advice and guidance. Is anyone interested in
> taking a look?
--
This message was sent by Atlassian Jira
(v8.3.4#803005)