[ 
https://issues.apache.org/jira/browse/HBASE-14417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15732531#comment-15732531
 ] 

Ted Yu commented on HBASE-14417:
--------------------------------

A further response to Vlad's review comment w.r.t. fault tolerance in bulk load.

When a bulk load fails midway, the user needs to provide the complete set of 
hfiles again, because the staging directory is not exposed to end users.
With this in mind, the benefit of using another hook (prior to 
postBulkLoadHFile()) to persist the location of bulk loaded hfiles is minimal, 
since any subsequent bulk load attempt would load the same set of (source) 
hfiles again.

Another factor is that the more writes we make to the hbase:backup table, the 
higher the chance of hitting a write failure.

One optimization we can do in the future is to combine the writes (performed in 
postBulkLoadHFile()) from several regions on the same region server, provided 
that these writes are sufficiently close in time (e.g., within 300 ms of each 
other). The completion of bulk load on a single region server is determined by 
the slowest participating region, so this optimization would keep the response 
time on par with the current implementation (where the hbase:backup table is 
not involved).
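
To make the combining idea concrete, here is a rough sketch of how per-region 
writes could be coalesced within a time window. This is not taken from any 
patch on this issue: the class BulkLoadWriteCoalescer, its methods, and the 
Consumer standing in for a single write to hbase:backup are all hypothetical, 
and the clock is passed in explicitly only to keep the example deterministic.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical helper, one instance per region server. Not part of HBase.
final class BulkLoadWriteCoalescer {
    private final long windowMs;
    // Stands in for a single (batched) write to the hbase:backup table.
    private final Consumer<List<String>> sink;
    private final List<String> pending = new ArrayList<>();
    private long firstPendingAtMs;

    BulkLoadWriteCoalescer(long windowMs, Consumer<List<String>> sink) {
        this.windowMs = windowMs;
        this.sink = sink;
    }

    // Assumed to be called once per region, e.g. from postBulkLoadHFile().
    synchronized void record(String hfilePath, long nowMs) {
        // If the oldest pending entry is outside the window, write the
        // current batch before starting a new one.
        if (!pending.isEmpty() && nowMs - firstPendingAtMs > windowMs) {
            flush();
        }
        if (pending.isEmpty()) {
            firstPendingAtMs = nowMs;
        }
        pending.add(hfilePath);
    }

    // Writes whatever is pending as one batch; no-op when nothing is pending.
    synchronized void flush() {
        if (pending.isEmpty()) {
            return;
        }
        sink.accept(new ArrayList<>(pending));
        pending.clear();
    }
}
```

With a 300 ms window, regions reporting at 0 ms and 100 ms would share one 
write, while a region reporting at 500 ms would start a new batch. A real 
implementation would also need a timer (or a final flush before the bulk load 
response is sent) so that the last batch is persisted before the client is 
told the load succeeded.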

> Incremental backup and bulk loading
> -----------------------------------
>
>                 Key: HBASE-14417
>                 URL: https://issues.apache.org/jira/browse/HBASE-14417
>             Project: HBase
>          Issue Type: New Feature
>    Affects Versions: 2.0.0
>            Reporter: Vladimir Rodionov
>            Assignee: Ted Yu
>            Priority: Critical
>              Labels: backup
>             Fix For: 2.0.0
>
>         Attachments: 14417-tbl-ext.v10.txt, 14417-tbl-ext.v9.txt, 
> 14417.v1.txt, 14417.v11.txt, 14417.v13.txt, 14417.v2.txt, 14417.v21.txt, 
> 14417.v23.txt, 14417.v24.txt, 14417.v25.txt, 14417.v6.txt
>
>
> Currently, incremental backup is based on WAL files. Bulk data loading 
> bypasses WALs for obvious reasons, breaking incremental backups. The only way 
> to continue backups after bulk loading is to create new full backup of a 
> table. This may not be feasible for customers who do bulk loading regularly 
> (say, every day).
> Google doc for design:
> https://docs.google.com/document/d/1ACCLsecHDvzVSasORgqqRNrloGx4mNYIbvAU7lq5lJE



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
