[ 
https://issues.apache.org/jira/browse/HBASE-25891?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17359302#comment-17359302
 ] 

Mallikarjun commented on HBASE-25891:
-------------------------------------

[~anoop.hbase] Did you get a chance to look at it?

> Remove dependence storing WAL filenames for backup
> --------------------------------------------------
>
>                 Key: HBASE-25891
>                 URL: https://issues.apache.org/jira/browse/HBASE-25891
>             Project: HBase
>          Issue Type: Improvement
>          Components: backup&restore
>    Affects Versions: 3.0.0-alpha-1
>            Reporter: Mallikarjun
>            Assignee: Mallikarjun
>            Priority: Major
>             Fix For: 3.0.0-alpha-1
>
>
> Context:
> Currently, WAL file references are stored in the `backup:system` meta table:
> {code:java}
> wals:preprod-dn-1%2C16020%2C1614844389000.1621996160175 column=meta:backupId, timestamp=1622003479895, value=backup_1622003358258
> wals:preprod-dn-1%2C16020%2C1614844389000.1621996160175 column=meta:file, timestamp=1622003479895, value=hdfs://store/hbase/oldWALs/preprod-dn-1%2C16020%2C1614844389000.1621996160175
> wals:preprod-dn-1%2C16020%2C1614844389000.1621996160175 column=meta:root, timestamp=1622003479895, value=s3a://2021-05-25--21-45-00--full/set1
> wals:preprod-dn-1%2C16020%2C1614844389000.1621999760280 column=meta:backupId, timestamp=1622003479895, value=backup_1622003358258
> wals:preprod-dn-1%2C16020%2C1614844389000.1621999760280 column=meta:file, timestamp=1622003479895, value=hdfs://store/hbase/oldWALs/preprod-dn-1%2C16020%2C1614844389000.1621999760280
> wals:preprod-dn-1%2C16020%2C1614844389000.1621999760280 column=meta:root, timestamp=1622003479895, value=s3a://2021-05-25--21-45-00--full/set1
> {code}
> Also, every backup (incremental or full) performs a log roll just before the
> backup is taken and stores the timestamp of that log roll per regionserver
> per backup, in the following format:
>  
> {code:java}
> rslogts:hdfs://xx.xx.xx.xx:8020/tmp/backup_yaktest\x00preprod-dn-2:16020 column=meta:rs-log-ts, timestamp=1622887363301, value=\x00\x00\x01y\xDB\x81ar
> rslogts:hdfs://xx.xx.xx.xx:8020/tmp/backup_yaktest\x00preprod-dn-3:16020 column=meta:rs-log-ts, timestamp=1622887363294, value=\x00\x00\x01y\xDB\x81aP
> rslogts:hdfs://xx.xx.xx.xx:8020/tmp/backup_yaktest\x00preprod-dn-1:16020 column=meta:rs-log-ts, timestamp=1622887363275, value=\x00\x00\x01y\xDB\x81\x85
> {code}
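
To make the `rslogts:` rows above easier to read, here is a minimal sketch that decodes one `meta:rs-log-ts` value. It assumes the value is an 8-byte big-endian long holding epoch milliseconds; that layout is an assumption made for illustration, and `RsLogTsDecoder` is a hypothetical helper, not part of HBase.

```java
import java.nio.ByteBuffer;

public class RsLogTsDecoder {

    // Decodes a value assumed to be an 8-byte big-endian long (epoch millis).
    public static long decode(byte[] value) {
        if (value.length != Long.BYTES) {
            throw new IllegalArgumentException("expected 8 bytes, got " + value.length);
        }
        // ByteBuffer reads in big-endian order by default.
        return ByteBuffer.wrap(value).getLong();
    }

    public static void main(String[] args) {
        // \x00\x00\x01y\xDB\x81ar from the scan output for preprod-dn-2;
        // the printable bytes 'y', 'a', 'r' are 0x79, 0x61, 0x72.
        byte[] raw = {0x00, 0x00, 0x01, 0x79, (byte) 0xDB, (byte) 0x81, 0x61, 0x72};
        System.out.println(decode(raw)); // prints 1622885359986
    }
}
```

Under that assumption, the decoded log roll time falls roughly half an hour before the cell timestamp shown for the same row.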
>  
>  
> There are two cases in which the WAL references stored in `backup:system` are
> used:
> 1. To clean up WALs that have already been backed up, using
> `BackupLogCleaner`.
> Since the log roll timestamp is stored per regionserver as part of each
> backup, we can check all previous successful backups and identify which logs
> are to be retained and which are to be cleaned up, as follows:
>  * Identify the latest successful backup performed per table.
>  * For each backup identified above, identify the oldest log roll timestamp
> per regionserver per table.
>  * All WALs older than the oldest log roll timestamp for any backed-up table
> can be removed by `BackupLogCleaner`.
>  
> 2. During incremental backup, to check the system table for WALs that have
> already been backed up, so that they are not backed up again.
>  * Incremental backup already identifies which WALs are to be backed up using
> the `rslogts:` entries mentioned above.
>  * Additionally, it checks `wals:` to ensure no logs are backed up a second
> time. This check is redundant, and no extra benefit has been observed from it.
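
The cleanup rule from case 1 above can be sketched as follows. This is only an illustration of the idea and not the actual `BackupLogCleaner` code: `WalCleanupSketch` and its methods are hypothetical, the input map is assumed to already hold the log roll timestamps of each table's latest successful backup, and the boundary is assumed to be evaluated per regionserver.

```java
import java.util.HashMap;
import java.util.Map;

public class WalCleanupSketch {

    // Input: table -> (regionserver -> log roll timestamp recorded by that
    // table's latest successful backup). Output: regionserver -> oldest such
    // timestamp across all tables.
    public static Map<String, Long> oldestRollTsPerServer(
            Map<String, Map<String, Long>> latestRollTsByTable) {
        Map<String, Long> oldest = new HashMap<>();
        for (Map<String, Long> perServer : latestRollTsByTable.values()) {
            for (Map.Entry<String, Long> e : perServer.entrySet()) {
                oldest.merge(e.getKey(), e.getValue(), Long::min);
            }
        }
        return oldest;
    }

    // A WAL is deletable only if it is older than the oldest roll timestamp
    // for its regionserver. Servers with no recorded timestamp keep their
    // WALs (a conservative assumption of this sketch).
    public static boolean isDeletable(String server, long walTs, Map<String, Long> oldest) {
        Long boundary = oldest.get(server);
        return boundary != null && walTs < boundary;
    }

    public static void main(String[] args) {
        Map<String, Map<String, Long>> byTable = new HashMap<>();
        byTable.put("t1", Map.of("rs1", 200L, "rs2", 300L));
        byTable.put("t2", Map.of("rs1", 150L));
        Map<String, Long> oldest = oldestRollTsPerServer(byTable);
        System.out.println(isDeletable("rs1", 100L, oldest)); // true: older than 150
        System.out.println(isDeletable("rs1", 180L, oldest)); // false: t2 still needs it
    }
}
```

Taking the minimum across tables is what keeps a WAL alive as long as any table's next incremental backup might still need it.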



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
