[ 
https://issues.apache.org/jira/browse/HBASE-25891?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17358007#comment-17358007
 ] 

Mallikarjun commented on HBASE-25891:
-------------------------------------

I have updated the description above; hopefully it answers your questions, 
[~anoop.hbase]. Details specific to each question are below.
{quote}Means the WAL files will get renamed with this prefix? When those files 
become eligible for deletion then?
{quote}
No. They are cleaned up by the cleanup chore, similar to `TimeToLiveLogCleaner`.

 
{quote}Now that we don't have this system table at all, what happens when taking 
a full/incremental snapshot? 
{quote}
Full backup does snapshot and export. There is no dependence on WAL files. 

Incremental backup continues to check `rslogts:` to see up to what timestamp 
each regionserver has been backed up, and uses that to determine which WAL 
files need to be backed up.
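
As a rough illustration of that selection, here is a minimal sketch. It is not the actual HBase implementation: the class and method names are hypothetical, and it assumes WAL file names follow the pattern in the description (URL-encoded `host,port,startcode`, then `.`, then the WAL creation timestamp) and that a regionserver with no `rslogts:` entry has never been backed up.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of HBase.
final class IncrementalWalSelection {

    // "preprod-dn-1%2C16020%2C1614844389000.1621996160175" -> "preprod-dn-1:16020"
    static String hostPort(String walName) {
        String serverPart = walName.substring(0, walName.lastIndexOf('.'));
        String[] parts = serverPart.split("%2C"); // %2C is a URL-encoded comma
        return parts[0] + ":" + parts[1];
    }

    // The suffix after the last '.' is taken to be the WAL creation timestamp (assumption).
    static long walTimestamp(String walName) {
        return Long.parseLong(walName.substring(walName.lastIndexOf('.') + 1));
    }

    // Keep only WALs created after the last log roll checkpoint of their regionserver.
    static List<String> selectForIncremental(List<String> walNames,
                                             Map<String, Long> checkpoints) {
        List<String> selected = new ArrayList<>();
        for (String name : walNames) {
            // Assumption: no checkpoint means the server was never backed up.
            long checkpoint = checkpoints.getOrDefault(hostPort(name), 0L);
            if (walTimestamp(name) > checkpoint) {
                selected.add(name);
            }
        }
        return selected;
    }
}
```

Here `checkpoints` stands in for the decoded `rslogts:` values, keyed by `host:port`. Nothing in this selection requires a stored list of WAL file names.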
{quote}How WAL files been retained when backup refers to it? When that become 
eligible for deletion? (Backup deleted/ another full backup came?) And how we 
make sure we allow WAL deletion then?
{quote}
We don't need to store a list of WAL files for that. The checkpoints record up 
to what point WALs have been read for backup: WAL files created after that 
timestamp are automatically eligible for backup, and those created before it 
can be cleaned up.
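
The cleanup decision can be sketched along the lines of the steps in the issue description. This is an illustrative sketch only, not the `BackupLogCleaner` code: the class and method names are hypothetical, the input is assumed to already hold, for each table, the per-regionserver log roll timestamps of its latest successful backup, and WALs of a regionserver with no checkpoint are conservatively retained.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper, not part of HBase.
final class WalCleanupCutoff {

    // Input: table -> (regionserver host:port -> log roll timestamp of the
    // table's latest successful backup).
    // Output: regionserver -> oldest such timestamp across all tables.
    static Map<String, Long> oldestCheckpointPerServer(
            Map<String, Map<String, Long>> latestBackupPerTable) {
        Map<String, Long> cutoff = new HashMap<>();
        for (Map<String, Long> perServer : latestBackupPerTable.values()) {
            for (Map.Entry<String, Long> e : perServer.entrySet()) {
                cutoff.merge(e.getKey(), e.getValue(), Long::min);
            }
        }
        return cutoff;
    }

    // A WAL is deletable only if it is older than the oldest checkpoint of its
    // regionserver. Assumption: servers without any checkpoint are never cleaned.
    static boolean isDeletable(String hostPort, long walTimestamp,
                               Map<String, Long> cutoff) {
        Long oldest = cutoff.get(hostPort);
        return oldest != null && walTimestamp < oldest;
    }
}
```

Taking the minimum across tables means a WAL is kept as long as at least one table's next incremental backup could still need it.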

 

> Remove dependence storing WAL filenames for backup
> --------------------------------------------------
>
>                 Key: HBASE-25891
>                 URL: https://issues.apache.org/jira/browse/HBASE-25891
>             Project: HBase
>          Issue Type: Improvement
>          Components: backup&restore
>    Affects Versions: 3.0.0-alpha-1
>            Reporter: Mallikarjun
>            Assignee: Mallikarjun
>            Priority: Major
>             Fix For: 3.0.0-alpha-1
>
>
> Context:
> Currently, WAL file references are stored in the `backup:system` meta table:
> {code:java}
> wals:preprod-dn-1%2C16020%2C1614844389000.1621996160175 column=meta:backupId,
> timestamp=1622003479895, value=backup_1622003358258
> wals:preprod-dn-1%2C16020%2C1614844389000.1621996160175 column=meta:file,
> timestamp=1622003479895,
> value=hdfs://store/hbase/oldWALs/preprod-dn-1%2C16020%2C1614844389000.1621996160175
> wals:preprod-dn-1%2C16020%2C1614844389000.1621996160175 column=meta:root,
> timestamp=1622003479895, value=s3a://2021-05-25--21-45-00--full/set1
> wals:preprod-dn-1%2C16020%2C1614844389000.1621999760280 column=meta:backupId,
> timestamp=1622003479895, value=backup_1622003358258
> wals:preprod-dn-1%2C16020%2C1614844389000.1621999760280 column=meta:file,
> timestamp=1622003479895,
> value=hdfs://store/hbase/oldWALs/preprod-dn-1%2C16020%2C1614844389000.1621999760280
> wals:preprod-dn-1%2C16020%2C1614844389000.1621999760280 column=meta:root,
> timestamp=1622003479895, value=s3a://2021-05-25--21-45-00--full/set1
> {code}
> Also, every backup (incremental and full) performs a log roll just before 
> the backup is taken and stores, per regionserver and per backup, the 
> timestamp at which the log roll was performed, in the following format:
>  
> {code:java}
> rslogts:hdfs://xx.xx.xx.xx:8020/tmp/backup_yaktest\x00preprod-dn-2:16020
> column=meta:rs-log-ts, timestamp=1622887363301, value=\x00\x00\x01y\xDB\x81ar
> rslogts:hdfs://xx.xx.xx.xx:8020/tmp/backup_yaktest\x00preprod-dn-3:16020
> column=meta:rs-log-ts, timestamp=1622887363294, value=\x00\x00\x01y\xDB\x81aP
> rslogts:hdfs://xx.xx.xx.xx:8020/tmp/backup_yaktest\x00preprod-dn-1:16020
> column=meta:rs-log-ts, timestamp=1622887363275,
> value=\x00\x00\x01y\xDB\x81\x85
> {code}
>  
>  
> There are 2 cases in which the WAL references stored in `backup:system` are 
> used. 
> 1. To clean up WALs for which a backup has already been taken, using 
> `BackupLogCleaner`. 
> Since the log roll timestamp is stored per regionserver as part of each 
> backup, we can check all previous successful backups and identify which logs 
> are to be retained and which are to be cleaned up, as follows:
>  * Identify the latest successful backup performed per table.
>  * For each backup identified above, identify the oldest log roll timestamp 
> per regionserver per table. 
>  * All WALs older than the oldest log roll timestamp for any backed-up table 
> can be removed by `BackupLogCleaner`. 
>  
> 2. During incremental backup, to check the system table for WALs that have 
> already been backed up, so that they are not backed up again. 
>  * Incremental backup already identifies which WALs are to be backed up 
> using the `rslogts:` entries mentioned above.
>  * Additionally, it checks `wals:` to ensure no logs are backed up a second 
> time. This check is redundant, and no extra benefit from it has been seen. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
