[ https://issues.apache.org/jira/browse/HBASE-23205?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jeongdae Kim updated HBASE-23205:
---------------------------------
    Description: 
While testing with 1.4.10, we observed that a lot of old WALs were not removed 
from the archive or from their corresponding replication queues.
 The stacked old WALs are empty or contain no entries to be replicated (their 
tables are not in the replication table_cfs).
  
 As described in HBASE-22784, if no entries to be replicated are appended to a 
WAL, its log position will never be updated, and as a consequence the WAL will 
never be removed. This issue has existed since HBASE-15995.
  
 I think old WALs would no longer pile up with HBASE-22784 applied, but a few 
things still need to be fixed, as described below.
 * The log position can be updated incorrectly when the log is rolled, because 
the lastWalPath of a batch might not point to the WAL currently being read.
 ** For example, suppose the last entries added to a batch were read from 
position P1 in WAL W1, and then the WAL rolled; the reader reads to the end of 
the old WAL, continues reading entries from the new WAL W2, and reaches the 
batch size limit at read position P2 in W2. The batch passed to the shipper 
then carries walPath W1 together with position P2, so the shipper tries to 
update W1 to position P2. This can cause data inconsistency in the recovery 
case, or a failed update to ZooKeeper (the znode may no longer exist because of 
previous log position updates; I guess this is the same case as HBASE-23169?). 
A sketch of this mismatch follows.
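Below is a minimal sketch of the mismatch, with hypothetical names (the real logic lives in the WAL reader and shipper, which this does not reproduce); it only illustrates why the WAL path and the position must be captured together, from the WAL the reader is actually positioned in when the batch is sealed.

{code:java}
// Hypothetical, simplified sketch -- not the actual HBase reader/shipper code.
public class WalBatchSketch {

  /** What the shipper receives: the WAL to update and the position to record. */
  static class Batch {
    String walPath;
    long position;
  }

  /**
   * Buggy shape: the WAL path is captured when the batch is created, but the
   * position is captured when the batch fills up, possibly from a newer WAL.
   */
  static Batch buggySeal(String walAtBatchCreation, long posAtBatchFull) {
    Batch b = new Batch();
    b.walPath = walAtBatchCreation; // W1, recorded before the roll
    b.position = posAtBatchFull;    // P2, read from W2 after the roll
    return b;                       // shipper now "updates" W1 to position P2
  }

  /** Fixed shape: path and position are always taken from the same WAL. */
  static Batch fixedSeal(String walAtBatchFull, long posAtBatchFull) {
    Batch b = new Batch();
    b.walPath = walAtBatchFull;     // W2
    b.position = posAtBatchFull;    // P2 in W2
    return b;
  }

  public static void main(String[] args) {
    Batch bad = buggySeal("W1", 1024L);
    System.out.println(bad.walPath + " -> " + bad.position);   // W1 -> 1024 (wrong pairing)
    Batch good = fixedSeal("W2", 1024L);
    System.out.println(good.walPath + " -> " + good.position); // W2 -> 1024
  }
}
{code}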

 * The log position may not be updated at all, or may be updated to a wrong 
position, because of the pendingShipment flag introduced in HBASE-22784.
 ** In the shipper thread, setting pendingShipment to false does not guarantee 
that the log position is always updated: if the reader sets the flag to true 
right after the shipper sets it to false inside updateLogPosition(), the 
shipper won't update the log position. Conversely, while the reader is reading 
filtered entries, if the shipper sets the flag to false, the reader will update 
the log position to its current read position, which can lose data in the 
recovery case. A sketch of this race follows.
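A minimal sketch of the race, assuming a shared pendingShipment flag roughly as in HBASE-22784; the class and method names are illustrative, not the actual shipper code. The compare-and-set variant at the end is one possible way to close the window, not necessarily the fix this issue will take.

{code:java}
import java.util.concurrent.atomic.AtomicBoolean;

// Illustrative sketch of the flag race with hypothetical names.
public class PendingShipmentSketch {
  private final AtomicBoolean pendingShipment = new AtomicBoolean(false);
  private volatile long logPosition;

  // Reader side: marks a batch as pending so nobody else moves the position.
  void readerMarksPending() {
    pendingShipment.set(true);
  }

  // Buggy shipper: clearing the flag and writing the position are two separate
  // steps, so the reader can set the flag back to true (or observe it false
  // and advance the position itself) in between.
  void buggyShipperUpdate(long shippedPosition) {
    pendingShipment.set(false);
    // <-- window: readerMarksPending() may run here
    logPosition = shippedPosition;
  }

  // One safer shape: claim ownership of the pending batch atomically, so that
  // exactly one side is allowed to move the position for it.
  void saferShipperUpdate(long shippedPosition) {
    if (pendingShipment.compareAndSet(true, false)) {
      logPosition = shippedPosition; // shipper owned the pending batch
    }
  }
}
{code}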

 * A lot of log position updates can happen when most WAL entries are filtered 
out by TableCfWALEntryFilter.
 ** I think it would be better to reduce the number of log position updates in 
that case, because:
 ### ZooKeeper writes are more expensive than reads (writes involve 
synchronizing the state of all servers), and
 ### even if the read position is not updated, that is harmless, because all 
entries will be filtered out again during recovery.
 In that case I think it is enough to update the log position only when the WAL 
is rolled (so that old WALs can still be cleaned up); see the sketch below.
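A sketch of that policy, with hypothetical names and under the stated assumption: skip the ZooKeeper write for purely filtered batches unless the reader has moved to a new WAL. The per-WAL tracking here is only one possible shape.

{code:java}
// Sketch of the proposed policy: for batches whose entries were all filtered
// out, only write the position when the reader has crossed a WAL boundary,
// which is what allows old WALs to be cleaned up.
public class FilteredUpdatePolicy {
  private String lastUpdatedWal;

  boolean shouldUpdatePosition(String currentWal, boolean hasShippableEntries) {
    boolean walRolled = !currentWal.equals(lastUpdatedWal);
    if (hasShippableEntries || walRolled) {
      lastUpdatedWal = currentWal;
      return true;  // record progress in ZooKeeper
    }
    return false;   // purely filtered batch in the same WAL: skip the write
  }
}
{code}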

 

In addition, during this work I found a minor bug: the replication buffer 
accounting is updated incorrectly, because the total buffer size is decreased 
by the size of bulk-loaded files.
 I'd like to fix it here as well, if that's OK; a sketch of the accounting 
issue follows.
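A minimal sketch of the accounting problem, with hypothetical names (the real counter is the global replication buffer quota): the release side subtracts the bulk-loaded file size even though the acquire side never added it, so the counter drifts downward.

{code:java}
import java.util.concurrent.atomic.AtomicLong;

// Illustrative sketch of the accounting bug; names are hypothetical. The
// global counter must only be decreased by what was actually added for the
// same batch, otherwise it under-counts real usage and the quota is useless.
public class ReplicationBufferQuota {
  private final AtomicLong totalBufferUsed = new AtomicLong();

  void acquire(long entryHeapSize) {
    totalBufferUsed.addAndGet(entryHeapSize);
  }

  // Buggy release: also subtracts the bulk-loaded file size, which was never
  // added in acquire().
  void buggyRelease(long entryHeapSize, long bulkLoadFileSize) {
    totalBufferUsed.addAndGet(-(entryHeapSize + bulkLoadFileSize));
  }

  // Fixed release: give back exactly what acquire() took.
  void release(long entryHeapSize) {
    totalBufferUsed.addAndGet(-entryHeapSize);
  }
}
{code}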



> Correctly update the position of WALs currently being replicated.
> -----------------------------------------------------------------
>
>                 Key: HBASE-23205
>                 URL: https://issues.apache.org/jira/browse/HBASE-23205
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 1.5.0, 1.4.10, 1.4.11
>            Reporter: Jeongdae Kim
>            Assignee: Jeongdae Kim
>            Priority: Major
>



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
