[jira] [Commented] (HBASE-21564) race condition in WAL rolling resulting in size-based rolling getting stuck

Sergey Shelukhin (JIRA) Fri, 14 Dec 2018 13:17:33 -0800


    [ https://issues.apache.org/jira/browse/HBASE-21564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16721792#comment-16721792 ]


Sergey Shelukhin commented on HBASE-21564:
------------------------------------------

I wonder if this is a replication issue uncovered by the patch. WAL operations
seem to be the same between the successful and failed runs; however, for some
reason, replication of one file at the same offset produces a different number
of edits (in the same test run):
{noformat}
Normal source for cluster 2: Total replicated edits: 1200, current progress: 
walGroup [localhost%2C45311%2C1544660121842]: currently replicating from: 
hdfs://localhost:42830/user/root/test-data/3c860f22-8402-6603-26e3-a2846cba30ef/WALs/localhost,45311,1544660121842/localhost%2C45311%2C1544660121842.1544660165345
 at position: 88061
Normal source for cluster testInterClusterReplication: Total replicated edits: 
946, current progress: 
walGroup [localhost%2C45311%2C1544660121842]: currently replicating from: 
hdfs://localhost:42830/user/root/test-data/3c860f22-8402-6603-26e3-a2846cba30ef/WALs/localhost,45311,1544660121842/localhost%2C45311%2C1544660121842.1544660165345
 at position: 88061
{noformat}

> race condition in WAL rolling resulting in size-based rolling getting stuck
> ---------------------------------------------------------------------------
>
>                 Key: HBASE-21564
>                 URL: https://issues.apache.org/jira/browse/HBASE-21564
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Sergey Shelukhin
>            Assignee: Sergey Shelukhin
>            Priority: Major
>         Attachments: HBASE-21564.master.001.patch, 
> HBASE-21564.master.002.patch, HBASE-21564.master.003.patch, 
> HBASE-21564.master.004.patch
>
>
> Manifests at least with AsyncFsWriter.
> There's a window after LogRoller replaces the writer in the WAL, but before
> it sets the rollLog boolean to false in the finally block, during which the
> WAL class can request another log roll (this can happen in particular when
> the logs are being archived in the LogRoller thread and high write volume is
> causing the logs to roll quickly).
> LogRoller will blindly reset the rollLog flag in the finally block and
> "forget" about this request.
> AsyncWAL in turn never requests it again because its own rollRequested field 
> is set and it expects a callback. Logs don't get rolled until a periodic roll 
> is triggered after that.
> The acknowledgment of roll requests by LogRoller should be atomic.
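
For illustration only, the lost-request window described above can be sketched in a few lines of Java. This is not the HBase code or the attached patch: the class names RacyRoller and AtomicRoller, the method names, and the Runnable callback standing in for "the WAL requests another roll while the roller is still busy" are all hypothetical. The sketch assumes the flag is a single boolean and shows one possible way to make the acknowledgment atomic, using compareAndSet before the roll work starts.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical sketch; none of these names come from HBase.
final class RollRequestSketch {

    /** Racy variant: the flag is cleared unconditionally after the roll work. */
    static final class RacyRoller {
        private final AtomicBoolean rollLog = new AtomicBoolean(false);

        void requestRoll() { rollLog.set(true); }

        boolean isRollPending() { return rollLog.get(); }

        void roll(Runnable duringRoll) {
            try {
                // Stands in for replacing the writer and archiving old logs.
                // A new roll request may arrive while this runs.
                duringRoll.run();
            } finally {
                // Blind reset: a request made during the roll is forgotten.
                rollLog.set(false);
            }
        }
    }

    /** Variant that acknowledges the request atomically before rolling. */
    static final class AtomicRoller {
        private final AtomicBoolean rollLog = new AtomicBoolean(false);

        void requestRoll() { rollLog.set(true); }

        boolean isRollPending() { return rollLog.get(); }

        boolean rollIfRequested(Runnable duringRoll) {
            // Consume exactly one pending request; later requests set the
            // flag again and are picked up by the next call.
            if (!rollLog.compareAndSet(true, false)) {
                return false;
            }
            duringRoll.run();
            return true;
        }
    }

    public static void main(String[] args) {
        RacyRoller racy = new RacyRoller();
        racy.requestRoll();
        racy.roll(racy::requestRoll); // second request arrives mid-roll
        System.out.println("racy pending after roll: " + racy.isRollPending());

        AtomicRoller fixed = new AtomicRoller();
        fixed.requestRoll();
        fixed.rollIfRequested(fixed::requestRoll); // second request arrives mid-roll
        System.out.println("atomic pending after roll: " + fixed.isRollPending());
    }
}
```

In the racy variant the second request is lost (the flag ends up false), which matches the description of size-based rolling getting stuck until a periodic roll. In the atomic variant the flag is still true after the roll, so the next pass of the roller would pick the request up.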



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
