[jira] [Comment Edited] (ACCUMULO-4751) Some WALs don't replicate due to lacking a createdTime entry

Adam J Shook (JIRA) Wed, 06 Dec 2017 12:43:23 -0800

    [ https://issues.apache.org/jira/browse/ACCUMULO-4751?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16280853#comment-16280853 ]


Adam J Shook edited comment on ACCUMULO-4751 at 12/6/17 8:42 PM:
-----------------------------------------------------------------

I have attached some logs tracking a particular WAL file. You can see that it 
has a {{createdTime}}, but at some point a delete entry must have been written 
(note that the timestamp changes and the {{createdTime}} is gone), and then 
other entries were added.

Here are some other interesting messages, logged back-to-back:
{code}
2017-12-06 19:55:37,712 [replication.StatusCombiner] TRACE: Returned single value: ~replhdfs://namenode:9000/accumulo/wal/tserver+31761/140223d6-30bd-41ae-a96d-d8af9884f85c stat:114 [] 14898338 false [begin: 0 end: 0 infiniteEnd: true closed: true createdTime: 1512589530002]
2017-12-06 19:55:37,712 [replication.StatusCombiner] TRACE: Returned single value: ~replhdfs://namenode:9000/accumulo/wal/tserver+31761/140223d6-30bd-41ae-a96d-d8af9884f85c stat:12l [] 14898372 false [begin: 0 end: 0 infiniteEnd: true closed: false]
{code}

I still need to check the Master logs, but since this file replicates and then 
these entries disappear, I am assuming that replication finishes, the Master 
deletes the entries, and then some other component, running its regular update 
routine, writes an entry after the Master has already deleted the row. This 
looks like a race condition.
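The suspected race can be illustrated with a minimal Java sketch. It is a toy model, not Accumulo code: {{Status}} here is a hypothetical two-field record, {{write}}/{{delete}}/{{combined}} are hypothetical names, and the combine rule (closed if any live entry is closed, {{createdTime}} taken from any live entry that has one) is an assumed simplification, not necessarily what {{StatusCombiner}} actually does. It only shows that if the Master's delete lands before a late update that carries no {{createdTime}}, the surviving record lacks a {{createdTime}}.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.OptionalLong;

public class ReplicationRaceSketch {
    // Hypothetical, simplified stand-in for the replication Status record;
    // only the two fields relevant to the race are modeled.
    record Status(boolean closed, OptionalLong createdTime) {}

    // Live (not yet deleted) entries for one WAL row in the replication table.
    private final List<Status> entries = new ArrayList<>();

    void write(Status s) {
        entries.add(s);
    }

    // Models the Master removing the row once replication has finished.
    void delete() {
        entries.clear();
    }

    // Assumed combine rule: closed if any entry is closed,
    // createdTime copied from any entry that has one.
    Status combined() {
        boolean closed = false;
        OptionalLong created = OptionalLong.empty();
        for (Status s : entries) {
            closed |= s.closed();
            if (s.createdTime().isPresent()) {
                created = s.createdTime();
            }
        }
        return new Status(closed, created);
    }

    public static void main(String[] args) {
        ReplicationRaceSketch row = new ReplicationRaceSketch();
        row.write(new Status(false, OptionalLong.of(1512589530002L))); // first entry carries createdTime
        row.delete();                                                 // Master deletes after replicating
        row.write(new Status(true, OptionalLong.empty()));            // late update, no createdTime
        System.out.println(row.combined());
    }
}
```

Without the {{delete()}} call in the middle, the combined record would keep the original {{createdTime}}; the ordering alone is what loses it.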



> Some WALs don't replicate due to lacking a createdTime entry
> ------------------------------------------------------------
>
>                 Key: ACCUMULO-4751
>                 URL: https://issues.apache.org/jira/browse/ACCUMULO-4751
>             Project: Accumulo
>          Issue Type: Bug
>    Affects Versions: 1.7.3, 1.8.1
>            Reporter: Adam J Shook
>            Assignee: Adam J Shook
>         Attachments: repl_logs.txt
>
>
> From what I can tell, the error below is logged when no data for a particular 
> table is written to a WAL but the file is closed. This is likely because the 
> {{Status}} entry returned by {{StatusUtil}} for {{fileClosed}} is pre-built 
> and therefore has no {{createdTime}}. This prevents the WAL from being 
> replicated until a {{createdTime}} entry is added manually.
> From the Accumulo master:
> {code}
> Status record ([begin: 0 end: 0 infiniteEnd: true closed: true]) for hdfs://namenode:9000/accumulo/wal/tserver.example.com+31732/f922df9c-3ffc-49ee-8d0c-261c7a05fea2 in table 7l was written to metadata table which lacked createdTime
> {code}
> I have two possible solutions in mind:
> 1. Update {{StatusUtil}} so that every {{Status}} object it returns has its 
> {{createdTime}} set to {{System.currentTimeMillis()}} if one is not explicitly 
> given.
> 2. Update the Accumulo Master to set the {{createdTime}} to the WAL's 
> modification time in HDFS if the WAL is closed but has no {{createdTime}}.
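The failure mode and the two proposed fixes can be sketched in Java as follows. This is an illustration under stated assumptions, not Accumulo's code: {{Status}} is a hypothetical record standing in for the real protobuf-generated replication status, and {{PREBUILT_FILE_CLOSED}}, {{canReplicate}}, {{fileClosed}}, and {{backfillCreatedTime}} are hypothetical names, not the actual {{StatusUtil}} or Master API. For option 2 the WAL modification time is passed in as a plain parameter; in a real Master it would presumably be read from HDFS, for example via Hadoop's {{FileStatus#getModificationTime()}}.

```java
import java.util.OptionalLong;

public class CreatedTimeSketch {
    // Hypothetical, simplified stand-in for the replication Status record;
    // only the fields shown in the log messages are modeled.
    record Status(long begin, long end, boolean infiniteEnd, boolean closed, OptionalLong createdTime) {
        Status withCreatedTime(long t) {
            return new Status(begin, end, infiniteEnd, closed, OptionalLong.of(t));
        }
    }

    // Hypothetical analogue of the pre-built "file closed" status:
    // it is built once, so it cannot carry a per-file createdTime.
    static final Status PREBUILT_FILE_CLOSED = new Status(0, 0, true, true, OptionalLong.empty());

    // Hypothetical analogue of the Master-side check: a record without createdTime is not replicated.
    static boolean canReplicate(Status s) {
        return s.createdTime().isPresent();
    }

    // Option 1: build the closed status per call, defaulting createdTime to the current time.
    static Status fileClosed(OptionalLong createdTime) {
        long t = createdTime.orElseGet(System::currentTimeMillis);
        return new Status(0, 0, true, true, OptionalLong.of(t));
    }

    // Option 2: backfill createdTime for a closed record that lacks one, using the
    // WAL's modification time (a plain parameter in this sketch).
    static Status backfillCreatedTime(Status s, long walModificationTime) {
        if (s.closed() && s.createdTime().isEmpty()) {
            return s.withCreatedTime(walModificationTime);
        }
        return s;
    }

    public static void main(String[] args) {
        System.out.println("pre-built replicable? " + canReplicate(PREBUILT_FILE_CLOSED));
        System.out.println("option 1 replicable? " + canReplicate(fileClosed(OptionalLong.empty())));
        Status fixed = backfillCreatedTime(PREBUILT_FILE_CLOSED, 1512589530002L);
        System.out.println("option 2 createdTime: " + fixed.createdTime().getAsLong());
    }
}
```

Option 1 fixes the record at write time on the tablet server side, while option 2 repairs records after the fact in the Master, which would also cover entries already written without a {{createdTime}}.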



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
