[ 
https://issues.apache.org/jira/browse/ACCUMULO-4506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15630364#comment-15630364
 ] 

Josh Elser commented on ACCUMULO-4506:
--------------------------------------

You'll want to figure out why each of these two files still has replication 
work to do that is not happening:

{noformat}
hdfs://host:9000/accumulo/wal/host+31368/ae4b03ec-159b-44e8-9a88-ccf7fa849c19 
work:\x01\x00\x00\x00\x17peer_instance\x01\x00\x00\x00\x025h\x01\x00\x00\x00\x01k
 []    [begin: 0 end: 0 infiniteEnd: true closed: true createdTime: 
1477052816238]
hdfs://host:9000/accumulo/wal/host+31032/9f038f64-4252-44a0-bfd0-99d4a316b397 
work:\x01\x00\x00\x00\x17peer_instance\x01\x00\x00\x00\x025g\x01\x00\x00\x00\x01j
 []    [begin: 0 end: 0 infiniteEnd: true closed: true createdTime: 
1477314369633]
{noformat}

The files cannot be removed because replication to this peer still needs to 
happen (removing them first would mean data loss on that peer). The 
"large" begin value (9223372036854775807) essentially signifies that 
replication of that file is done for the peer.
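
As a rough illustration of how to read those records, here is a minimal Python sketch that parses the printed status text and applies the rule described above: with infiniteEnd set, a begin of 9223372036854775807 means "done", anything else means work remains. The function names (parse_status, is_fully_replicated, blocks_removal) are hypothetical helpers for this example, not Accumulo APIs, and the finite-end branch (begin >= end) is my assumption about how a bounded record would be read.

```python
import re

# The "large" begin value from the scan output (largest signed 64-bit integer).
LONG_MAX = 2**63 - 1

# Matches the status text as printed by the shell; \s+ also tolerates the
# line wrapping that shows up when the output is pasted into an email.
STATUS_RE = re.compile(
    r"\[begin:\s*(\d+)\s+end:\s*(\d+)\s+infiniteEnd:\s*(true|false)\s+"
    r"closed:\s*(true|false)\s+createdTime:\s*(\d+)\]"
)


def parse_status(text):
    """Parse one printed status record into a dict (hypothetical helper)."""
    m = STATUS_RE.search(text)
    if m is None:
        raise ValueError("no status record found in: %r" % text)
    return {
        "begin": int(m.group(1)),
        "end": int(m.group(2)),
        "infinite_end": m.group(3) == "true",
        "closed": m.group(4) == "true",
        "created_time": int(m.group(5)),
    }


def is_fully_replicated(status):
    """With an infinite end, only the 'large' begin value means done."""
    if status["infinite_end"]:
        return status["begin"] == LONG_MAX
    # Assumption: a bounded record is done once begin has reached end.
    return status["begin"] >= status["end"]


def blocks_removal(status):
    """A record keeps the file around until it is closed and fully replicated."""
    return not (status["closed"] and is_fully_replicated(status))
```

Run against the two work entries above, blocks_removal returns True for both, while the entries with the large begin value would return False.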

Look for logs in the Master from DistributedWorkQueueWorkAssigner or 
UnorderedWorkAssigner for this file to that peer. You should see some reason 
why the Master isn't assigning this work (or some information about the 
TabletServer that is supposed to be performing replication). Once that last 
work entry looks like the others for that file, the Master should clean up all 
of these records, which will let the Accumulo GC remove the file.
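
If grepping by hand gets tedious, something like the following Python sketch could narrow the Master log down to the relevant lines. It only does substring matching on the two assigner class names and the WAL path; the helper name assigner_lines is made up for this example, and it assumes each log entry sits on a single line in the actual log file (unlike the wrapped text in this email).

```python
# Class names of the two work assigners mentioned above.
ASSIGNERS = ("DistributedWorkQueueWorkAssigner", "UnorderedWorkAssigner")


def assigner_lines(lines, wal_path):
    """Yield log lines from either work assigner that mention wal_path.

    `lines` is any iterable of strings, e.g. an open Master log file.
    """
    for line in lines:
        if wal_path in line and any(name in line for name in ASSIGNERS):
            yield line.rstrip("\n")
```

Usage would be along the lines of opening the Master's log file (its location depends on your installation) and passing the file object plus one of the two WAL paths above, then reading the matching lines for a reason why the work was not queued.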

>  Some in-progress files for replication never replicate
> -------------------------------------------------------
>
>                 Key: ACCUMULO-4506
>                 URL: https://issues.apache.org/jira/browse/ACCUMULO-4506
>             Project: Accumulo
>          Issue Type: Bug
>          Components: replication
>    Affects Versions: 1.7.2
>            Reporter: Adam J Shook
>            Assignee: Josh Elser
>
> We're seeing an issue with replication where two files have been in-progress 
> for a long time and based on the logs are not going to be replicated.  The 
> metadata from the {{accumulo.replication}} table looks a little funky, with a 
> very large {{begin}} value.
> *Logs*
> {noformat}
> 2016-11-02 19:52:50,900 [replication.DistributedWorkQueueWorkAssigner] DEBUG: 
> Not queueing work for 
> hdfs://host:9000/accumulo/wal/host+31032/9f038f64-4252-44a0-bfd0-99d4a316b397 
> to Remote Name: peer_instance Remote identifier: 5h Source Table ID: k 
> because [begin: 9223372036854775807 end: 0 infiniteEnd: true closed: true 
> createdTime: 1477314365827] doesn't need replication
> 2016-11-02 19:53:08,900 [replication.DistributedWorkQueueWorkAssigner] DEBUG: 
> Not queueing work for 
> hdfs://host:9000/accumulo/wal/host+31368/ae4b03ec-159b-44e8-9a88-ccf7fa849c19 
> to Remote Name: peer_instance Remote identifier: 5i Source Table ID: l 
> because [begin: 9223372036854775807 end: 0 infiniteEnd: true closed: true 
> createdTime: 1477052816174] doesn't need replication
> {noformat}
> *Replication table*
> {noformat}
> scan -r 
> hdfs://host:9000/accumulo/wal/host+31032/9f038f64-4252-44a0-bfd0-99d4a316b397 
> -t accumulo.replication
> hdfs://host:9000/accumulo/wal/host+31032/9f038f64-4252-44a0-bfd0-99d4a316b397 
> repl:j []    [begin: 0 end: 0 infiniteEnd: true closed: true createdTime: 
> 1477314369633]
> hdfs://host:9000/accumulo/wal/host+31032/9f038f64-4252-44a0-bfd0-99d4a316b397 
> repl:k []    [begin: 9223372036854775807 end: 0 infiniteEnd: true closed: 
> true createdTime: 1477314365827]
> hdfs://host:9000/accumulo/wal/host+31032/9f038f64-4252-44a0-bfd0-99d4a316b397 
> repl:l []    [begin: 9223372036854775807 end: 0 infiniteEnd: true closed: 
> true createdTime: 1477314365707]
> hdfs://host:9000/accumulo/wal/host+31032/9f038f64-4252-44a0-bfd0-99d4a316b397 
> work:\x01\x00\x00\x00\x17peer_instance\x01\x00\x00\x00\x025g\x01\x00\x00\x00\x01j
>  []    [begin: 0 end: 0 infiniteEnd: true closed: true createdTime: 
> 1477314369633]
> hdfs://host:9000/accumulo/wal/host+31032/9f038f64-4252-44a0-bfd0-99d4a316b397 
> work:\x01\x00\x00\x00\x17peer_instance\x01\x00\x00\x00\x025h\x01\x00\x00\x00\x01k
>  []    [begin: 9223372036854775807 end: 0 infiniteEnd: true closed: true 
> createdTime: 1477314365827]
> hdfs://host:9000/accumulo/wal/host+31032/9f038f64-4252-44a0-bfd0-99d4a316b397 
> work:\x01\x00\x00\x00\x17peer_instance\x01\x00\x00\x00\x025i\x01\x00\x00\x00\x01l
>  []    [begin: 9223372036854775807 end: 0 infiniteEnd: true closed: true 
> createdTime: 1477314365707]
> scan -r 
> hdfs://host:9000/accumulo/wal/host+31368/ae4b03ec-159b-44e8-9a88-ccf7fa849c19 
> -t accumulo.replication
> hdfs://host:9000/accumulo/wal/host+31368/ae4b03ec-159b-44e8-9a88-ccf7fa849c19 
> repl:j []    [begin: 9223372036854775807 end: 0 infiniteEnd: true closed: 
> true createdTime: 1477052819752]
> hdfs://host:9000/accumulo/wal/host+31368/ae4b03ec-159b-44e8-9a88-ccf7fa849c19 
> repl:k []    [begin: 0 end: 0 infiniteEnd: true closed: true createdTime: 
> 1477052816238]
> hdfs://host:9000/accumulo/wal/host+31368/ae4b03ec-159b-44e8-9a88-ccf7fa849c19 
> repl:l []    [begin: 9223372036854775807 end: 0 infiniteEnd: true closed: 
> true createdTime: 1477052816174]
> hdfs://host:9000/accumulo/wal/host+31368/ae4b03ec-159b-44e8-9a88-ccf7fa849c19 
> work:\x01\x00\x00\x00\x17peer_instance\x01\x00\x00\x00\x025g\x01\x00\x00\x00\x01j
>  []    [begin: 9223372036854775807 end: 0 infiniteEnd: true closed: true 
> createdTime: 1477052819752]
> hdfs://host:9000/accumulo/wal/host+31368/ae4b03ec-159b-44e8-9a88-ccf7fa849c19 
> work:\x01\x00\x00\x00\x17peer_instance\x01\x00\x00\x00\x025h\x01\x00\x00\x00\x01k
>  []    [begin: 0 end: 0 infiniteEnd: true closed: true createdTime: 
> 1477052816238]
> hdfs://host:9000/accumulo/wal/host+31368/ae4b03ec-159b-44e8-9a88-ccf7fa849c19 
> work:\x01\x00\x00\x00\x17peer_instance\x01\x00\x00\x00\x025i\x01\x00\x00\x00\x01l
>  []    [begin: 9223372036854775807 end: 0 infiniteEnd: true closed: true 
> createdTime: 1477052816174]
> {noformat}
> *HDFS*
> {noformat}
> hdfs dfs -ls 
> hdfs://host:9000/accumulo/wal/host+31032/9f038f64-4252-44a0-bfd0-99d4a316b397 
> hdfs://host:9000/accumulo/wal/host+31368/ae4b03ec-159b-44e8-9a88-ccf7fa849c19
> -rwxr-xr-x   3 ubuntu supergroup 1117650900 2016-10-24 13:09 
> hdfs://host:9000/accumulo/wal/host+31032/9f038f64-4252-44a0-bfd0-99d4a316b397
> -rwxr-xr-x   3 ubuntu supergroup 1171968390 2016-10-21 12:31 
> hdfs://host:9000/accumulo/wal/host+31368/ae4b03ec-159b-44e8-9a88-ccf7fa849c19
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
