Keith Turner created ACCUMULO-2768:
--------------------------------------
Summary: Agitator not restarting all datanodes
Key: ACCUMULO-2768
URL: https://issues.apache.org/jira/browse/ACCUMULO-2768
Project: Accumulo
Issue Type: Bug
Components: test
Affects Versions: 1.6.0
Environment: 1.6.0 RC5, hadoop 2.2.0, ZK 3.4.5
20 node EC2 cluster
Reporter: Keith Turner
Fix For: 1.6.1
I ran a 24-hour CI test against 1.6.0 RC5 w/ agitation.
I modified the agitation settings to the following:
{noformat}
#time amount of time (in minutes) the agitator should sleep before killing
KILL_SLEEP_TIME=3
#time amount of time (in minutes) the agitator should sleep after killing before running tup
TUP_SLEEP_TIME=1
#the minimum and maximum server the agitator will kill at once
MIN_KILL=1
MAX_KILL=2
{noformat}
I started 3 walkers, all of which died. The walkers saw
{{org.apache.accumulo.core.client.impl.AccumuloServerException}}. On the
tserver, the cause was {{org.apache.hadoop.hdfs.BlockMissingException}}.
After stopping the agitation scripts, I ran {{start-dfs.sh}} and saw that it
started 5 datanodes. Looking at {{datanode-agitator.pl}}, I think the problem
is that when it kills two datanodes, it only restarts one.
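
To illustrate the suspected behavior, here is a toy Python model of a kill/restart loop. It is only a sketch under stated assumptions, not the actual logic of {{datanode-agitator.pl}}: the function name {{agitate}}, its parameters, and the uniform random choice of victims are all hypothetical. It will not reproduce the exact count of 5 stopped datanodes; it only shows why restarting a single node per cycle leaks datanodes whenever more than one is killed.

```python
import random


def agitate(num_nodes, cycles, min_kill, max_kill, restart_all, seed=0):
    """Return the number of live datanodes after each kill/restart cycle.

    Hypothetical model: each cycle kills between min_kill and max_kill
    live nodes, then restarts either all of them or only the first one.
    """
    rng = random.Random(seed)
    live = set(range(num_nodes))
    history = []
    for _ in range(cycles):
        # Kill between min_kill and max_kill nodes (fewer if not enough are up).
        want = rng.randint(min_kill, max_kill)
        killed = rng.sample(sorted(live), min(want, len(live)))
        live.difference_update(killed)

        # Suspected bug: only one of the killed nodes is brought back.
        to_restart = killed if restart_all else killed[:1]
        live.update(to_restart)
        history.append(len(live))
    return history


# 24 hours at roughly 4 minutes per cycle (KILL_SLEEP_TIME=3 plus
# TUP_SLEEP_TIME=1) is about 360 cycles on a 20 node cluster.
buggy = agitate(20, 360, min_kill=1, max_kill=2, restart_all=False)
fixed = agitate(20, 360, min_kill=1, max_kill=2, restart_all=True)
print("live datanodes at end, restart one:", buggy[-1])
print("live datanodes at end, restart all:", fixed[-1])
```

In this model the live count never recovers once a cycle kills two nodes and restarts one, whereas restarting every killed node keeps the cluster at full size.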
All of my ingest clients survived and were able to write 8 billion entries in
this wacky environment. I noticed on the monitor that there were long periods
of no ingest, but it was not a complete flat line.
--
This message was sent by Atlassian JIRA
(v6.2#6252)