[ https://issues.apache.org/jira/browse/ACCUMULO-1374?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
John Vines resolved ACCUMULO-1374.
----------------------------------
Resolution: Invalid
Assignee: (was: Eric Newton)
PEBCAK. The java processes were killed by the kernel's OOM killer, as syslog shows:
{code}
grep -i kill /var/log/syslog | tail
May 3 22:01:59 ip-10-10-1-122 kernel: [1749277.570901] Out of memory: Kill process 2318 (java) score 480 or sacrifice child
May 3 22:01:59 ip-10-10-1-122 kernel: [1749277.570931] Killed process 2318 (java) total-vm:5369512kB, anon-rss:3655040kB, file-rss:0kB
May 3 22:01:59 ip-10-10-1-122 kernel: [1749277.676155] java invoked oom-killer: gfp_mask=0x201da, order=0, oom_adj=0, oom_score_adj=0
May 3 22:01:59 ip-10-10-1-122 kernel: [1749277.676196] [<ffffffff81119745>] oom_kill_process+0x85/0xb0
May 3 22:01:59 ip-10-10-1-122 kernel: [1749277.698754] Out of memory: Kill process 1342 (java) score 169 or sacrifice child
May 3 22:01:59 ip-10-10-1-122 kernel: [1749277.698776] Killed process 1342 (java) total-vm:3176364kB, anon-rss:1287772kB, file-rss:0kB
May 3 22:01:59 ip-10-10-1-122 kernel: [1749277.735364] java invoked oom-killer: gfp_mask=0x201da, order=0, oom_adj=0, oom_score_adj=0
May 3 22:01:59 ip-10-10-1-122 kernel: [1749277.735403] [<ffffffff81119745>] oom_kill_process+0x85/0xb0
May 3 22:01:59 ip-10-10-1-122 kernel: [1749277.758067] Out of memory: Kill process 1512 (java) score 60 or sacrifice child
May 3 22:01:59 ip-10-10-1-122 kernel: [1749277.758093] Killed process 1512 (java) total-vm:2531416kB, anon-rss:461072kB, file-rss:0kB
{code}
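For anyone who hits the same symptom, the "Killed process" lines above can be summarized per process. The following is only a minimal sketch, assuming the kernel message format shown in this excerpt (other kernel versions word these lines differently); the function name `oom_kills` is made up for illustration and is not part of Accumulo.

```python
import re

# Matches the kernel's "Killed process" message as it appears in the excerpt above.
# Assumption: the "total-vm:...kB, anon-rss:...kB" fields follow the process name.
KILLED = re.compile(
    r"Killed process (?P<pid>\d+)\s+\((?P<name>[^)]+)\)\s+"
    r"total-vm:(?P<total_vm>\d+)kB,\s+anon-rss:(?P<anon_rss>\d+)kB"
)


def oom_kills(text):
    """Return a list of (pid, process name, anon-rss in kB) for each OOM kill in text."""
    # Log excerpts pasted into mail are often hard-wrapped, so collapse all
    # whitespace (including newlines) before matching.
    flat = " ".join(text.split())
    return [
        (int(m["pid"]), m["name"], int(m["anon_rss"]))
        for m in KILLED.finditer(flat)
    ]


if __name__ == "__main__":
    # Assumed log location, taken from the grep command above.
    with open("/var/log/syslog", errors="replace") as f:
        for pid, name, rss_kb in oom_kills(f.read()):
            print(f"pid {pid} ({name}) anon-rss {rss_kb} kB")
```

Run against the excerpt above, this would list pids 2318, 1342, and 1512, which makes it easier to compare each process's resident memory against what the host actually has available.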
> Sudden Death of master, gc, and tservers
> ----------------------------------------
>
> Key: ACCUMULO-1374
> URL: https://issues.apache.org/jira/browse/ACCUMULO-1374
> Project: Accumulo
> Issue Type: Bug
> Components: gc, master, tserver
> Environment: 1.5, svn#1470047 & 1477382, both in a standalone instance
> on EC2 on Ubuntu and a small cluster on bare-metal CentOS
> Reporter: John Vines
> Priority: Blocker
> Fix For: 1.5.0
>
>
> I wish I could provide more information. This happened once on a bare-metal
> CentOS cluster while running vanilla continuous ingest of svn#1470047.
> There was nothing reported in the logs when one of the tservers just died
> after the system had been up for ~1 day. The out and err files were sparse,
> and the master only reported that it had lost connection with the tserver at
> the point when the tserver just stopped logging (it was overnight, so this
> was not witnessed until morning).
> It recently happened again on a standalone instance on EC2 running Ubuntu and
> svn#1477382. The instance had been running for ~7 hours. This time the gc,
> master, and tserver died. The gc died first, and then 2m:48s later the master
> died. 200ms later the tserver died. Again, there was no output in any of the
> out or err files for the processes. The logs also have no errors or warnings
> in them, just abrupt stops. The processes came up fine once restarted.