I'd say it was the Java swap that caused it, as ES will not start another process if it can see one already running:
> markw@es00-fv:~$ ps -ef|grep java
> 106 20801 1 5 Feb25 ? 1-14:27:46 /usr/bin/java -Xms4g
> -Xmx4g -Xss256k -Djava.awt.headless=true -XX:+UseParNewGC
> -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=75
> -XX:+UseCMSInitiatingOccupancyOnly -XX:+HeapDumpOnOutOfMemoryError
> -Delasticsearch -Des.pidfile=/var/run/elasticsearch.pid
> -Des.path.home=/usr/share/elasticsearch -cp
> :/usr/share/elasticsearch/lib/elasticsearch-1.0.0.jar:/usr/share/elasticsearch/lib/*:/usr/share/elasticsearch/lib/sigar/*
> -Des.default.config=/etc/elasticsearch/elasticsearch.yml
> -Des.default.path.home=/usr/share/elasticsearch
> -Des.default.path.logs=/var/log/elasticsearch
> -Des.default.path.data=/var/lib/elasticsearch
> -Des.default.path.work=/tmp/elasticsearch
> -Des.default.path.conf=/etc/elasticsearch
> org.elasticsearch.bootstrap.Elasticsearch
> markw 24590 24487 0 08:18 pts/0 00:00:00 grep java
> markw@es00-fv:~$ sservice elasticsearch status
> [sudo] password for markw:
> * elasticsearch is running
> markw@es00-fv:~$ sservice elasticsearch start
> * Starting Elasticsearch Server
> * Already running.
> [ OK ]
> markw@es00-fv:~$ ps -ef|grep java
> 106 20801 1 5 Feb25 ? 1-14:27:48 /usr/bin/java -Xms4g
> -Xmx4g -Xss256k -Djava.awt.headless=true -XX:+UseParNewGC
> -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=75
> -XX:+UseCMSInitiatingOccupancyOnly -XX:+HeapDumpOnOutOfMemoryError
> -Delasticsearch -Des.pidfile=/var/run/elasticsearch.pid
> -Des.path.home=/usr/share/elasticsearch -cp
> :/usr/share/elasticsearch/lib/elasticsearch-1.0.0.jar:/usr/share/elasticsearch/lib/*:/usr/share/elasticsearch/lib/sigar/*
> -Des.default.config=/etc/elasticsearch/elasticsearch.yml
> -Des.default.path.home=/usr/share/elasticsearch
> -Des.default.path.logs=/var/log/elasticsearch
> -Des.default.path.data=/var/lib/elasticsearch
> -Des.default.path.work=/tmp/elasticsearch
> -Des.default.path.conf=/etc/elasticsearch
> org.elasticsearch.bootstrap.Elasticsearch
> markw 24626 24487 0 08:18 pts/0 00:00:00 grep java
> markw@es00-fv:~$

Regards,
Mark Walkom

Infrastructure Engineer
Campaign Monitor
email: [email protected]
web: www.campaignmonitor.com

On 26 March 2014 03:51, Matt Wise <[email protected]> wrote:

> Last night we ran into an interesting issue. We pushed out a change to our
> hosts via Puppet that installed Oracle's Java 7 as the default JRE/JDK on all
> of our hosts -- previously it had been the default only on a small subset
> of our systems. When this happened, our ElasticSearch hosts broke in a
> fairly spectacular way. The basic problem seems to be that changing out the
> Java binary caused the /etc/init.d/elasticsearch init script to believe the
> app was not running (though it was), and therefore Puppet started it up.
> It looked like this:
>
>> puppet-agent[7069]: (/Stage[main]/Java::Jdk/Exec[set-licence-selected]/returns) executed successfully
>> puppet-agent[7069]: (/Stage[main]/Java::Jdk/Apt::Source[oracle_java]/Apt::Key[Add key: EEA14886 from Apt::Source oracle_java]/Exec[164487e6b8d5245829c02e964fe69ec79110cb81]/returns) executed successfully
>> puppet-agent[7069]: (/Stage[main]/Java::Jdk/Apt::Source[oracle_java]/File[oracle_java.list]/ensure) created
>> puppet-agent[7069]: (/Stage[main]/Flume/Apt::Source[cdh4]/Apt::Key[Add key: 02A818DD from Apt::Source cdh4]/Exec[a8c3d5690bde3d926f373000d0a4b28ac782829e]/returns) executed successfully
>> puppet-agent[7069]: (/Stage[main]/Flume/Apt::Source[cdh4]/File[cdh4.list]/ensure) created
>> puppet-agent[7069]: (/Stage[main]/Apt::Update/Exec[apt_update]) Triggered 'refresh' from 2 events
>> puppet-agent[7069]: (/Stage[main]/Java::Jdk/Package[oracle-java7-installer]/ensure) ensure changed 'purged' to 'present'
>> puppet-agent[7069]: (/Stage[main]/Java::Jdk/Package[oracle-java7-set-default]/ensure) ensure changed 'purged' to 'present'
>> *puppet-agent[7069]: (/Stage[main]/Elasticsearch::Service/Service[elasticsearch]/ensure) ensure changed 'stopped' to 'running'*
>
> I want to stress, ElasticSearch was *already running* ... but the Java
> change seems to have tripped up the init script so that its 'status'
> command returned a >0 exit code, causing Puppet to think it needed to start
> up ElasticSearch. When this happened, we ended up running two ES daemons on
> each of our nodes, and a whole ton of "reshuffling" occurred.
>
> Is this a design feature? Bug? Thoughts?
>
> Matt Wise
> Sr. Systems Architect
> Nextdoor.com
>
> --
> You received this message because you are subscribed to the Google Groups
> "elasticsearch" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/elasticsearch/CAOHkZxO-0%3DA0QDSHmek18GauQsjNqLCnju1FHfhN8NW1MLfNqQ%40mail.gmail.com.
> For more options, visit https://groups.google.com/d/optout.
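
To illustrate how a 'status' check could be tripped by swapping the Java binary, here is a minimal Python sketch. It is not the logic of the actual /etc/init.d/elasticsearch script, which is not shown in this thread. It assumes a Linux /proc filesystem and compares two hypothetical strategies: one that only asks whether the PID in the pidfile is alive, and a stricter one that also requires the process's executable path to match an expected path. The function names and the expected_exe parameter are made up for illustration.

```python
import os


def pid_from_file(pidfile):
    """Return the PID stored in pidfile, or None if it is missing or unparseable."""
    try:
        with open(pidfile) as fh:
            return int(fh.read().strip())
    except (OSError, ValueError):
        return None


def is_alive(pid):
    """Return True if a process with this PID exists (signal 0 only probes)."""
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True


def status_by_pid(pidfile):
    """Hypothetical lenient check: running if the pidfile's PID is alive."""
    pid = pid_from_file(pidfile)
    return pid is not None and is_alive(pid)


def status_by_exe(pidfile, expected_exe):
    """Hypothetical strict check: the PID must be alive AND its executable
    (as reported by /proc/<pid>/exe) must equal expected_exe."""
    pid = pid_from_file(pidfile)
    if pid is None or not is_alive(pid):
        return False
    try:
        actual_exe = os.readlink(f"/proc/{pid}/exe")
    except OSError:
        return False  # no /proc entry, or not permitted to read it
    return actual_exe == expected_exe
```

With the strict variant, a daemon that is still running from a Java binary that has since been replaced on disk would no longer match expected_exe (on Linux the link target typically gains a " (deleted)" suffix once the original file is unlinked), so the check would report "not running" even though the process is alive. Whether that is what happened on Matt's hosts is an assumption; the sketch only shows that the two strategies can disagree after a binary swap.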
