Thanks Jan, comments inline.
On Sunday, June 19, 2016 at 7:55:05 AM UTC-4, Jan Doberstein wrote:
>
> Hej,
>
> what happens if you reboot the Server? What happens if you restart the
> Service?
>

Same behavior: all services start, but graylog-server behaves exactly as
described in my original message after both a reboot and a service restart.

>
> What happens if you kill the curl and try to restart graylog-server?
>

Aha, thanks for pointing that out. Once the curl loop is killed,
graylog-server starts. The full steps are below.

It looks like graylog-server tries to reach a local MongoDB for 10 minutes
before timing out. Why is that? Could it be a bug?

This is the setting in graylog.conf:

# MongoDB Configuration
mongodb_uri = mongodb://10.20.1.229:27017/graylog

Why is it trying localhost?
This instance of graylog-server is a slave / secondary server that connects
to the master's MongoDB.

If this is a bug in Graylog, kindly re-open this ticket:
<https://github.com/Graylog2/graylog2-server/issues/2370>
Otherwise, please let me know what I should do to avoid this 10-minute test
against localhost.

Thanks again.

Full steps:

ubuntu@graylog-server2:~$ sudo graylog-ctl stop
ok: down: elasticsearch: 0s, normally up
ok: down: etcd: 0s, normally up
ok: down: graylog-server: 0s, normally up
ok: down: nginx: 1s, normally up
ubuntu@graylog-server2:~$ sudo graylog-ctl status
down: elasticsearch: 13s, normally up; run: log: (pid 1023) 316859s
down: etcd: 13s, normally up; run: log: (pid 1013) 316859s
down: graylog-server: 8s, normally up; run: log: (pid 1010) 316859s
down: nginx: 8s, normally up; run: log: (pid 1015) 316859s
ubuntu@graylog-server2:~$ sudo graylog-ctl start
ok: run: elasticsearch: (pid 12883) 0s
ok: run: etcd: (pid 12907) 0s
ok: run: graylog-server: (pid 12919) 1s
ok: run: nginx: (pid 12925) 0s
ubuntu@graylog-server2:~$ ps -elf | grep 12919
0 S root 12919 1004 0 80 0 - 1110 - 14:12 ? 00:00:00 /bin/sh ./run
0 S root 12920 12919 0 80 0 - 2154 - 14:12 ? 00:00:00 timeout 600 bash -c until curl -s http://127.0.0.1:27017; do sleep 1; done
0 S ubuntu 12963 1582 0 80 0 - 2615 pipe_w 14:12 pts/0 00:00:00 grep --color=auto 12919
ubuntu@graylog-server2:~$ kill 12920
-bash: kill: (12920) - Operation not permitted
ubuntu@graylog-server2:~$ sudo !!
sudo kill 12920
ubuntu@graylog-server2:~$ sudo graylog-ctl status
run: elasticsearch: (pid 12883) 34s; run: log: (pid 1023) 316937s
run: etcd: (pid 12907) 34s; run: log: (pid 1013) 316937s
run: graylog-server: (pid 12919) 34s; run: log: (pid 1010) 316937s
run: nginx: (pid 12925) 33s; run: log: (pid 1015) 316937s
ubuntu@graylog-server2:~$ ps -elf | grep 12919
4 S graylog 12919 1004 46 80 0 - 1109405 - 14:12 ? 00:00:18 /opt/graylog/embedded/jre/bin/java -Xms1g -Xmx1500m -XX:NewRatio=1 -server -XX:+ResizeTLAB -XX:+UseConcMarkSweepGC -XX:+CMSConcurrentMTEnabled -XX:+CMSClassUnloadingEnabled -XX:+UseParNewGC -XX:-OmitStackTraceInFastThrow -jar -Dlog4j.configurationFile=file:///opt/graylog/conf/log4j2.xml -Djava.library.path=/opt/graylog/server/lib/sigar/ -Dgraylog2.installation_source=unknown /opt/graylog/server/graylog.jar server -f /opt/graylog/conf/graylog.conf
0 S ubuntu 13133 1582 0 80 0 - 2615 pipe_w 14:13 pts/0 00:00:00 grep --color=auto 12919

>
> with kind regards
> Jan
>
> --
> | -----------------------------------------------------------------
> | get trusted and secure VPN services http://jalogis.ch/vpnsh
>
> On 17 June 2016 at 19:15:20, 123Dev ([email protected]) wrote:
> >
> > We've upgraded our production system (AWS images) from 1.3.x to 2.0.2.
> > On the primary server, graylog-server is fully operational,
> > whereas on the secondary server the process is running (or so it seems),
> > but it's not writing anything to the logs and it does not appear in the
> > UI as a node.
> >
> > On the troubled server, sudo graylog-ctl status shows:
> >
> > run: elasticsearch: (pid 1036) 480s; run: log: (pid 1032) 480s
> > run: etcd: (pid 1033) 480s; run: log: (pid 1028) 480s
> > run: graylog-server: (pid 1029) 480s; run: log: (pid 1024) 480s
> > run: nginx: (pid 1025) 480s; run: log: (pid 1022) 480s
> >
> > As seen, graylog-server is reported as running with pid 1029.
> >
> > But if we check the processes with pid 1029, ps -elf | grep 1029 shows:
> >
> > 0 S root 1029 1018 0 80 0 - 1110 - 21:26 ? 00:00:00 /bin/sh ./run
> > 0 S root 1039 1029 0 80 0 - 2154 - 21:26 ? 00:00:00 timeout 600 bash -c until curl -s http://127.0.0.1:27017; do sleep 1; done
> > 0 S ubuntu 2638 2524 0 80 0 - 2616 pipe_w 21:35 pts/0 00:00:00 grep --color=auto 1029
> >
> > Which clearly is *not* the graylog-server process.
> >
> > If we check the same thing on the primary server, where everything is
> > working fine, sudo graylog-ctl status shows:
> >
> > run: elasticsearch: (pid 12071) 1318s; run: log: (pid 1037) 333246s
> > run: etcd: (pid 12090) 1317s; run: log: (pid 1035) 333246s
> > run: graylog-server: (pid 12125) 1312s; run: log: (pid 1038) 333246s
> > run: mongodb: (pid 12132) 1311s; run: log: (pid 1036) 333246s
> > run: nginx: (pid 12134) 1311s; run: log: (pid 1039) 333246s
> >
> > ps -elf | grep 12125 shows:
> >
> > 4 S graylog 12125 1031 28 80 0 - 1169685 - 21:13 ? 00:06:14 /opt/graylog/embedded/jre/bin/java -Xms1g -Xmx1500m -XX:NewRatio=1 -server -XX:+ResizeTLAB -XX:+UseConcMarkSweepGC -XX:+CMSConcurrentMTEnabled -XX:+CMSClassUnloadingEnabled -XX:+UseParNewGC -XX:-OmitStackTraceInFastThrow -jar -Dlog4j.configurationFile=file:///opt/graylog/conf/log4j2.xml -Djava.library.path=/opt/graylog/server/lib/sigar/ -Dgraylog2.installation_source=unknown /opt/graylog/server/graylog.jar server -f /opt/graylog/conf/graylog.conf
> > 0 S ubuntu 17847 1419 0 80 0 - 2615 pipe_w 21:35 pts/1 00:00:00 grep --color=auto 12125
> >
> > Here graylog-server is clearly running.
> >
> > So my questions are:
> >
> > - Why does graylog-ctl think that graylog-server is running?
> > - Why is graylog-server not running?
> > - How can we narrow down the root issue? With graylog-server not
> >   running, the log files are not updated, so we have no clue what is
> >   going on.
> > - Are there higher-level logs for graylog-ctl that would tell us what
> >   goes wrong when it tries to start graylog-server?
> >
> > PS: We noticed that after a long while, graylog-server eventually shows
> > up as a node in the UI, and the logs start filling.
> >
> > Looking for errors in the logs, we only noticed the following warning:
> >
> > 2016-06-17_17:04:56.90879 2016-06-17 17:04:56,908 WARN : org.graylog2.shared.events.DeadEventLoggingListener - Received unhandled event of type from event bus
> >
> > We're not even certain it has any relevance to the problem of
> > graylog-server not starting immediately.
> >
> > Any guidance on how to narrow this down is greatly appreciated.
> >
> > Thanks
> >
> > --
> > You received this message because you are subscribed to the Google
> > Groups "Graylog Users" group.
> > To unsubscribe from this group and stop receiving emails from it, send
> > an email to [email protected].
> > To view this discussion on the web visit
> > https://groups.google.com/d/msgid/graylog2/530ffc00-1742-4eea-994a-d5e95c165e88%40googlegroups.com.
> > For more options, visit https://groups.google.com/d/optout.
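
For reference, the ps output in this thread shows the run script waiting on http://127.0.0.1:27017 for up to 600 seconds, while graylog.conf points mongodb_uri at 10.20.1.229. The thread does not establish why the packaged script uses localhost. The following is only a minimal sketch of how a wait loop could be pointed at the configured host instead. Both function names are hypothetical and are not part of graylog-ctl or the Graylog packages, and the curl check is simply copied from the ps output; whether MongoDB answers plain HTTP on its driver port depends on the MongoDB version.

```shell
#!/bin/sh
# Hypothetical helper (not part of graylog-ctl): print the first host:port
# found in the mongodb_uri setting of a graylog.conf-style file.
# Commented-out settings are ignored; credentials before "@" are stripped.
mongo_hostport() {
    sed -n 's|^[[:space:]]*mongodb_uri[[:space:]]*=[[:space:]]*mongodb://\([^/,]*\).*|\1|p' "$1" \
        | sed 's|^.*@||' \
        | head -n 1
}

# Hypothetical wait loop modelled on the one seen in the ps output, but aimed
# at a caller-supplied address. $1 = host:port, $2 = timeout in seconds
# (defaults to 600, the value seen in the ps output).
wait_for_mongo() {
    timeout "${2:-600}" sh -c \
        'until curl -s "http://$0" >/dev/null; do sleep 1; done' "$1"
}
```

Assuming the config path from the java command line above, a call would look like `wait_for_mongo "$(mongo_hostport /opt/graylog/conf/graylog.conf)"`. With the mongodb_uri quoted in this thread, `mongo_hostport` yields 10.20.1.229:27017, so the loop would probe the master's MongoDB rather than localhost.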
