Hello,

After deleting the notification for *"Elasticsearch cluster unhealthy (RED) 
(triggered 6 days ago)"* and rebooting the server, I wasn't notified of 
this problem again.

I still see:

*Elasticsearch cluster*
*The possible Elasticsearch cluster states and more related information is 
available in the Graylog documentation.*
*Elasticsearch cluster is yellow. Shards: 4 active, 0 initializing, 0 
relocating, 4 unassigned. What does this mean?*
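
From what I've read, a yellow state where the unassigned count equals the 
active count usually just means the replica shards have nowhere to go on a 
single-node Elasticsearch. A sketch of how I'd check the state and disable 
replicas (host and port taken from the logs quoted below; the index name 
graylog_0 and replicas being the cause are my assumptions, so please correct 
me if this is wrong):

```shell
# Sketch, not yet run: first look at the cluster health in detail,
# then set the replica count of the index to 0 so the unassigned
# replica shards disappear on a single-node setup.
curl 'http://192.168.1.22:9200/_cluster/health?pretty'
curl -XPUT 'http://192.168.1.22:9200/graylog_0/_settings' \
     -d '{"index": {"number_of_replicas": 0}}'
```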

May I delete the disk journal now, and if so, how?
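
In case it matters, this is my understanding of how the journal would be 
deleted on the OVA, should that turn out to be the right step (the journal 
path and the graylog-ctl service name are assumed omnibus defaults, not 
verified, so I'd appreciate confirmation before running this):

```shell
# Sketch only -- stop graylog-server first so it does not write to the
# journal while it is being deleted. /var/opt/graylog/data/journal is the
# assumed omnibus default path; verify it on your system before rm -rf.
sudo graylog-ctl stop graylog-server
sudo rm -rf /var/opt/graylog/data/journal
sudo graylog-ctl start graylog-server
```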

On Tuesday, January 3, 2017 at 8:57:27 AM UTC+1, [email protected] wrote:

> Jochen,
>
> thank you, I looked at the following logs:
>
> root@graylog:/var/log/graylog/elasticsearch# nano current
>
> 2017-01-02_09:16:55.57535 [2017-01-02 10:16:55,574][INFO 
> ][node                     ] [Molecule Man] version[2.3.1], pid[924], 
> build[bd98092/2016-04-04T12:25:05Z]
> 2017-01-02_09:16:55.57604 [2017-01-02 10:16:55,576][INFO 
> ][node                     ] [Molecule Man] initializing ...
> 2017-01-02_09:16:56.80747 [2017-01-02 10:16:56,807][INFO 
> ][plugins                  ] [Molecule Man] modules [reindex, 
> lang-expression, lang-groovy], plugins [kopf], sites [kopf]
> 2017-01-02_09:16:56.84193 [2017-01-02 10:16:56,841][INFO 
> ][env                      ] [Molecule Man] using [1] data paths, mounts 
> [[/var/opt/graylog/data (/dev/sdb1)]], net usable_space [85.1gb], net 
> total_space [98.3gb], spins? [possib$
> 2017-01-02_09:16:56.84211 [2017-01-02 10:16:56,842][INFO 
> ][env                      ] [Molecule Man] heap size [1.7gb], compressed 
> ordinary object pointers [true]
> 2017-01-02_09:16:56.84234 [2017-01-02 10:16:56,842][WARN 
> ][env                      ] [Molecule Man] max file descriptors [64000] 
> for elasticsearch process likely too low, consider increasing to at least 
> [65536]
> 2017-01-02_09:17:02.18937 [2017-01-02 10:17:02,189][INFO 
> ][node                     ] [Molecule Man] initialized
> 2017-01-02_09:17:02.19168 [2017-01-02 10:17:02,191][INFO 
> ][node                     ] [Molecule Man] starting ...
> 2017-01-02_09:17:02.56976 [2017-01-02 10:17:02,569][INFO 
> ][transport                ] [Molecule Man] publish_address {
> 192.168.1.22:9300}, bound_addresses {192.168.1.22:9300}
> 2017-01-02_09:17:02.57613 [2017-01-02 10:17:02,576][INFO 
> ][discovery                ] [Molecule Man] graylog/62ruQcNHSOahWbBEe71egw
> 2017-01-02_09:17:12.66122 [2017-01-02 10:17:12,661][INFO 
> ][cluster.service          ] [Molecule Man] new_master {Molecule 
> Man}{62ruQcNHSOahWbBEe71egw}{192.168.1.22}{192.168.1.22:9300}, reason: 
> zen-disco-join(elected_as_master, [0] joins rec$
> 2017-01-02_09:17:12.73775 [2017-01-02 10:17:12,737][INFO 
> ][http                     ] [Molecule Man] publish_address {
> 192.168.1.22:9200}, bound_addresses {192.168.1.22:9200}
> 2017-01-02_09:17:12.73913 [2017-01-02 10:17:12,739][INFO 
> ][node                     ] [Molecule Man] started
> 2017-01-02_09:17:12.98417 [2017-01-02 10:17:12,984][INFO 
> ][gateway                  ] [Molecule Man] recovered [1] indices into 
> cluster_state
> 2017-01-02_09:17:15.92973 [2017-01-02 10:17:15,929][INFO 
> ][cluster.service          ] [Molecule Man] added 
> {{graylog-52498cb4-349d-494a-8c6b-692fd78e3c6c}{56bjekcxQl6kwDCKKmeGuw}{192.168.1.22}{192.168.1.22:9350}{client=true,
>  
> data=false, mas$
> 2017-01-02_09:17:17.20882 [2017-01-02 10:17:17,208][INFO 
> ][cluster.routing.allocation] [Molecule Man] Cluster health status changed 
> from [RED] to [YELLOW] (reason: [shards started [[graylog_0][0], 
> [graylog_0][2], [graylog_0][2], [graylo$
>
>
> root@graylog:/var/log/graylog/elasticsearch# nano graylog.log
> [2016-12-30 07:41:38,399][WARN ][index.translog           ] [Slick] 
> [graylog_0][0] failed to delete unreferenced translog files
> java.nio.file.NoSuchFileException: 
> /var/opt/graylog/data/elasticsearch/graylog/nodes/0/indices/graylog_0/0/translog
>         at 
> sun.nio.fs.UnixException.translateToIOException(UnixException.java:86)
>         at 
> sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102)
>         at 
> sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:107)
>         at 
> sun.nio.fs.UnixFileSystemProvider.newDirectoryStream(UnixFileSystemProvider.java:427)
>         at java.nio.file.Files.newDirectoryStream(Files.java:457)
>         at 
> org.elasticsearch.index.translog.Translog$OnCloseRunnable.handle(Translog.java:726)
>         at 
> org.elasticsearch.index.translog.Translog$OnCloseRunnable.handle(Translog.java:714)
>         at 
> org.elasticsearch.index.translog.ChannelReference.closeInternal(ChannelReference.java:67)
>         at 
> org.elasticsearch.common.util.concurrent.AbstractRefCounted.decRef(AbstractRefCounted.java:64)
>         at 
> org.elasticsearch.index.translog.TranslogReader.close(TranslogReader.java:143)
>         at 
> org.apache.lucene.util.IOUtils.closeWhileHandlingException(IOUtils.java:129)
>         at 
> org.elasticsearch.index.translog.Translog.recoverFromFiles(Translog.java:354)
>         at 
> org.elasticsearch.index.translog.Translog.<init>(Translog.java:179)
>         at 
> org.elasticsearch.index.engine.InternalEngine.openTranslog(InternalEngine.java:208)
>         at 
> org.elasticsearch.index.engine.InternalEngine.<init>(InternalEngine.java:151)
>         at 
> org.elasticsearch.index.engine.InternalEngineFactory.newReadWriteEngine(InternalEngineFactory.java:25)
>         at 
> org.elasticsearch.index.shard.IndexShard.newEngine(IndexShard.java:1515)
>         at 
> org.elasticsearch.index.shard.IndexShard.createNewEngine(IndexShard.java:1499)
>         at 
> org.elasticsearch.index.shard.IndexShard.internalPerformTranslogRecovery(IndexShard.java:972)
>         at 
> org.elasticsearch.index.shard.IndexShard.performTranslogRecovery(IndexShard.java:944)
>         at 
> org.elasticsearch.index.shard.StoreRecoveryService.recoverFromStore(StoreRecoveryService.java:241)
>         at 
> org.elasticsearch.index.shard.StoreRecoveryService.access$100(StoreRecoveryService.java:56)
>         at 
> org.elasticsearch.index.shard.StoreRecoveryService$1.run(StoreRecoveryService.java:129)
>         at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>
>
> Could it be that the notification:
>
> *Elasticsearch cluster unhealthy (RED) (triggered 6 days ago)*
> *The Elasticsearch cluster state is RED which means shards are unassigned. 
> This usually indicates a crashed and corrupt cluster and needs to be 
> investigated. Graylog will write into the local disk journal. Read how to 
> fix this in the Elasticsearch setup documentation.*
>
> is an old one and is now resolved?
>
>
> However, I still get:
>
> *Elasticsearch cluster*
> *The possible Elasticsearch cluster states and more related information is 
> available in the Graylog documentation.*
> *Elasticsearch cluster is yellow. Shards: 4 active, 0 initializing, 0 
> relocating, 4 unassigned. What does this mean?*
>
> As mentioned before, we don't mind losing all the data if the 
> configurations, dashboards, and streams are all preserved, in case that 
> somehow helps in resolving these issues.
>
>
>
>
> On Friday, December 30, 2016 at 11:29:18 AM UTC+1, Jochen Schalanda wrote:
>
>> Hi,
>>
>> you first have to fix the cluster health state of your Elasticsearch 
>> cluster before you should even think about deleting the Graylog disk 
>> journal.
>>
>> Check the Elasticsearch logs for corresponding hints: 
>> http://docs.graylog.org/en/2.1/pages/configuration/file_location.html#omnibus-package
>>
>> Cheers,
>> Jochen
>>
>> On Friday, 30 December 2016 08:01:20 UTC+1, [email protected] wrote:
>>>
>>> Thank you again, we're almost there:
>>>
>>> df -m
>>> Filesystem     1M-blocks  Used Available Use% Mounted on
>>> udev                1495     1      1495   1% /dev
>>> tmpfs                300     1       300   1% /run
>>> /dev/dm-0          15282  4902      9582  34% /
>>> none                   1     0         1   0% /sys/fs/cgroup
>>> none                   5     0         5   0% /run/lock
>>> none                1500     0      1500   0% /run/shm
>>> none                 100     0       100   0% /run/user
>>> /dev/sda1            236   121       103  55% /boot
>>> /dev/sdb1         100664  8181     87347   9% /var/opt/graylog/data
>>>
>>>
>>> As you predicted, we're still getting errors:
>>>
>>> Elasticsearch cluster unhealthy (RED)
>>> The Elasticsearch cluster state is RED which means shards are 
>>> unassigned. This usually indicates a crashed and corrupt cluster and needs 
>>> to be investigated. Graylog will write into the local disk journal. Read 
>>> how to fix this in the Elasticsearch setup documentation. 
>>> <http://docs.graylog.org/en/2.1/pages/configuration/elasticsearch.html#cluster-status-explained>
>>>
>>> I looked at the link provided above, but I don't know how to delete the 
>>> journal; any help with this last step would be appreciated.
>>>
>>>
>>> On Wednesday, December 28, 2016 at 4:59:35 PM UTC+1, Edmundo Alvarez 
>>> wrote:
>>>
>>>> This documentation page covers how to extend the disk space in the OVA: 
>>>> http://docs.graylog.org/en/2.1/pages/configuration/graylog_ctl.html#extend-disk-space
>>>>  
>>>>
>>>> Please note that Graylog's journal sometimes gets corrupted when the disk 
>>>> runs out of space. In that case you may need to delete the journal folder. 
>>>>
>>>> Regards, 
>>>> Edmundo 
>>>>
>>>> > On 28 Dec 2016, at 16:04, [email protected] wrote: 
>>>> > 
>>>> > Thank you Edmundo. 
>>>> > 
>>>> > It appears we ran out of space. 
>>>> > 
>>>> > df -h 
>>>> > Filesystem      Size  Used Avail Use% Mounted on 
>>>> > udev            1.5G  4.0K  1.5G   1% /dev 
>>>> > tmpfs           300M  388K  300M   1% /run 
>>>> > /dev/dm-0        15G   15G     0 100% / 
>>>> > none            4.0K     0  4.0K   0% /sys/fs/cgroup 
>>>> > none            5.0M     0  5.0M   0% /run/lock 
>>>> > none            1.5G     0  1.5G   0% /run/shm 
>>>> > none            100M     0  100M   0% /run/user 
>>>> > /dev/sda1       236M  121M  103M  55% /boot 
>>>> > 
>>>> > We don't mind losing all the history; we just want the server up and 
>>>> running. If the available space can be extended, even better (keep in mind 
>>>> this is an OVA). Any suggestions? 
>>>> > 
>>>> > On Wednesday, December 28, 2016 at 9:18:24 AM UTC+1, Edmundo Alvarez 
>>>> wrote: 
>>>> > Hello, 
>>>> > 
>>>> > I would start by looking into your logs in /var/log/graylog, 
>>>> especially those in the "server" folder, which may give you some errors to 
>>>> start debugging the issue. 
>>>> > 
>>>> > Hope that helps. 
>>>> > 
>>>> > Regards, 
>>>> > Edmundo 
>>>> > 
>>>> > > On 27 Dec 2016, at 20:55, [email protected] wrote: 
>>>> > > 
>>>> > > We've been using Graylog OVA 2.1 for a while now, but it stopped 
>>>> working all of a sudden. 
>>>> > > 
>>>> > > We're getting: 
>>>> > > 
>>>> > >  Server currently unavailable 
>>>> > > We are experiencing problems connecting to the Graylog server 
>>>> running on https://graylog:443/api. Please verify that the server is 
>>>> healthy and working correctly. 
>>>> > > You will be automatically redirected to the previous page once we 
>>>> can connect to the server. 
>>>> > > Do you need a hand? We can help you. 
>>>> > > Less details 
>>>> > > This is the last response we received from the server: 
>>>> > > Error message 
>>>> > > cannot GET https://graylog:443/api/system/cluster/node (500) 
>>>> > > 
>>>> > > 
>>>> > > ubuntu@graylog:~$ sudo graylog-ctl status 
>>>> > > run: elasticsearch: (pid 32780) 74s; run: log: (pid 951) 10764s 
>>>> > > down: etcd: 0s, normally up, want up; run: log: (pid 934) 10764s 
>>>> > > run: graylog-server: (pid 33146) 35s; run: log: (pid 916) 10764s 
>>>> > > down: mongodb: 0s, normally up, want up; run: log: (pid 924) 10764s 
>>>> > > run: nginx: (pid 32974) 57s; run: log: (pid 914) 10764s 
>>>> > > 
>>>> > > 
>>>> > > How can we begin to troubleshoot the issue, which logs to view...? 
>>>> > > 
>>>> > > -- 
>>>> > > You received this message because you are subscribed to the Google 
>>>> Groups "Graylog Users" group. 
>>>> > > To unsubscribe from this group and stop receiving emails from it, 
>>>> send an email to [email protected]. 
>>>> > > To view this discussion on the web visit 
>>>> https://groups.google.com/d/msgid/graylog2/4fb8da46-2e73-42c7-b67d-444c0b801484%40googlegroups.com.
>>>>  
>>>>
>>>> > > For more options, visit https://groups.google.com/d/optout. 
>>>> > 
>>>> > 
>>>>
>>>>
