Hi,
I checked the Elasticsearch log and I don't see anything special.
The cluster status is green.
This is the last log file:
2016-06-26_09:51:28.78352 [2016-06-26 12:51:28,782][INFO ][node ] [Glenn Talbot] version[2.3.1], pid[953], build[bd98092/2016-04-04T12:25:05Z]
2016-06-26_09:51:28.79783 [2016-06-26 12:51:28,794][INFO ][node ] [Glenn Talbot] initializing ...
2016-06-26_09:51:30.17146 [2016-06-26 12:51:30,171][INFO ][plugins ] [Glenn Talbot] modules [reindex, lang-expression, lang-groovy], plugins [kopf], sites [kopf]
2016-06-26_09:51:30.29289 [2016-06-26 12:51:30,292][INFO ][env ] [Glenn Talbot] using [1] data paths, mounts [[/ (/dev/mapper/graylog--vg-root)]], net usable_space [11gb], net total_space [14.9gb], spins? [possibly], types [ext4]
2016-06-26_09:51:30.29564 [2016-06-26 12:51:30,294][INFO ][env ] [Glenn Talbot] heap size [37.6gb], compressed ordinary object pointers [false]
2016-06-26_09:51:30.29766 [2016-06-26 12:51:30,294][WARN ][env ] [Glenn Talbot] max file descriptors [64000] for elasticsearch process likely too low, consider increasing to at least [65536]
2016-06-26_09:51:34.69050 [2016-06-26 12:51:34,690][INFO ][node ] [Glenn Talbot] initialized
2016-06-26_09:51:34.69107 [2016-06-26 12:51:34,690][INFO ][node ] [Glenn Talbot] starting ...
2016-06-26_09:51:35.32863 [2016-06-26 12:51:35,328][INFO ][transport ] [Glenn Talbot] publish_address {172.25.232.45:9300}, bound_addresses {172.25.232.45:9300}
2016-06-26_09:51:35.33658 [2016-06-26 12:51:35,336][INFO ][discovery ] [Glenn Talbot] graylog-production/th7wM-a9ThaAY_umCV3v2w
2016-06-26_09:51:45.37933 [2016-06-26 12:51:45,379][INFO ][cluster.service ] [Glenn Talbot] new_master {Glenn Talbot}{th7wM-a9ThaAY_umCV3v2w}{172.25.232.45}{172.25.232.45:9300}, added {{graylog-a0b12869-11ed-4d89-ae58-dcc7380bc3b8}{KA4cjlTpQTm9Y1Rv5wlVmw}{172.25.232.41}{172.25.232.41:9350}{client=true, data=false, master=false},{graylog-2a340000-d1ba-4f21-a9df-f45901d845b7}{BiWe2Zy2Syaojr9ek0AlJQ}{172.25.232.35}{172.25.232.35:9350}{client=true, data=false, master=false},}, reason: zen-disco-join(elected_as_master, [0] joins received)
2016-06-26_09:51:45.40239 [2016-06-26 12:51:45,402][INFO ][http ] [Glenn Talbot] publish_address {172.25.232.45:9200}, bound_addresses {172.25.232.45:9200}
2016-06-26_09:51:45.40350 [2016-06-26 12:51:45,403][INFO ][node ] [Glenn Talbot] started
2016-06-26_09:51:45.53808 [2016-06-26 12:51:45,537][INFO ][gateway ] [Glenn Talbot] recovered [1] indices into cluster_state
2016-06-26_09:51:45.87525 [2016-06-26 12:51:45,875][INFO ][cluster.routing.allocation] [Glenn Talbot] Cluster health status changed from [RED] to [GREEN] (reason: [shards started [[graylog_0][0]] ...]).
2016-06-26_09:57:01.91281 [2016-06-26 12:57:01,912][INFO ][cluster.service ] [Glenn Talbot] added {{graylog-c1be9fdd-8c8a-41b1-8a2f-dacbddbc0cc5}{-7icx5UPSrWbs9jqXVE2Mg}{172.25.232.36}{172.25.232.36:9350}{client=true, data=false, master=false},}, reason: zen-disco-join(join from node[{graylog-c1be9fdd-8c8a-41b1-8a2f-dacbddbc0cc5}{-7icx5UPSrWbs9jqXVE2Mg}{172.25.232.36}{172.25.232.36:9350}{client=true, data=false, master=false}])
2016-06-26_10:17:02.43148 [2016-06-26 13:17:02,428][INFO ][cluster.metadata ] [Glenn Talbot] [graylog_0] update_mapping [message]
2016-06-26_15:35:13.25159 [2016-06-26 18:35:13,250][INFO ][node ] [Glenn Talbot] stopping ...
2016-06-26_15:35:13.32027 [2016-06-26 18:35:13,319][INFO ][node ] [Glenn Talbot] stopped
2016-06-26_15:35:13.32153 [2016-06-26 18:35:13,320][INFO ][node ] [Glenn Talbot] closing ...
2016-06-26_15:35:13.33032 [2016-06-26 18:35:13,329][INFO ][node ] [Glenn Talbot] closed
2016-06-26_15:46:49.97957 [2016-06-26 18:46:49,977][INFO ][node ] [Tether] version[2.3.1], pid[1364], build[bd98092/2016-04-04T12:25:05Z]
2016-06-26_15:46:49.97959 [2016-06-26 18:46:49,978][INFO ][node ] [Tether] initializing ...
2016-06-26_15:46:50.52052 [2016-06-26 18:46:50,519][INFO ][plugins ] [Tether] modules [reindex, lang-expression, lang-groovy], plugins [kopf], sites [kopf]
2016-06-26_15:46:50.54693 [2016-06-26 18:46:50,546][INFO ][env ] [Tether] using [1] data paths, mounts [[/ (/dev/mapper/graylog--vg-root)]], net usable_space [11gb], net total_space [14.9gb], spins? [possibly], types [ext4]
2016-06-26_15:46:50.54734 [2016-06-26 18:46:50,546][INFO ][env ] [Tether] heap size [37.6gb], compressed ordinary object pointers [false]
2016-06-26_15:46:50.54871 [2016-06-26 18:46:50,547][WARN ][env ] [Tether] max file descriptors [64000] for elasticsearch process likely too low, consider increasing to at least [65536]
2016-06-26_15:46:53.00370 [2016-06-26 18:46:53,003][INFO ][node ] [Tether] initialized
2016-06-26_15:46:53.00560 [2016-06-26 18:46:53,003][INFO ][node ] [Tether] starting ...
2016-06-26_15:46:54.29760 [2016-06-26 18:46:54,297][INFO ][transport ] [Tether] publish_address {172.25.232.45:9300}, bound_addresses {172.25.232.45:9300}
2016-06-26_15:46:54.30807 [2016-06-26 18:46:54,307][INFO ][discovery ] [Tether] graylog-production/kMz-P-dcQZCObsEMIXZtxQ
2016-06-26_15:47:04.35293 [2016-06-26 18:47:04,352][INFO ][cluster.service ] [Tether] new_master {Tether}{kMz-P-dcQZCObsEMIXZtxQ}{172.25.232.45}{172.25.232.45:9300}, added {{graylog-a0b12869-11ed-4d89-ae58-dcc7380bc3b8}{KA4cjlTpQTm9Y1Rv5wlVmw}{172.25.232.41}{172.25.232.41:9350}{client=true, data=false, master=false},{graylog-c1be9fdd-8c8a-41b1-8a2f-dacbddbc0cc5}{-7icx5UPSrWbs9jqXVE2Mg}{172.25.232.36}{172.25.232.36:9350}{client=true, data=false, master=false},{graylog-2a340000-d1ba-4f21-a9df-f45901d845b7}{BiWe2Zy2Syaojr9ek0AlJQ}{172.25.232.35}{172.25.232.35:9350}{client=true, data=false, master=false},}, reason: zen-disco-join(elected_as_master, [0] joins received)
2016-06-26_15:47:04.36292 [2016-06-26 18:47:04,362][DEBUG][action.admin.cluster.state] [Tether] no known master node, scheduling a retry
2016-06-26_15:47:04.36567 [2016-06-26 18:47:04,364][DEBUG][action.admin.cluster.health] [Tether] no known master node, scheduling a retry
2016-06-26_15:47:04.38578 [2016-06-26 18:47:04,385][INFO ][http ] [Tether] publish_address {172.25.232.45:9200}, bound_addresses {172.25.232.45:9200}
2016-06-26_15:47:04.38617 [2016-06-26 18:47:04,385][INFO ][node ] [Tether] started
2016-06-26_15:47:04.42204 [2016-06-26 18:47:04,421][INFO ][gateway ] [Tether] recovered [1] indices into cluster_state
2016-06-26_15:47:04.76029 [2016-06-26 18:47:04,759][INFO ][cluster.routing.allocation] [Tether] Cluster health status changed from [RED] to [GREEN] (reason: [shards started [[graylog_0][0]] ...]).
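
(The one warning in there is the max file descriptors one. If it matters, the usual fix is raising the nofile limit for the user running Elasticsearch and restarting it; the user name and file below are assumptions for this setup, and the appliance image may manage this elsewhere:)

  # /etc/security/limits.conf (assumed location)
  elasticsearch soft nofile 65536
  elasticsearch hard nofile 65536
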
On Monday, June 27, 2016 at 14:53:20 UTC+3, Marius Sturm wrote:
>
> Hi,
> this all boils down to an unstable Elasticsearch instance. When Graylog is
> not able to forward log messages to ES, it buffers them on disk and tries
> to send them later. This is called the journal.
> So when your ES service is not running properly, the journal fills up with
> messages. Please take a look at the ES logs to figure out why it has
> problems with message ingestion. You can find them in
> /var/log/graylog/elasticsearch/current
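>
> For example, to check whether ES is reachable and watch the log live (a
> minimal sketch; the host and port are assumptions based on the default
> single-node setup):
>
>   curl -XGET 'localhost:9200/_cluster/health?pretty'
>   tail -f /var/log/graylog/elasticsearch/current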
>
> Cheers,
> Marius
>
>
> On 27 June 2016 at 13:39, John <[email protected]> wrote:
>
>> 1 and 4,
>> and the Graylog server node is not sending data to Elasticsearch.
>> I deleted the journal but it doesn't help.
>> The problems began a few days after I upgraded from 1.3 to 2.0.2.
>>
>> On Monday, June 27, 2016 at 14:30:28 UTC+3, Joe K wrote:
>>
>>> Which of the 4 problems?
>>>
>>>
>>> On Monday, June 27, 2016 at 2:00:14 PM UTC+3, John wrote:
>>>>
>>>> Hi Joe,
>>>> I have exactly the same problem, starting a few days after I upgraded
>>>> from 1.3 to 2.0.2.
>>>> Did you manage to fix this issue?
>>>>
>>>> On Thursday, May 26, 2016 at 14:02:19 UTC+3, Joe K wrote:
>>>>>
>>>>>
>>>>> - We run it on a t2.medium (4 GB RAM, 2 cores).
>>>>> - About 1 incoming message per second.
>>>>> - Tried 2.0.0 and now running 2.0.1.
>>>>>
>>>>> Does anyone use the image in a real-world application? The Graylog 2.0
>>>>> image fails after a few days. Is this a problem with the image or with
>>>>> Graylog in general?
>>>>>
>>>>> It runs fine for about a week. After that there are errors and search
>>>>> stops working; search requests time out.
>>>>> There are many errors and they are very cryptic, and a Google search
>>>>> gives no solutions for how to handle them:
>>>>>
>>>>>
>>>>> *1. After about a week we get the error "Uncommited messages deleted
>>>>> from journal"*
>>>>>
>>>>>> Uncommited messages deleted from journal (triggered 9 days ago)
>>>>>> Some messages were deleted from the Graylog journal before they could
>>>>>> be written to Elasticsearch. Please verify that your Elasticsearch
>>>>>> cluster
>>>>>> is healthy and fast enough. You may also want to review your Graylog
>>>>>> journal settings and set a higher limit. (Node: f12...
>>>>>
>>>>>
>>>>> What to do about this? What is the "journal"? A Google search produces
>>>>> no answers.
>>>>>
>>>>> *2. About 4 days after a clean install, it always triggers "Cluster
>>>>> unhealthy"*
>>>>>
>>>>>> "Elasticsearch cluster unhealthy (RED)"
>>>>>> "The Elasticsearch cluster state is RED which means shards are
>>>>>> unassigned. This usually indicates a crashed and corrupt cluster and
>>>>>> needs
>>>>>> to be investigated. Graylog will write into the local disk journal. Read
>>>>>> how to fix this in the Elasticsearch setup documentation."
>>>>>
>>>>>
>>>>> When you go to that documentation link, it says: "The red status
>>>>> indicates that some or all of the primary shards are not available. In
>>>>> this state, no searches can be performed until all primary shards are
>>>>> restored."
>>>>> That's it. What are you supposed to do?
>>>>> After a long search we finally found one solution: this was cured once
>>>>> with *curl -XPUT 'localhost:9200/_settings' -d '{ "index" : {
>>>>> "number_of_replicas" : 0 } }'*
>>>>> The next time it happened, we tried the same fix again, but the
>>>>> response was *{"acknowledged":false}*
>>>>> So what now???
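>>>>>
>>>>> (One thing that still works is listing which shards are stuck, and on
>>>>> which index, via the cat API; 'localhost:9200' is assumed, as in the
>>>>> command above:)
>>>>>
>>>>> curl -XGET 'localhost:9200/_cat/shards?v'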
>>>>>
>>>>> *3. Every time we perform graylog-ctl restart, four more unassigned
>>>>> shards appear:*
>>>>> Elasticsearch cluster is yellow. Shards: 20 active, 0 initializing, 0
>>>>> relocating, 8 unassigned
>>>>> graylog-ctl restart
>>>>> Elasticsearch cluster is yellow. Shards: 20 active, 0 initializing, 0
>>>>> relocating, 12 unassigned
>>>>> Etc.
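>>>>>
>>>>> (If those unassigned shards are replicas, which a single ES node can
>>>>> never assign, the replica count for newly created indices is set in
>>>>> Graylog's server.conf. The option names below are from the Graylog 2.x
>>>>> default config; the file path on the appliance image is our assumption:)
>>>>>
>>>>> # assumed location on the image: /opt/graylog/conf/graylog.conf
>>>>> elasticsearch_shards = 4
>>>>> elasticsearch_replicas = 0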
>>>>>
>>>>> *4. "Journal utilization is too high", without any hint on how to set
>>>>> a higher limit.*
>>>>>
>>>>>> Journal utilization is too high (triggered 11 days ago)
>>>>>> Journal utilization is too high and may go over the limit soon.
>>>>>> Please verify that your Elasticsearch cluster is healthy and fast
>>>>>> enough.
>>>>>> You may also want to review your Graylog journal settings and set a
>>>>>> higher
>>>>>> limit. (Node: f121
>>>>>
>>>>>
>>>>> What is this "journal", and how do we set its limit higher?
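>>>>>
>>>>> (As far as we can tell, the journal is Graylog's on-disk buffer for
>>>>> messages not yet written to Elasticsearch, and its size cap lives in
>>>>> server.conf. The option names are from the Graylog 2.x default config;
>>>>> the path and value below are examples, not recommendations:)
>>>>>
>>>>> # assumed location on the image: /opt/graylog/conf/graylog.conf
>>>>> message_journal_enabled = true
>>>>> message_journal_max_size = 5gb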
>>>>>
>>>>> Please help!
>>>>>
>
>
>
> --
> Developer
>
> Tel.: +49 (0)40 609 452 077
> Fax.: +49 (0)40 609 452 078
>
> TORCH GmbH - A Graylog Company
> Poolstraße 21
> 20335 Hamburg
> Germany
>
> https://www.graylog.com
>
> Commercial Reg. (Registergericht): Amtsgericht Hamburg, HRB 125175
> Geschäftsführer: Lennart Koopmann (CEO)
>