Hi,
I fixed this issue by creating a new input with the same parameters and only a different listening port. Now everything works well. Finally, I manually deleted the two alerts "Uncommited messages deleted from journal" and "Journal utilization is too high", because it seems these error messages don't disappear automatically.
So I still don't understand why the first input doesn't work anymore: it receives the messages but doesn't send them to Elasticsearch.
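In case it helps anyone else hitting the same dead input: the replacement input can also be created through the Graylog 2.x REST API instead of the web UI. A minimal, untested sketch; the admin credentials, the 2.0-default API address (port 12900), the syslog UDP input type, and the new port 5515 are all assumptions to adapt to your own setup:

    # Recreate a global syslog UDP input on a new port via the Graylog REST API.
    # Credentials, API address and input parameters below are placeholders.
    curl -u admin:password -H 'Content-Type: application/json' \
      -X POST 'http://127.0.0.1:12900/system/inputs' \
      -d '{
        "title": "syslog-udp-new-port",
        "type": "org.graylog2.inputs.syslog.udp.SyslogUDPInput",
        "global": true,
        "configuration": {
          "bind_address": "0.0.0.0",
          "port": 5515,
          "recv_buffer_size": 262144
        }
      }'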
On Monday, June 27, 2016 at 16:49:35 UTC+3, John wrote:
>
> screenshots from my UI
>
> <https://lh3.googleusercontent.com/-tEVtJ8knQ8c/V3Euz0SdodI/AAAAAAAAfws/_kr7G20PxvE3R4WOK0AIHDKRYDGqbMwrQCLcB/s1600/journal_too_high.png>
>
> <https://lh3.googleusercontent.com/-l6rI9eoCEb8/V3Eu2hMMM5I/AAAAAAAAfw0/iiZfY6RxadUhqGDTKr93GBqxR7wmqhN0wCLcB/s1600/node_details.png>
>
> On Monday, June 27, 2016 at 15:32:52 UTC+3, John wrote:
>>
>> Hi,
>> I checked the Elasticsearch log and I don't see anything special.
>> The cluster status is green.
>>
>> This is the last log file:
>>
>> 2016-06-26_09:51:28.78352 [2016-06-26 12:51:28,782][INFO ][node] [Glenn Talbot] version[2.3.1], pid[953], build[bd98092/2016-04-04T12:25:05Z]
>> 2016-06-26_09:51:28.79783 [2016-06-26 12:51:28,794][INFO ][node] [Glenn Talbot] initializing ...
>> 2016-06-26_09:51:30.17146 [2016-06-26 12:51:30,171][INFO ][plugins] [Glenn Talbot] modules [reindex, lang-expression, lang-groovy], plugins [kopf], sites [kopf]
>> 2016-06-26_09:51:30.29289 [2016-06-26 12:51:30,292][INFO ][env] [Glenn Talbot] using [1] data paths, mounts [[/ (/dev/mapper/graylog--vg-root)]], net usable_space [11gb], net total_space [14.9gb], spins? [possibly], types [ext4]
>> 2016-06-26_09:51:30.29564 [2016-06-26 12:51:30,294][INFO ][env] [Glenn Talbot] heap size [37.6gb], compressed ordinary object pointers [false]
>> 2016-06-26_09:51:30.29766 [2016-06-26 12:51:30,294][WARN ][env] [Glenn Talbot] max file descriptors [64000] for elasticsearch process likely too low, consider increasing to at least [65536]
>> 2016-06-26_09:51:34.69050 [2016-06-26 12:51:34,690][INFO ][node] [Glenn Talbot] initialized
>> 2016-06-26_09:51:34.69107 [2016-06-26 12:51:34,690][INFO ][node] [Glenn Talbot] starting ...
>> 2016-06-26_09:51:35.32863 [2016-06-26 12:51:35,328][INFO ][transport] [Glenn Talbot] publish_address {172.25.232.45:9300}, bound_addresses {172.25.232.45:9300}
>> 2016-06-26_09:51:35.33658 [2016-06-26 12:51:35,336][INFO ][discovery] [Glenn Talbot] graylog-production/th7wM-a9ThaAY_umCV3v2w
>> 2016-06-26_09:51:45.37933 [2016-06-26 12:51:45,379][INFO ][cluster.service] [Glenn Talbot] new_master {Glenn Talbot}{th7wM-a9ThaAY_umCV3v2w}{172.25.232.45}{172.25.232.45:9300}, added {{graylog-a0b12869-11ed-4d89-ae58-dcc7380bc3b8}{KA4cjlTpQTm9Y1Rv5wlVmw}{172.25.232.41}{172.25.232.41:9350}{client=true, data=false, master=false},{graylog-2a340000-d1ba-4f21-a9df-f45901d845b7}{BiWe2Zy2Syaojr9ek0AlJQ}{172.25.232.35}{172.25.232.35:9350}{client=true, data=false, master=false},}, reason: zen-disco-join(elected_as_master, [0] joins received)
>> 2016-06-26_09:51:45.40239 [2016-06-26 12:51:45,402][INFO ][http] [Glenn Talbot] publish_address {172.25.232.45:9200}, bound_addresses {172.25.232.45:9200}
>> 2016-06-26_09:51:45.40350 [2016-06-26 12:51:45,403][INFO ][node] [Glenn Talbot] started
>> 2016-06-26_09:51:45.53808 [2016-06-26 12:51:45,537][INFO ][gateway] [Glenn Talbot] recovered [1] indices into cluster_state
>> 2016-06-26_09:51:45.87525 [2016-06-26 12:51:45,875][INFO ][cluster.routing.allocation] [Glenn Talbot] Cluster health status changed from [RED] to [GREEN] (reason: [shards started [[graylog_0][0]] ...]).
>> 2016-06-26_09:57:01.91281 [2016-06-26 12:57:01,912][INFO ][cluster.service] [Glenn Talbot] added {{graylog-c1be9fdd-8c8a-41b1-8a2f-dacbddbc0cc5}{-7icx5UPSrWbs9jqXVE2Mg}{172.25.232.36}{172.25.232.36:9350}{client=true, data=false, master=false},}, reason: zen-disco-join(join from node[{graylog-c1be9fdd-8c8a-41b1-8a2f-dacbddbc0cc5}{-7icx5UPSrWbs9jqXVE2Mg}{172.25.232.36}{172.25.232.36:9350}{client=true, data=false, master=false}])
>> 2016-06-26_10:17:02.43148 [2016-06-26 13:17:02,428][INFO ][cluster.metadata] [Glenn Talbot] [graylog_0] update_mapping [message]
>> 2016-06-26_15:35:13.25159 [2016-06-26 18:35:13,250][INFO ][node] [Glenn Talbot] stopping ...
>> 2016-06-26_15:35:13.32027 [2016-06-26 18:35:13,319][INFO ][node] [Glenn Talbot] stopped
>> 2016-06-26_15:35:13.32153 [2016-06-26 18:35:13,320][INFO ][node] [Glenn Talbot] closing ...
>> 2016-06-26_15:35:13.33032 [2016-06-26 18:35:13,329][INFO ][node] [Glenn Talbot] closed
>> 2016-06-26_15:46:49.97957 [2016-06-26 18:46:49,977][INFO ][node] [Tether] version[2.3.1], pid[1364], build[bd98092/2016-04-04T12:25:05Z]
>> 2016-06-26_15:46:49.97959 [2016-06-26 18:46:49,978][INFO ][node] [Tether] initializing ...
>> 2016-06-26_15:46:50.52052 [2016-06-26 18:46:50,519][INFO ][plugins] [Tether] modules [reindex, lang-expression, lang-groovy], plugins [kopf], sites [kopf]
>> 2016-06-26_15:46:50.54693 [2016-06-26 18:46:50,546][INFO ][env] [Tether] using [1] data paths, mounts [[/ (/dev/mapper/graylog--vg-root)]], net usable_space [11gb], net total_space [14.9gb], spins? [possibly], types [ext4]
>> 2016-06-26_15:46:50.54734 [2016-06-26 18:46:50,546][INFO ][env] [Tether] heap size [37.6gb], compressed ordinary object pointers [false]
>> 2016-06-26_15:46:50.54871 [2016-06-26 18:46:50,547][WARN ][env] [Tether] max file descriptors [64000] for elasticsearch process likely too low, consider increasing to at least [65536]
>> 2016-06-26_15:46:53.00370 [2016-06-26 18:46:53,003][INFO ][node] [Tether] initialized
>> 2016-06-26_15:46:53.00560 [2016-06-26 18:46:53,003][INFO ][node] [Tether] starting ...
>> 2016-06-26_15:46:54.29760 [2016-06-26 18:46:54,297][INFO ][transport] [Tether] publish_address {172.25.232.45:9300}, bound_addresses {172.25.232.45:9300}
>> 2016-06-26_15:46:54.30807 [2016-06-26 18:46:54,307][INFO ][discovery] [Tether] graylog-production/kMz-P-dcQZCObsEMIXZtxQ
>> 2016-06-26_15:47:04.35293 [2016-06-26 18:47:04,352][INFO ][cluster.service] [Tether] new_master {Tether}{kMz-P-dcQZCObsEMIXZtxQ}{172.25.232.45}{172.25.232.45:9300}, added {{graylog-a0b12869-11ed-4d89-ae58-dcc7380bc3b8}{KA4cjlTpQTm9Y1Rv5wlVmw}{172.25.232.41}{172.25.232.41:9350}{client=true, data=false, master=false},{graylog-c1be9fdd-8c8a-41b1-8a2f-dacbddbc0cc5}{-7icx5UPSrWbs9jqXVE2Mg}{172.25.232.36}{172.25.232.36:9350}{client=true, data=false, master=false},{graylog-2a340000-d1ba-4f21-a9df-f45901d845b7}{BiWe2Zy2Syaojr9ek0AlJQ}{172.25.232.35}{172.25.232.35:9350}{client=true, data=false, master=false},}, reason: zen-disco-join(elected_as_master, [0] joins received)
>> 2016-06-26_15:47:04.36292 [2016-06-26 18:47:04,362][DEBUG][action.admin.cluster.state] [Tether] no known master node, scheduling a retry
>> 2016-06-26_15:47:04.36567 [2016-06-26 18:47:04,364][DEBUG][action.admin.cluster.health] [Tether] no known master node, scheduling a retry
>> 2016-06-26_15:47:04.38578 [2016-06-26 18:47:04,385][INFO ][http] [Tether] publish_address {172.25.232.45:9200}, bound_addresses {172.25.232.45:9200}
>> 2016-06-26_15:47:04.38617 [2016-06-26 18:47:04,385][INFO ][node] [Tether] started
>> 2016-06-26_15:47:04.42204 [2016-06-26 18:47:04,421][INFO ][gateway] [Tether] recovered [1] indices into cluster_state
>> 2016-06-26_15:47:04.76029 [2016-06-26 18:47:04,759][INFO ][cluster.routing.allocation] [Tether] Cluster health status changed from [RED] to [GREEN] (reason: [shards started [[graylog_0][0]] ...]).
>>
>> On Monday, June 27, 2016 at 14:53:20 UTC+3, Marius Sturm wrote:
>>>
>>> Hi,
>>> this all boils down to an unstable Elasticsearch instance. When Graylog is not able to forward log messages to ES, it buffers them on disk and tries to send them later. This is called the journal.
>>> So when your ES service is not running properly, the journal fills up with messages. Please take a look into the ES logs to figure out why it has problems with message ingestion. You can find them in /var/log/graylog/elasticsearch/current
>>>
>>> Cheers,
>>> Marius
>>>
>>> On 27 June 2016 at 13:39, John <[email protected]> wrote:
>>>
>>>> 1 and 4,
>>>> and the Graylog server node is not sending data to Elasticsearch.
>>>> I deleted the journal but it doesn't help.
>>>> The problems began a few days after I upgraded from 1.3 to 2.0.2.
>>>>
>>>> On Monday, June 27, 2016 at 14:30:28 UTC+3, Joe K wrote:
>>>>
>>>>> Which problem out of the 4?
>>>>>
>>>>> On Monday, June 27, 2016 at 2:00:14 PM UTC+3, John wrote:
>>>>>>
>>>>>> Hi Joe,
>>>>>> I have exactly the same problem, a few days after I upgraded from 1.3 to 2.0.2.
>>>>>> Did you manage to fix this issue?
>>>>>>
>>>>>> On Thursday, May 26, 2016 at 14:02:19 UTC+3, Joe K wrote:
>>>>>>>
>>>>>>> - We run it on t2.medium. (4GB RAM, 2 cores)
>>>>>>> - About 1 incoming message per second.
>>>>>>> - Tried 2.0.0 and now running 2.0.1.
>>>>>>>
>>>>>>> Does anyone use the image in a real-world application? The Graylog 2.0 image fails after a few days. Is this an image problem or Graylog in general?
>>>>>>>
>>>>>>> It runs fine for about a week.
>>>>>>> After that, there are errors and search stops working. Search requests time out.
>>>>>>> There are many errors and they are very cryptic; a Google search does not give any solutions for how to manage them:
>>>>>>>
>>>>>>> *1. After about a week we get the error "Uncommited messages deleted from journal"*
>>>>>>>
>>>>>>>> Uncommited messages deleted from journal (triggered 9 days ago)
>>>>>>>> Some messages were deleted from the Graylog journal before they could be written to Elasticsearch. Please verify that your Elasticsearch cluster is healthy and fast enough. You may also want to review your Graylog journal settings and set a higher limit. (Node: f12...
>>>>>>>
>>>>>>> What to do about this? What is the "journal"? A Google search produces no answers.
>>>>>>>
>>>>>>> *2. After about 4 days of a clean install it always triggers "Cluster unhealthy"*
>>>>>>>
>>>>>>>> "Elasticsearch cluster unhealthy (RED)"
>>>>>>>> "The Elasticsearch cluster state is RED which means shards are unassigned. This usually indicates a crashed and corrupt cluster and needs to be investigated. Graylog will write into the local disk journal. Read how to fix this in the Elasticsearch setup documentation."
>>>>>>>
>>>>>>> When you go to that documentation link it says: "The red status indicates that some or all of the primary shards are not available. In this state, no searches can be performed until all primary shards are restored." That's it. What are you supposed to do?
>>>>>>> After a long search we finally found one solution: this was cured once with *curl -XPUT 'localhost:9200/_settings' -d '{ "index" : { "number_of_replicas" : 0}}'*
>>>>>>> The next time it happened, we tried the same fix again, but the response was *{"acknowledged":false}*
>>>>>>> So what now?
>>>>>>>
>>>>>>> *3. Every time we perform graylog-ctl restart, four more unassigned shards appear:*
>>>>>>> Elasticsearch cluster is yellow. Shards: 20 active, 0 initializing, 0 relocating, 8 unassigned
>>>>>>> graylog-ctl restart
>>>>>>> Elasticsearch cluster is yellow. Shards: 20 active, 0 initializing, 0 relocating, 12 unassigned
>>>>>>> Etc.
>>>>>>>
>>>>>>> *4. Journal utilization is too high, without any hint on how to set it higher.*
>>>>>>>
>>>>>>>> Journal utilization is too high (triggered 11 days ago)
>>>>>>>> Journal utilization is too high and may go over the limit soon. Please verify that your Elasticsearch cluster is healthy and fast enough. You may also want to review your Graylog journal settings and set a higher limit. (Node: f121
>>>>>>>
>>>>>>> What is this "journal"? And how do you set it "higher"?
>>>>>>>
>>>>>>> Please help!
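A note on the journal questions (1 and 4) quoted above: as Marius explains, the journal is Graylog's on-disk buffer for messages that could not yet be written to Elasticsearch, and both its current utilization and its size limit can be handled outside the UI. A minimal sketch, assuming a package install with the config in /etc/graylog/server/server.conf and the Graylog 2.0 default API port 12900 (the OVA appliance managed by graylog-ctl stores its settings differently, so paths there will differ):

    # Check the current journal state through the REST API (admin credentials assumed):
    curl -u admin:password 'http://127.0.0.1:12900/system/journal'

    # Raise the journal size limit in /etc/graylog/server/server.conf, then restart
    # graylog-server. 5gb is the documented 2.x default for message_journal_max_size.
    message_journal_enabled = true
    message_journal_max_size = 10gb
    #message_journal_dir = /var/lib/graylog-server/journal

Note that a bigger journal only buys time: if Elasticsearch never catches up, the "Uncommited messages deleted from journal" alert will eventually fire again.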
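On the RED cluster and the growing number of unassigned shards (problems 2 and 3): before re-applying the number_of_replicas workaround, it is worth seeing which shards are unassigned and why. A sketch using standard Elasticsearch APIs available on the 2.3.1 node shown in the logs; the single-node explanation and the Graylog-side setting are assumptions about this particular setup:

    # Which indices are unhealthy, and which shards are UNASSIGNED?
    curl 'http://localhost:9200/_cluster/health?level=indices&pretty'
    curl 'http://localhost:9200/_cat/shards?v' | grep UNASSIGNED

    # On a single-node cluster, replica shards can never be allocated, so every
    # newly created index (e.g. after an index rotation) adds more unassigned
    # shards. In Graylog 2.0's server.conf you can tell Graylog to create new
    # indices without replicas:
    elasticsearch_replicas = 0

    # Existing indices still need the one-off fix already quoted in the thread:
    curl -XPUT 'http://localhost:9200/_settings' -d '{ "index" : { "number_of_replicas" : 0 } }'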
