These alert messages have to be deleted by hand, so always look at the timestamp when one shows an error: it may be long outdated.
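If clicking through the UI for every stale notification gets tedious, the cleanup can also be scripted against the Graylog REST API. A minimal sketch, assuming a Graylog 2.x server, an admin user, and the default API port of 12900 (later 2.x versions serve the API under /api on the web port instead); the exact notification type strings should be taken from the list call, not guessed:

    # List active notifications; each entry carries a "type" and a timestamp
    curl -u admin:yourpassword 'http://127.0.0.1:12900/system/notifications'

    # Delete a single notification by the type string returned above
    # (type string here is an assumed example; verify it via the list call)
    curl -u admin:yourpassword -XDELETE 'http://127.0.0.1:12900/system/notifications/journal_utilization_too_high'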
On 28 June 2016 at 14:38, John <[email protected]> wrote:

Hi,
I fixed this issue by creating a new input with the same parameters and only a different listening port. Now everything works well. Finally, I manually deleted the two alerts, "Uncommited messages ..." and "Journal utilization too high ...", because it seems these error messages don't disappear automatically.
So I don't understand why the first input doesn't work anymore: it receives the messages, but it doesn't send them to Elasticsearch.

On Monday, 27 June 2016 at 16:49:35 UTC+3, John wrote:

Screenshots from my UI:

https://lh3.googleusercontent.com/-tEVtJ8knQ8c/V3Euz0SdodI/AAAAAAAAfws/_kr7G20PxvE3R4WOK0AIHDKRYDGqbMwrQCLcB/s1600/journal_too_high.png
https://lh3.googleusercontent.com/-l6rI9eoCEb8/V3Eu2hMMM5I/AAAAAAAAfw0/iiZfY6RxadUhqGDTKr93GBqxR7wmqhN0wCLcB/s1600/node_details.png

On Monday, 27 June 2016 at 15:32:52 UTC+3, John wrote:

Hi,
I checked the Elasticsearch log and I don't see anything special. The cluster status is green.
This is the latest log file:

2016-06-26_09:51:28.78352 [2016-06-26 12:51:28,782][INFO ][node] [Glenn Talbot] version[2.3.1], pid[953], build[bd98092/2016-04-04T12:25:05Z]
2016-06-26_09:51:28.79783 [2016-06-26 12:51:28,794][INFO ][node] [Glenn Talbot] initializing ...
2016-06-26_09:51:30.17146 [2016-06-26 12:51:30,171][INFO ][plugins] [Glenn Talbot] modules [reindex, lang-expression, lang-groovy], plugins [kopf], sites [kopf]
2016-06-26_09:51:30.29289 [2016-06-26 12:51:30,292][INFO ][env] [Glenn Talbot] using [1] data paths, mounts [[/ (/dev/mapper/graylog--vg-root)]], net usable_space [11gb], net total_space [14.9gb], spins? [possibly], types [ext4]
2016-06-26_09:51:30.29564 [2016-06-26 12:51:30,294][INFO ][env] [Glenn Talbot] heap size [37.6gb], compressed ordinary object pointers [false]
2016-06-26_09:51:30.29766 [2016-06-26 12:51:30,294][WARN ][env] [Glenn Talbot] max file descriptors [64000] for elasticsearch process likely too low, consider increasing to at least [65536]
2016-06-26_09:51:34.69050 [2016-06-26 12:51:34,690][INFO ][node] [Glenn Talbot] initialized
2016-06-26_09:51:34.69107 [2016-06-26 12:51:34,690][INFO ][node] [Glenn Talbot] starting ...
2016-06-26_09:51:35.32863 [2016-06-26 12:51:35,328][INFO ][transport] [Glenn Talbot] publish_address {172.25.232.45:9300}, bound_addresses {172.25.232.45:9300}
2016-06-26_09:51:35.33658 [2016-06-26 12:51:35,336][INFO ][discovery] [Glenn Talbot] graylog-production/th7wM-a9ThaAY_umCV3v2w
2016-06-26_09:51:45.37933 [2016-06-26 12:51:45,379][INFO ][cluster.service] [Glenn Talbot] new_master {Glenn Talbot}{th7wM-a9ThaAY_umCV3v2w}{172.25.232.45}{172.25.232.45:9300}, added {{graylog-a0b12869-11ed-4d89-ae58-dcc7380bc3b8}{KA4cjlTpQTm9Y1Rv5wlVmw}{172.25.232.41}{172.25.232.41:9350}{client=true, data=false, master=false},{graylog-2a340000-d1ba-4f21-a9df-f45901d845b7}{BiWe2Zy2Syaojr9ek0AlJQ}{172.25.232.35}{172.25.232.35:9350}{client=true, data=false, master=false},}, reason: zen-disco-join(elected_as_master, [0] joins received)
2016-06-26_09:51:45.40239 [2016-06-26 12:51:45,402][INFO ][http] [Glenn Talbot] publish_address {172.25.232.45:9200}, bound_addresses {172.25.232.45:9200}
2016-06-26_09:51:45.40350 [2016-06-26 12:51:45,403][INFO ][node] [Glenn Talbot] started
2016-06-26_09:51:45.53808 [2016-06-26 12:51:45,537][INFO ][gateway] [Glenn Talbot] recovered [1] indices into cluster_state
2016-06-26_09:51:45.87525 [2016-06-26 12:51:45,875][INFO ][cluster.routing.allocation] [Glenn Talbot] Cluster health status changed from [RED] to [GREEN] (reason: [shards started [[graylog_0][0]] ...]).
2016-06-26_09:57:01.91281 [2016-06-26 12:57:01,912][INFO ][cluster.service] [Glenn Talbot] added {{graylog-c1be9fdd-8c8a-41b1-8a2f-dacbddbc0cc5}{-7icx5UPSrWbs9jqXVE2Mg}{172.25.232.36}{172.25.232.36:9350}{client=true, data=false, master=false},}, reason: zen-disco-join(join from node[{graylog-c1be9fdd-8c8a-41b1-8a2f-dacbddbc0cc5}{-7icx5UPSrWbs9jqXVE2Mg}{172.25.232.36}{172.25.232.36:9350}{client=true, data=false, master=false}])
2016-06-26_10:17:02.43148 [2016-06-26 13:17:02,428][INFO ][cluster.metadata] [Glenn Talbot] [graylog_0] update_mapping [message]
2016-06-26_15:35:13.25159 [2016-06-26 18:35:13,250][INFO ][node] [Glenn Talbot] stopping ...
2016-06-26_15:35:13.32027 [2016-06-26 18:35:13,319][INFO ][node] [Glenn Talbot] stopped
2016-06-26_15:35:13.32153 [2016-06-26 18:35:13,320][INFO ][node] [Glenn Talbot] closing ...
2016-06-26_15:35:13.33032 [2016-06-26 18:35:13,329][INFO ][node] [Glenn Talbot] closed
2016-06-26_15:46:49.97957 [2016-06-26 18:46:49,977][INFO ][node] [Tether] version[2.3.1], pid[1364], build[bd98092/2016-04-04T12:25:05Z]
2016-06-26_15:46:49.97959 [2016-06-26 18:46:49,978][INFO ][node] [Tether] initializing ...
2016-06-26_15:46:50.52052 [2016-06-26 18:46:50,519][INFO ][plugins] [Tether] modules [reindex, lang-expression, lang-groovy], plugins [kopf], sites [kopf]
2016-06-26_15:46:50.54693 [2016-06-26 18:46:50,546][INFO ][env] [Tether] using [1] data paths, mounts [[/ (/dev/mapper/graylog--vg-root)]], net usable_space [11gb], net total_space [14.9gb], spins? [possibly], types [ext4]
2016-06-26_15:46:50.54734 [2016-06-26 18:46:50,546][INFO ][env] [Tether] heap size [37.6gb], compressed ordinary object pointers [false]
2016-06-26_15:46:50.54871 [2016-06-26 18:46:50,547][WARN ][env] [Tether] max file descriptors [64000] for elasticsearch process likely too low, consider increasing to at least [65536]
2016-06-26_15:46:53.00370 [2016-06-26 18:46:53,003][INFO ][node] [Tether] initialized
2016-06-26_15:46:53.00560 [2016-06-26 18:46:53,003][INFO ][node] [Tether] starting ...
2016-06-26_15:46:54.29760 [2016-06-26 18:46:54,297][INFO ][transport] [Tether] publish_address {172.25.232.45:9300}, bound_addresses {172.25.232.45:9300}
2016-06-26_15:46:54.30807 [2016-06-26 18:46:54,307][INFO ][discovery] [Tether] graylog-production/kMz-P-dcQZCObsEMIXZtxQ
2016-06-26_15:47:04.35293 [2016-06-26 18:47:04,352][INFO ][cluster.service] [Tether] new_master {Tether}{kMz-P-dcQZCObsEMIXZtxQ}{172.25.232.45}{172.25.232.45:9300}, added {{graylog-a0b12869-11ed-4d89-ae58-dcc7380bc3b8}{KA4cjlTpQTm9Y1Rv5wlVmw}{172.25.232.41}{172.25.232.41:9350}{client=true, data=false, master=false},{graylog-c1be9fdd-8c8a-41b1-8a2f-dacbddbc0cc5}{-7icx5UPSrWbs9jqXVE2Mg}{172.25.232.36}{172.25.232.36:9350}{client=true, data=false, master=false},{graylog-2a340000-d1ba-4f21-a9df-f45901d845b7}{BiWe2Zy2Syaojr9ek0AlJQ}{172.25.232.35}{172.25.232.35:9350}{client=true, data=false, master=false},}, reason: zen-disco-join(elected_as_master, [0] joins received)
2016-06-26_15:47:04.36292 [2016-06-26 18:47:04,362][DEBUG][action.admin.cluster.state] [Tether] no known master node, scheduling a retry
2016-06-26_15:47:04.36567 [2016-06-26 18:47:04,364][DEBUG][action.admin.cluster.health] [Tether] no known master node, scheduling a retry
2016-06-26_15:47:04.38578 [2016-06-26 18:47:04,385][INFO ][http] [Tether] publish_address {172.25.232.45:9200}, bound_addresses {172.25.232.45:9200}
2016-06-26_15:47:04.38617 [2016-06-26 18:47:04,385][INFO ][node] [Tether] started
2016-06-26_15:47:04.42204 [2016-06-26 18:47:04,421][INFO ][gateway] [Tether] recovered [1] indices into cluster_state
2016-06-26_15:47:04.76029 [2016-06-26 18:47:04,759][INFO ][cluster.routing.allocation] [Tether] Cluster health status changed from [RED] to [GREEN] (reason: [shards started [[graylog_0][0]] ...]).
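A side note on the log itself: both startups warn "max file descriptors [64000] for elasticsearch process likely too low, consider increasing to at least [65536]". A minimal sketch of raising the limit, assuming Elasticsearch runs as the elasticsearch user and the host applies /etc/security/limits.conf via pam_limits (the OVA's service wrapper may manage this differently):

    # Check the limit of the running Elasticsearch process (ES 2.x main class)
    cat /proc/$(pgrep -of org.elasticsearch.bootstrap.Elasticsearch)/limits | grep 'Max open files'

    # Persist a higher limit for the elasticsearch user
    echo 'elasticsearch - nofile 65536' | sudo tee -a /etc/security/limits.conf

    # Restart so the new limit takes effect (appliance command; otherwise use your init system)
    sudo graylog-ctl restart elasticsearch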
On Monday, 27 June 2016 at 14:53:20 UTC+3, Marius Sturm wrote:

Hi,
this all boils down to an unstable Elasticsearch instance. When Graylog is not able to forward log messages to ES, it buffers them on disk and tries to send them later; this buffer is called the journal. So when your ES service is not running properly, the journal fills up with messages. Please take a look at the ES logs to figure out why it has problems with message ingestion. You can find them in /var/log/graylog/elasticsearch/current.

Cheers,
Marius
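Both of the alerts discussed in this thread end with "review your Graylog journal settings and set a higher limit". For reference, a minimal sketch of those settings as they appear in a Graylog 2.x server.conf; the values shown are the documented defaults, not recommendations, and on the OVA/Omnibus image this file is generated, so change it through the appliance's own configuration mechanism rather than editing it by hand:

    # /etc/graylog/server/server.conf (path as on package installs)
    message_journal_enabled = true
    message_journal_dir = /var/lib/graylog-server/journal
    # Upper bound on journal size; beyond this, the oldest uncommitted
    # segments are deleted (the "Uncommited messages deleted" alert):
    message_journal_max_size = 5gb
    # Upper bound on the age of journalled messages:
    message_journal_max_age = 12h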
On 27 June 2016 at 13:39, John <[email protected]> wrote:

1 and 4, and the Graylog server node is not sending data to Elasticsearch. I deleted the journal, but it doesn't help. The problems began a few days after I upgraded from 1.3 to 2.0.2.

On Monday, 27 June 2016 at 14:30:28 UTC+3, Joe K wrote:

Which problem out of the 4?

On Monday, 27 June 2016 at 14:00:14 UTC+3, John wrote:

Hi Joe,
I have exactly the same problem, a few days after I upgraded from 1.3 to 2.0.2. Did you manage to fix this issue?

On Thursday, 26 May 2016 at 14:02:19 UTC+3, Joe K wrote:

- We run it on t2.medium (4 GB RAM, 2 cores).
- About 1 incoming message per second.
- Tried 2.0.0 and now running 2.0.1.

Does anyone use the image in a real-world application? The Graylog 2.0 image fails after a few days. Is this an image problem, or Graylog in general?

It runs fine for about a week. After that there are errors and search stops working; search requests time out. There are many errors, they are very cryptic, and a Google search gives no solutions for how to manage them:

1. After about a week we get the error "Uncommited messages deleted from journal":

    Uncommited messages deleted from journal (triggered 9 days ago)
    Some messages were deleted from the Graylog journal before they could be written to Elasticsearch. Please verify that your Elasticsearch cluster is healthy and fast enough. You may also want to review your Graylog journal settings and set a higher limit. (Node: f12...

What to do about this? What is the "journal"? A Google search produces no answers.

2. After about 4 days of a clean install it always triggers "Cluster unhealthy" (see the health-check sketch after this post):

    "Elasticsearch cluster unhealthy (RED)"
    "The Elasticsearch cluster state is RED which means shards are unassigned. This usually indicates a crashed and corrupt cluster and needs to be investigated. Graylog will write into the local disk journal. Read how to fix this in the Elasticsearch setup documentation."

When you go to that documentation link, it says: "The red status indicates that some or all of the primary shards are not available. In this state, no searches can be performed until all primary shards are restored." That's it; what are you supposed to do? After a long search we finally found one solution, and it cured the cluster once:

    curl -XPUT 'localhost:9200/_settings' -d '{ "index" : { "number_of_replicas" : 0 } }'

The next time it happened we tried the same command again, but the response was {"acknowledged":false}. So what now?

3. Every time we perform graylog-ctl restart, four more unassigned shards appear (see the replica note at the end):

    Elasticsearch cluster is yellow. Shards: 20 active, 0 initializing, 0 relocating, 8 unassigned
    graylog-ctl restart
    Elasticsearch cluster is yellow. Shards: 20 active, 0 initializing, 0 relocating, 12 unassigned
    Etc.

4. "Journal utilization is too high", without any hint on how to set it higher:

    Journal utilization is too high (triggered 11 days ago)
    Journal utilization is too high and may go over the limit soon. Please verify that your Elasticsearch cluster is healthy and fast enough. You may also want to review your Graylog journal settings and set a higher limit. (Node: f121...

What is this "journal", and how do you set it "higher"?

Please help!
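As referenced in problem 2 above: before (re)applying fixes to a RED cluster, it helps to see which shards are unassigned and whether they are primaries. A minimal sketch using the standard Elasticsearch health and cat APIs present in the ES 2.x this image ships, assuming ES listens on localhost:9200:

    # Cluster state with per-index detail
    curl -XGET 'http://localhost:9200/_cluster/health?level=indices&pretty'

    # One row per shard: index, shard number, primary (p) or replica (r), state
    curl -XGET 'http://localhost:9200/_cat/shards?v'

If only replica (r) shards show UNASSIGNED, removing replicas can bring the cluster back to green; UNASSIGNED primaries (p) mean data for those shards is genuinely missing or corrupt, which matches the documentation's warning.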
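On problem 3, the growing "unassigned" count has a plausible mechanical explanation: on a single-node Elasticsearch there is no second node to hold replica copies, so if indices are created with 4 shards and 1 replica, every newly created index adds 4 replica shards that can never be assigned. A sketch of the corresponding fix, assuming a Graylog 2.x single-node setup; the server.conf settings affect only indices Graylog creates in the future, while the curl call strips replicas from the ones that already exist:

    # Graylog side, /etc/graylog/server/server.conf (generated on the appliance,
    # so use its configuration mechanism rather than editing directly):
    elasticsearch_shards = 4
    elasticsearch_replicas = 0

    # Elasticsearch side: drop replicas from all existing indices
    curl -XPUT 'http://localhost:9200/_settings' -d '{ "index" : { "number_of_replicas" : 0 } }'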
