Re: [akka-user] akka system "shut down" with no error in logs

Akka Team Fri, 31 Oct 2014 02:09:50 -0700

Hi Oren,

On Thu, Oct 30, 2014 at 12:24 PM, Oren Razon <[email protected]> wrote:


> Hi Endre,
> We did use the PinnedDispatcher for the potential "blockers", but it still
> occurred.
> Our scheduled task only prints to the log (it does not send any
> message), and it stopped as well.
> I don't think it's a scheduler-specific bug, because it happens only when
> the BIG shutdown phenomenon occurs.
>

Basically all of the internal systems in Akka rely on scheduling:
- all timeouts need the scheduler
- all heartbeat timers need the scheduler

If the scheduler is borked, there is a high likelihood that all hell breaks
loose. That does not explain why logging of other things stopped, though.
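
One way to narrow this down (a sketch only, not something Akka provides) is
to run a heartbeat on a plain JDK timer thread, outside the Akka scheduler
and dispatchers. The class name JvmHeartbeat and the output format below are
made up for illustration:

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical watchdog: a heartbeat that uses no Akka scheduler or dispatcher. */
class JvmHeartbeat implements AutoCloseable {
    // Dedicated single-thread JDK timer, so actors cannot starve it.
    private final ScheduledExecutorService timer =
        Executors.newSingleThreadScheduledExecutor(r -> {
            Thread t = new Thread(r, "jvm-heartbeat");
            t.setDaemon(true); // do not keep the JVM alive just for the heartbeat
            return t;
        });
    private final AtomicLong beats = new AtomicLong();

    /** Starts printing one line per period to stderr. */
    public void start(long period, TimeUnit unit) {
        timer.scheduleAtFixedRate(() -> {
            long n = beats.incrementAndGet();
            System.err.println("jvm-heartbeat #" + n + " at " + System.currentTimeMillis());
        }, 0, period, unit);
    }

    /** Number of heartbeats emitted so far. */
    public long beats() {
        return beats.get();
    }

    @Override
    public void close() {
        timer.shutdownNow();
    }

    public static void main(String[] args) throws InterruptedException {
        try (JvmHeartbeat hb = new JvmHeartbeat()) {
            hb.start(500, TimeUnit.MILLISECONDS);
            Thread.sleep(2000);
            System.out.println("beats so far: " + hb.beats());
        }
    }
}
```

If this heartbeat keeps printing while the Akka-scheduled one stops, the
problem is likely inside Akka or its dispatchers. If both stop, suspect the
JVM or the machine. Note that the JDK timer also relies on System.nanoTime(),
so a misbehaving clock could stall it as well.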


> What do you mean by "amazon virtual machines mess up CPU timers which is
> then an Amazon issue"?
>

What I meant is that we have seen, with some cloud provider, that in certain
cases (probably due to image migration) the low-level CPU clock jumped
backwards. That breaks the monotonicity of System.nanoTime(), among other
things, which violates the underlying assumption of the scheduler and leads
to unspecified behavior.
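
A minimal sketch of how one could watch for such a backwards jump, using only
the JDK. The class name MonotonicityProbe is made up for illustration and is
not part of Akka; the clock is passed in as a LongSupplier so that it can be
replaced with a fake one when trying the probe out:

```java
import java.util.function.LongSupplier;

/** Hypothetical helper: samples a clock and counts backward steps. */
class MonotonicityProbe {
    private final LongSupplier clock;
    private long last;
    private long backwardJumps = 0;

    MonotonicityProbe(LongSupplier clock) {
        this.clock = clock;
        this.last = clock.getAsLong();
    }

    /** Takes one sample; returns true if the clock went backwards since the previous one. */
    public boolean sample() {
        long now = clock.getAsLong();
        // Compare by subtraction, as the System.nanoTime() Javadoc recommends,
        // because the raw values may overflow.
        boolean wentBack = now - last < 0;
        if (wentBack) {
            backwardJumps++;
        }
        last = now;
        return wentBack;
    }

    /** Number of backward steps seen so far. */
    public long backwardJumps() {
        return backwardJumps;
    }

    public static void main(String[] args) throws InterruptedException {
        MonotonicityProbe probe = new MonotonicityProbe(System::nanoTime);
        for (int i = 0; i < 100; i++) {
            if (probe.sample()) {
                System.err.println("System.nanoTime() went backwards");
            }
            Thread.sleep(1);
        }
        System.out.println("backward jumps observed: " + probe.backwardJumps());
    }
}
```

A probe like this only sees jumps that happen between two of its own samples
on the sampling thread, so a clean run does not prove the clock is healthy.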

I am not sure if this is the case here. Unfortunately, without access to
your code and thread/memory dumps, I cannot figure out anything more. You
might want to consider Production support, which covers these cases.

-Endre


>
> On Thursday, October 30, 2014 12:04:41 PM UTC+2, Akka Team wrote:
>>
>> Hi Oren,
>>
>>
>>
>>> 1. We changed our internal scheduler to write the heartbeat directly to
>>> the log instead of sending the heartbeat as a message.
>>> 2. We changed our first-in-line actor to use PinnedDispatcher instead of
>>> the default one.
>>>
>>
>> It should be the other way around: the actors doing the blocking calls
>> should be on the PinnedDispatcher and the rest can run on the default one.
>> If the blocking actors share the dispatcher with clustering, for example,
>> it will just die.
>>
>>
>>>
>>> And yet a...It happened again, and when it did we did not see any
>>> heartbeat logs anymore, which indicates it is not a dispatcher issue.
>>>
>>
>> If you use a scheduled task that does not send messages to actors but,
>> for example, prints to stdout, and it still stops, then there is a different
>> issue lurking there. If this is the case, and you verified it, then this is
>> either a bug in the scheduler, or amazon virtual machines mess up CPU
>> timers which is then an Amazon issue.
>>
>> -Endre
>>
>>
>>>
>>> Any idea?
>>>
>>>
>>> On Monday, October 27, 2014 12:38:27 PM UTC+2, Akka Team wrote:
>>>>
>>>> Hi Oren,
>>>>
>>>>
>>>>
>>>> On Mon, Oct 27, 2014 at 7:22 AM, Oren Razon <[email protected]> wrote:
>>>>
>>>>> Hi,
>>>>> I'm using akka 2.3.6 on a single node on an Amazon m1.large instance.
>>>>> I'm running the akka system as a cluster (see my configuration file below).
>>>>> My akka system includes several parallel, identical topologies.
>>>>> Each topology starts with an actor that acts as an MQTT client at the
>>>>> front and, after processing the arriving messages, sends them to some
>>>>> other actors (some of them written in Scala and some in Java).
>>>>> When starting up the system everything seems to work well.
>>>>> But after several hours (usually ~8-10 hours) it seems that the
>>>>> MQTT actor (the first in front) is still getting its messages and working
>>>>> as expected, but when it sends them to its followers nothing happens. I do
>>>>> not see dead letters, or any logging (application or akka) that indicates
>>>>> anything about the actors (all heartbeats have stopped).
>>>>> When looking at all my logs at debug level it seems that there was no
>>>>> error at all, and that this scenario happens to all my topologies at the
>>>>> exact same moment.
>>>>>
>>>>> Trying to debug it, we added an internal scheduler to the second
>>>>> actor in line (the one that gets messages from the MQTT actor), which
>>>>> sends a "Tick" message to itself every minute and prints it.
>>>>> As long as the system works well we see these "Tick" messages, but when
>>>>> the system is "down" we do not see any more "Tick" messages.
>>>>>
>>>>>
>>>> This looks suspiciously like the common case where a dispatcher becomes
>>>> stalled because blocking calls are using up all the threads in the
>>>> underlying thread pool. Do you have any blocking calls or long-running
>>>> computations in any of the receive blocks of your actors? Do you use
>>>> Await.result()?
>>>>
>>>> -Endre
>>>>
>>>>
>>>>
>>>>>
>>>>>
>>>>>  --
>>>>> >>>>>>>>>> Read the docs: http://akka.io/docs/
>>>>> >>>>>>>>>> Check the FAQ: http://doc.akka.io/docs/akka/current/additional/faq.html
>>>>> >>>>>>>>>> Search the archives: https://groups.google.com/group/akka-user
>>>>> ---
>>>>> You received this message because you are subscribed to the Google
>>>>> Groups "Akka User List" group.
>>>>> To unsubscribe from this group and stop receiving emails from it, send
>>>>> an email to [email protected].
>>>>> To post to this group, send email to [email protected].
>>>>> Visit this group at http://groups.google.com/group/akka-user.
>>>>> For more options, visit https://groups.google.com/d/optout.
>>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> Akka Team
>>>> Typesafe - The software stack for applications that scale
>>>> Blog: letitcrash.com
>>>> Twitter: @akkateam
>>>>
>>>
>>
>>
>>
>>
>


