For the record, the issues listed here were fixed in Akka 2.3.2. Thanks for your help finding and narrowing down the issues.
Cheers,
Patrik

On Thu, Apr 3, 2014 at 8:42 AM, Patrik Nordwall <[email protected]> wrote:
>
> On Wed, Apr 2, 2014 at 10:23 PM, Patrik Nordwall <[email protected]> wrote:
>>
>> 2 apr 2014 kl. 19:27 skrev Raman Gupta <[email protected]>:
>>
>> I created https://www.assembla.com/spaces/akka/tickets/3975 re #2.
>>
> That problem is in the sample, as I have described in the ticket
> <https://www.assembla.com/spaces/akka/tickets/3975>. Thanks for trying it
> out in anger.
>
> /Patrik
>
>> Thanks a lot.
>>
>> There is also one other (relatively minor) issue: the 8-10 dead letters
>> on cluster startup. Do you consider that a bug? If so, I shall create a
>> ticket for that as well.
>>
>> Dead letter logging is not a bug. You can turn it off if it's disturbing.
>>
>> /Patrik
>>
>> Regards,
>> Raman
>>
>> On Wednesday, April 2, 2014 7:47:19 AM UTC-4, Patrik Nordwall wrote:
>>>
>>> Hi Raman and Michael,
>>>
>>> I distilled this down to 2 remaining issues:
>>>
>>> 1. NoSuchElementException at ClusterSharding.scala:1055
>>> That looks like a bug. Please create a ticket
>>> <http://doc.akka.io/docs/akka/current/project/issue-tracking.html>
>>> with a description of how to reproduce it.
>>>
>>> 2. InvalidActorNameException: actor name must not be empty, at
>>> ClusterSharding.scala:802
>>> That means the id is "", which is not meaningful and not supported. We
>>> should add a check and handle it in a better way. Ticket, please.
>>>
>>> Have I missed anything else?
>>>
>>> /Patrik
>>>
>>> On Tue, Apr 1, 2014 at 4:23 PM, Raman Gupta <[email protected]> wrote:
>>>
>>>> All right, at least I figured out the OOM problem. The sbt packaged
>>>> with Fedora 20 does not set the perm gen size, so it uses the default
>>>> size of 64 MB, which is too small for sbt / Akka. That was probably
>>>> causing a lot of my issues. In case anyone cares, I created:
>>>>
>>>> https://bugzilla.redhat.com/show_bug.cgi?id=1083130
>>>>
>>>> That took care of a lot of weirdness!
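For anyone hitting the same wall: on pre-Java-8 JVMs the perm gen cap can be raised for sbt via `SBT_OPTS`. A minimal sketch; the 256m figure is illustrative, not necessarily the value the Fedora package should ship:

```shell
# Raise the JVM permanent-generation cap for the sbt launcher.
# Only relevant on pre-Java-8 JVMs (perm gen was removed in Java 8).
# 256m is an illustrative value; tune it for your build.
export SBT_OPTS="-XX:MaxPermSize=256m"
```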
>>>> There are still issues, however. Here is another error I found by
>>>> starting and stopping the 2552 node several times, specifically
>>>> stopping it immediately after a "New post saved:" message:
>>>>
>>>> Seen on the bot:
>>>>
>>>> [INFO] [04/01/2014 10:20:58.686] [ClusterSystem-akka.actor.default-dispatcher-22] [akka.tcp://[email protected]:56185/user/sharding/AuthorListingCoordinator] Member removed [akka.tcp://[email protected]:2552]
>>>> [ERROR] [04/01/2014 10:21:00.044] [ClusterSystem-akka.actor.default-dispatcher-3] [akka://ClusterSystem/user/sharding/AuthorListing] actor name must not be empty
>>>> akka.actor.InvalidActorNameException: actor name must not be empty
>>>>     at akka.actor.dungeon.Children$class.checkName(Children.scala:180)
>>>>     at akka.actor.dungeon.Children$class.actorOf(Children.scala:38)
>>>>     at akka.actor.ActorCell.actorOf(ActorCell.scala:369)
>>>>     at akka.contrib.pattern.ShardRegion$$anonfun$6.apply(ClusterSharding.scala:802)
>>>>     at akka.contrib.pattern.ShardRegion$$anonfun$6.apply(ClusterSharding.scala:798)
>>>>     at scala.Option.getOrElse(Option.scala:120)
>>>>     at akka.contrib.pattern.ShardRegion.deliverMessage(ClusterSharding.scala:798)
>>>>     at akka.contrib.pattern.ShardRegion$$anonfun$receiveCoordinatorMessage$2.apply(ClusterSharding.scala:694)
>>>>     at akka.contrib.pattern.ShardRegion$$anonfun$receiveCoordinatorMessage$2.apply(ClusterSharding.scala:693)
>>>>     at scala.collection.Iterator$class.foreach(Iterator.scala:727)
>>>>     at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
>>>>     at scala.collection.IterableLike$class.foreach(IterableLike.scala:72)
>>>>     at scala.collection.AbstractIterable.foreach(Iterable.scala:54)
>>>>     at akka.contrib.pattern.ShardRegion.receiveCoordinatorMessage(ClusterSharding.scala:693)
>>>>     at akka.contrib.pattern.ShardRegion$$anonfun$receive$3.applyOrElse(ClusterSharding.scala:656)
>>>>     at akka.actor.Actor$class.aroundReceive(Actor.scala:465)
>>>>     at akka.contrib.pattern.ShardRegion.aroundReceive(ClusterSharding.scala:586)
>>>>     at akka.actor.ActorCell.receiveMessage(ActorCell.scala:516)
>>>>     at akka.actor.ActorCell.invoke(ActorCell.scala:487)
>>>>     at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:238)
>>>>     at akka.dispatch.Mailbox.run(Mailbox.scala:220)
>>>>     at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:393)
>>>>     at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
>>>>     at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
>>>>     at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
>>>>     at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
>>>>
>>>> Regards,
>>>> Raman
>>>>
>>>> On Tuesday, April 1, 2014 4:07:35 AM UTC-4, delasoul wrote:
>>>>>
>>>>> Hello,
>>>>>
>>>>> (I have tested with Akka 2.3.0, akka-persistence-mongo-casbah
>>>>> 0.4-SNAPSHOT, running the BlogApp in different processes.)
>>>>>
>>>>> I can confirm points 1 and 4, but these have no influence on how the
>>>>> application works. The dead-letter and gating messages always appear
>>>>> when starting the first seed node, but everything works fine once the
>>>>> other nodes join the cluster.
>>>>> I don't see duplicate key exceptions or OOM, but when stopping the
>>>>> first seed node, after a while the remaining nodes start to fail with
>>>>> a NoSuchElementException for every shard lookup (see the exception
>>>>> log at the end of this post).
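The `InvalidActorNameException` in Raman's log above means the shard region was handed an entry id of `""`, which later becomes an empty actor name. A minimal defensive sketch, using a hypothetical `Envelope` message type (the actual BlogApp commands differ): the 2.3.x contrib sharding id extractor is a partial function, so it can simply decline to match messages with an empty id instead of crashing the region.

```scala
// Hypothetical stand-in for the sample's sharded commands.
final case class Envelope(entryId: String, payload: Any)

// In Akka 2.3.x contrib cluster sharding, an IdExtractor maps an incoming
// message to an (entryId, payload) pair. An empty entryId later becomes an
// empty actor name, triggering InvalidActorNameException.
type IdExtractor = PartialFunction[Any, (String, Any)]

val idExtractor: IdExtractor = {
  // Only match messages carrying a non-empty id; anything else is treated
  // as unhandled rather than crashing the ShardRegion.
  case Envelope(id, payload) if id.nonEmpty => (id, payload)
}
```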
>>>>> As said, this only happens when stopping the first seed node. If I
>>>>> stop the second seed node, or other BlogApps I started with port 0,
>>>>> and then restart them in various orders, everything works fine.
>>>>>
>>>>> hth,
>>>>>
>>>>> michael
>>>>>
>>>>> [ERROR] [04/01/2014 09:37:46.131] [ClusterSystem-akka.actor.default-dispatcher-16] [akka://ClusterSystem/user/sharding/AuthorListingCoordinator/singleton] key not found: Actor[akka://ClusterSystem/user/sharding/AuthorListing#-1841768893]
>>>>> java.util.NoSuchElementException: key not found: Actor[akka://ClusterSystem/user/sharding/AuthorListing#-1841768893]
>>>>>     at scala.collection.MapLike$class.default(MapLike.scala:228)
>>>>>     at scala.collection.AbstractMap.default(Map.scala:58)
>>>>>     at scala.collection.MapLike$class.apply(MapLike.scala:141)
>>>>>     at scala.collection.AbstractMap.apply(Map.scala:58)
>>>>>     at akka.contrib.pattern.ShardCoordinator$Internal$State.updated(ClusterSharding.scala:1055)
>>>>>     at akka.contrib.pattern.ShardCoordinator$$anonfun$receiveRecover$1.applyOrElse(ClusterSharding.scala:1162)
>>>>>     at scala.runtime.AbstractPartialFunction$mcVL$sp.apply$mcVL$sp(AbstractPartialFunction.scala:33)
>>>>>     at scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:33)
>>>>>     at scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:25)
>>>>>     at akka.persistence.Eventsourced$$anonfun$1.applyOrElse(Eventsourced.scala:196)
>>>>>     at akka.persistence.Recovery$State$$anonfun$processPersistent$1.apply(Recovery.scala:31)
>>>>>     at akka.persistence.Recovery$State$$anonfun$processPersistent$1.apply(Recovery.scala:31)
>>>>>     at akka.persistence.Recovery$State$class.withCurrentPersistent(Recovery.scala:42)
>>>>>     at akka.persistence.Recovery$$anon$1.withCurrentPersistent(Recovery.scala:105)
>>>>>     at akka.persistence.Recovery$State$class.processPersistent(Recovery.scala:31)
>>>>>     at akka.persistence.Recovery$$anon$1.processPersistent(Recovery.scala:105)
>>>>>     at akka.persistence.Recovery$$anon$1.aroundReceive(Recovery.scala:110)
>>>>>     at akka.persistence.Recovery$class.aroundReceive(Recovery.scala:242)
>>>>>     at akka.contrib.pattern.ShardCoordinator.akka$persistence$Eventsourced$$super$aroundReceive(ClusterSharding.scala:1132)
>>>>>     at akka.persistence.Eventsourced$$anon$1.aroundReceive(Eventsourced.scala:29)
>>>>>     at akka.persistence.Eventsourced$class.aroundReceive(Eventsourced.scala:172)
>>>>>     at akka.contrib.pattern.ShardCoordinator.aroundReceive(ClusterSharding.scala:1132)
>>>>>     at akka.actor.ActorCell.receiveMessage(ActorCell.scala:516)
>>>>>     at akka.actor.ActorCell.invoke(ActorCell.scala:487)
>>>>>     at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:238)
>>>>>     at akka.dispatch.Mailbox.run(Mailbox.scala:220)
>>>>>     at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:393)
>>>>>     at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
>>>>>     at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
>>>>>     at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
>>>>>     at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
>>>>>
>>>>> On Monday, 31 March 2014 16:10:44 UTC+2, Raman Gupta wrote:
>>>>>>
>>>>>> I am experimenting with the cluster sharding activator, but am
>>>>>> having lots of issues with it.
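The `key not found` in Michael's log above is thrown by an unguarded `Map#apply` during coordinator recovery (the bug the first message says was fixed in 2.3.2). The underlying Scala failure mode, sketched with illustrative names rather than the real ClusterSharding internals:

```scala
// Illustrative stand-in for the coordinator's internal state:
// shard ids keyed by region. Not the actual ClusterSharding data model.
val shardsByRegion: Map[String, Set[String]] =
  Map("regionA" -> Set("shard-1", "shard-2"))

// Map#apply throws NoSuchElementException when the key is absent...
def unsafeLookup(region: String): Set[String] =
  shardsByRegion(region)

// ...whereas getOrElse (or get) lets the caller handle the miss gracefully.
def safeLookup(region: String): Set[String] =
  shardsByRegion.getOrElse(region, Set.empty)
```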
>>>>>> I have tried updating the activator to 2.3.1, but to no avail (and
>>>>>> other issues show up, such as described here:
>>>>>> https://www.assembla.com/spaces/akka/simple_planner#/ticket:3967).
>>>>>>
>>>>>> Problems noticed so far:
>>>>>>
>>>>>> 1) 100% of the time, the activator sends a lot of messages to the
>>>>>> ClusterSystem deadLetters on startup of the seed node. Here is one
>>>>>> example:
>>>>>>
>>>>>> [INFO] [03/31/2014 09:37:00.654] [ClusterSystem-akka.actor.default-dispatcher-2] [akka://ClusterSystem/deadLetters] Message [akka.cluster.InternalClusterAction$InitJoin$] from Actor[akka://ClusterSystem/system/cluster/core/daemon/firstSeedNodeProcess#-438400827] to Actor[akka://ClusterSystem/deadLetters] was not delivered. [1] dead letters encountered. This logging can be turned off or adjusted with configuration settings 'akka.log-dead-letters' and 'akka.log-dead-letters-during-shutdown'.
>>>>>> [ ... many more akka.cluster.InternalClusterAction$InitJoin$ messages ... ]
>>>>>> [INFO] [03/31/2014 09:37:05.518] [ClusterSystem-akka.actor.default-dispatcher-14] [akka://ClusterSystem/system/cluster/core/daemon/firstSeedNodeProcess] Message [akka.dispatch.sysmsg.Terminate] from Actor[akka://ClusterSystem/system/cluster/core/daemon/firstSeedNodeProcess#-438400827] to Actor[akka://ClusterSystem/system/cluster/core/daemon/firstSeedNodeProcess#-438400827] was not delivered. [6] dead letters encountered. This logging can be turned off or adjusted with configuration settings 'akka.log-dead-letters' and 'akka.log-dead-letters-during-shutdown'.
>>>>>> [ ... JOINING and Up message ... ]
>>>>>> [INFO] [03/31/2014 09:37:06.516] [ClusterSystem-akka.actor.default-dispatcher-16] [akka://ClusterSystem/user/sharding/AuthorListingCoordinator/singleton] Message [akka.contrib.pattern.ShardCoordinator$Internal$Register] from Actor[akka://ClusterSystem/user/sharding/AuthorListing#1471529820] to Actor[akka://ClusterSystem/user/sharding/AuthorListingCoordinator/singleton] was not delivered. [7] dead letters encountered. This logging can be turned off or adjusted with configuration settings 'akka.log-dead-letters' and 'akka.log-dead-letters-during-shutdown'.
>>>>>> [INFO] [03/31/2014 09:37:06.516] [ClusterSystem-akka.actor.default-dispatcher-16] [akka://ClusterSystem/user/sharding/PostCoordinator/singleton] Message [akka.contrib.pattern.ShardCoordinator$Internal$Register] from Actor[akka://ClusterSystem/user/sharding/Post#589187748] to Actor[akka://ClusterSystem/user/sharding/PostCoordinator/singleton] was not delivered. [8] dead letters encountered. This logging can be turned off or adjusted with configuration settings 'akka.log-dead-letters' and 'akka.log-dead-letters-during-shutdown'.
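As the log lines themselves say, this logging is controlled by the two settings they name; a minimal `application.conf` fragment along these lines would silence it (a sketch; `off` disables the logging entirely, an integer caps how many dead letters are logged):

```hocon
akka {
  # How many dead letters to log before going quiet; "off" disables.
  log-dead-letters = off
  # Also suppress the burst of dead letters logged during system shutdown.
  log-dead-letters-during-shutdown = off
}
```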
>>>>>>
>>>>>> 2) Using the default shared LevelDB journal configuration, sometimes
>>>>>> (but not always) when the Bot node is started, the seed node goes
>>>>>> nuts:
>>>>>>
>>>>>> [INFO] [03/31/2014 09:46:00.768] [ClusterSystem-akka.actor.default-dispatcher-3] [Cluster(akka://ClusterSystem)] Cluster Node [akka.tcp://[email protected]:2551] - Leader is moving node [akka.tcp://[email protected]:50327] to [Up]
>>>>>> Uncaught error from thread [ClusterSystem-akka.remote.default-remote-dispatcher-24] shutting down JVM since 'akka.jvm-exit-on-fatal-error' is enabled for ActorSystem[ClusterSystem]
>>>>>> Uncaught error from thread [ClusterSystem-akka.actor.default-dispatcher-17] shutting down JVM since 'akka.jvm-exit-on-fatal-error' is enabled for ActorSystem[ClusterSystem]
>>>>>> Uncaught error from thread [ClusterSystem-akka.actor.default-dispatcher-28] shutting down JVM since 'akka.jvm-exit-on-fatal-error' is enabled for ActorSystem[ClusterSystem]
>>>>>> Uncaught error from thread [ClusterSystem-akka.actor.default-dispatcher-29] shutting down JVM since 'akka.jvm-exit-on-fatal-error' is enabled for ActorSystem[ClusterSystem]
>>>>>> [ ... keeps going forever ... ]
>>>>>> ^CJava HotSpot(TM) 64-Bit Server VM warning: Exception java.lang.OutOfMemoryError occurred dispatching signal SIGINT to handler - the VM may need to be forcibly terminated
>>>>>>
>>>>>> 3) When it is working, the shared LevelDB journal seems to work
>>>>>> reasonably well (except for the SPOF on the first node). However,
>>>>>> when I change to either one of the MongoDB replicated journals in
>>>>>> contrib, when testing various combinations of node failures, things
>>>>>> go nuts with DuplicateKeyExceptions (looping infinitely),
>>>>>> OutOfMemoryErrors, and other weirdness.
>>>>>> I know these are early implementations, but the similarity of the
>>>>>> failures when using the two different journal implementations makes
>>>>>> me think the problems may not be with the journal implementations,
>>>>>> but with akka-persistence instead.
>>>>>>
>>>>>> 4) When restarting the Bot node, there are lots of WARNings about
>>>>>> unknown UIDs (the following message keeps repeating for Bots that
>>>>>> have been shut down -- i.e. the node never appears to be actually
>>>>>> removed from the cluster, even after the entire cluster is
>>>>>> restarted):
>>>>>>
>>>>>> [WARN] [03/31/2014 10:01:40.280] [ClusterSystem-akka.remote.default-remote-dispatcher-5] [Remoting] Association to [akka.tcp://[email protected]:50327] with unknown UID is reported as quarantined, but address cannot be quarantined without knowing the UID, gating instead for 5000 ms.
>>>>>>
>>>>>> Has anyone else done any experimentation with akka-cluster-sharding?
>>>>>>
>>>>>> Regards,
>>>>>> Raman
>>>>
>>>> --
>>>> >>>>>>>>>> Read the docs: http://akka.io/docs/
>>>> >>>>>>>>>> Check the FAQ: http://doc.akka.io/docs/akka/current/additional/faq.html
>>>> >>>>>>>>>> Search the archives: https://groups.google.com/group/akka-user
>>>> ---
>>>> You received this message because you are subscribed to the Google
>>>> Groups "Akka User List" group.
>>>> To unsubscribe from this group and stop receiving emails from it, send
>>>> an email to [email protected].
>>>> To post to this group, send email to [email protected].
>>>> Visit this group at http://groups.google.com/group/akka-user.
>>>> For more options, visit https://groups.google.com/d/optout.
>>>
>>> --
>>> Patrik Nordwall
>>> Typesafe <http://typesafe.com/> - Reactive apps on the JVM
>>> Twitter: @patriknw
>>> JOIN US. REGISTER TODAY! <http://www.scaladays.org/>
>>> Scala Days, June 16th-18th, Berlin <http://www.scaladays.org/>
