Hi,
I encountered some surprising behavior while stress testing my application,
and boiled it down to the following code:
import org.junit.Test
import akka.actor._
import akka.actor.Identify
import java.util.concurrent.atomic.AtomicInteger
import com.typesafe.config.{Config, ConfigFactory}
import java.io.StringReader
class ThousandActorsWatchingEachOther {
val countUp = new AtomicInteger(0)
@Test
def testThousand() {
val actorSystem1: ActorSystem = ActorSystem.create("system",
getLocalHostRemotingConfig(2552))
val count = 1000
for (i <- 0 until count) {
actorSystem1.actorOf(Props(new Echoer), s"$i")
}
Thread.sleep(5000l) // give them time to be started
val actorSystem2: ActorSystem = ActorSystem.create("system",
getLocalHostRemotingConfig(2553))
for (i <- 0 until count) {
actorSystem2.actorOf(Props(new Basher(i)), s"basher-$i")
}
Thread.sleep(360000l)
}
class Echoer extends Actor {
override def receive = {
case x => sender.tell(x, self)
}
}
class Basher(target: Int) extends Actor {
override def preStart() {
context.actorSelection(s"akka.tcp://system@localhost:2552/user/$target") !
Identify()
}
override def receive = {
case ActorIdentity(_, None) =>
println("Actor not found")
case ActorIdentity(_, Some(x)) =>
context.watch(x)
println(countUp.incrementAndGet())
case Terminated(x) =>
println("Terminated!")
}
}
def getLocalHostRemotingConfig(port: Int): Config = {
ConfigFactory.parseReader(new StringReader(
"""
|akka {
| actor {
| provider = "akka.remote.RemoteActorRefProvider"
| }
| remote {
| enabled-transports = ["akka.remote.netty.tcp"]
| netty.tcp {
| hostname = "localhost"
| port = """.stripMargin + port + """
| }
| }
|}
""".stripMargin))
}
}
Also available here:
https://gist.github.com/sciolizer/8702641/2bc3bb4589b1aad66df51028a638c0bc975f53b5
The test starts up two separate actor systems supporting tcp remoting and
listening on different ports. The system on port 2552 spawns a thousand
echoer actors, which just echo back any messages they receive, although
this is not important for the test. The system on port 2553 spawns a
thousand actors which 1) locate their corresponding echoer using
actorSelection and 2) deathwatch the echoer actor once they've found it.
The expected behavior is that the numbers 1 to 1000 are printed out
(corresponding to the number of deathwatches created) and then the test
sleeps for an hour.
What actually happens is that somewhere between the 500th and 600th death
watch, I get the following error:
[WARN] [01/29/2014 20:18:16.934] [system-akka.actor.default-dispatcher-16]
[Remoting] Association to [akka.tcp://system@localhost:2552] having UID
[1419818278] is irrecoverably failed. UID is now quarantined and all
messages to this UID will be delivered to dead letters. Remote actor system
must be restarted to recover from this situation.
I've found three solutions to make this error go away:
1) Remove the context.watch(x) line
2) Set akka.remote.system-message-buffer-size to a large value, such as
10000
3) Ramp up the traffic slowly instead of pounding the system immediately.
It seems solutions (1) and (2) are roughly the same, since death watches
are managed using system messages. I don't understand why the 3rd solution
works, though.
My implementation of the 3rd solution can be found here:
https://gist.github.com/sciolizer/8702641/revisions (revision 2)
I warmed the system up by gradually decreasing the sleep time between actor
creations from 10 milliseconds to 0 milliseconds, for a thousand actor
creations, and then let it run at full speed for another thousand actors
(2000 total). So for the second half of its test, the 3rd solution is no
different from the original problem.
So my two questions are:
Why does solution (3) work? It seems to me that the system-message-buffer
should still overflow after the 1500th actor or so is created.
and
Is my stress test fair? akka.remote.system-message-buffer-size's default
value of 1000 seems very low. What are the tradeoffs in increasing it?
Thanks,
Josh "Ua" Ball
--
>>>>>>>>>> Read the docs: http://akka.io/docs/
>>>>>>>>>> Check the FAQ: http://akka.io/faq/
>>>>>>>>>> Search the archives: https://groups.google.com/group/akka-user
---
You received this message because you are subscribed to the Google Groups "Akka
User List" group.
To unsubscribe from this group and stop receiving emails from it, send an email
to [email protected].
To post to this group, send email to [email protected].
Visit this group at http://groups.google.com/group/akka-user.
For more options, visit https://groups.google.com/groups/opt_out.