[ https://issues.apache.org/jira/browse/SPARK-5124?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14270809#comment-14270809 ]
Shixiong Zhu commented on SPARK-5124:
-------------------------------------

{quote}
1. For DAGScheduler, we are probably OK to just use an event loop instead of an actor. Just put some messages into a queue, and have a loop that processes the queue. Otherwise we are making every call to DAGScheduler going through a socket and that can severely impact scheduler throughput. (Although I haven't looked closely at your change so maybe you are doing a different thing here)
{quote}

In my current implementation, DAGScheduler still uses an Actor. A LocalActorRef should not pass through the socket. However, for better performance, we can use a multi-producer-single-consumer (MPSC) queue to bypass Akka.

{quote}
2. Is it really that expensive to listen to network events that warrant a separate NetworkRpcEndpoint?
{quote}

I created NetworkRpcEndpoint so that RpcEndpoints that are not interested in network events can avoid running the following pattern-matching checks on every message:

{code}
case AssociatedEvent(_, remoteAddress, _) => ...
case DisassociatedEvent(_, remoteAddress, _) => ...
case AssociationErrorEvent(cause, localAddress, remoteAddress, inbound, _) => ...
{code}

{quote}
Thread-safe contract when processing messages (similar to Akka).
{quote}

The thread-safe contract means that only one method of an RpcEndpoint will be called at a time, just like an Actor. Without this property, every RpcEndpoint would need a lock to protect its data, and given the complex logic spread across so many RpcEndpoints, that could easily lead to deadlock.

{quote}
A simple fault tolerance(if a RpcEndpoint is crashed, restart it or stop it).
{quote}

It means: "Any error thrown by `onStart`, `receive` and `onStop` will be sent to `onError`. If onError throws an error, it will force RpcEndpoint to restart by creating a new one." "Restart" may not be the right term, though.
But an `onError` that handles all errors is better than requiring an RpcEndpoint to never have uncaught exceptions (which would mean writing try-catch code everywhere).

> Standardize internal RPC interface
> ----------------------------------
>
>                 Key: SPARK-5124
>                 URL: https://issues.apache.org/jira/browse/SPARK-5124
>             Project: Spark
>          Issue Type: Improvement
>          Components: Spark Core
>            Reporter: Reynold Xin
>            Assignee: Shixiong Zhu
>         Attachments: Pluggable RPC - draft 1.pdf
>
>
> In Spark we use Akka as the RPC layer. It would be great if we can standardize the internal RPC interface to facilitate testing. This will also provide the foundation to try other RPC implementations in the future.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)