[ https://issues.apache.org/jira/browse/SPARK-5124?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14270809#comment-14270809 ]

Shixiong Zhu commented on SPARK-5124:
-------------------------------------

{quote}
1. For DAGScheduler, we are probably OK to just use an event loop instead of an 
actor. Just put some messages into a queue, and have a loop that processes the 
queue. Otherwise we are making every call to DAGScheduler going through a 
socket and that can severely impact scheduler throughput. (Although I haven't 
looked closely at your change so maybe you are doing a different thing here)
{quote}

In my current implementation, DAGScheduler still uses an Actor. Messages sent to 
a LocalActorRef do not go through a socket. However, for better performance, we 
can use a Multi-Producer-Single-Consumer (MPSC) queue to bypass Akka entirely.
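As a rough sketch of that direction (the `EventLoop` and `onReceive` names here are illustrative, not an actual API): producers on any thread put events into a blocking queue, and a single consumer thread drains it, which gives the same serialized processing as an Actor without the Akka overhead:

```scala
import java.util.concurrent.LinkedBlockingQueue
import java.util.concurrent.atomic.AtomicBoolean

// Producers on any thread call post(); one consumer thread drains the queue,
// so events are processed one at a time, in order, without going through Akka.
abstract class EventLoop[E](name: String) {
  private val queue = new LinkedBlockingQueue[E]()
  private val stopped = new AtomicBoolean(false)
  private val thread = new Thread(name) {
    override def run(): Unit =
      try {
        while (!stopped.get) onReceive(queue.take())
      } catch {
        case _: InterruptedException => // stop() interrupted a blocked take()
      }
  }
  thread.setDaemon(true)

  def start(): Unit = thread.start()
  def post(event: E): Unit = queue.put(event)
  def stop(): Unit = { stopped.set(true); thread.interrupt() }

  protected def onReceive(event: E): Unit
}
```

`LinkedBlockingQueue` is not a true MPSC queue, but with a single consumer it has the same contract; a specialized MPSC implementation would only change the queue, not the loop.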

{quote}
2. Is it really that expensive to listen to network events that warrant a 
separate NetworkRpcEndpoint?
{quote}

I created NetworkRpcEndpoint so that RpcEndpoints that are not interested in 
network events do not have to run the following pattern-matching checks on 
every message:

{code}
case AssociatedEvent(_, remoteAddress, _) =>
  ...
case DisassociatedEvent(_, remoteAddress, _) =>
  ...
case AssociationErrorEvent(cause, localAddress, remoteAddress, inbound, _) =>
  ...
{code}
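A hypothetical sketch of that separation (the trait, event, and callback names are illustrative, not the proposed API): endpoints that care about connection lifecycle mix in a dedicated trait, and the dispatcher forwards network events only to those:

```scala
// Illustrative sketch, not an actual API: network-lifecycle callbacks live in
// a separate trait, so ordinary endpoints never have to match on them.
case class RpcAddress(host: String, port: Int)

trait RpcEndpoint {
  def receive(message: Any): Unit
}

trait NetworkRpcEndpoint extends RpcEndpoint {
  def onConnected(remote: RpcAddress): Unit = {}
  def onDisconnected(remote: RpcAddress): Unit = {}
}

sealed trait NetworkEvent
case class Connected(remote: RpcAddress) extends NetworkEvent
case class Disconnected(remote: RpcAddress) extends NetworkEvent

// The dispatcher checks the endpoint's type once, instead of every endpoint
// pattern-matching network events inside its own receive.
def dispatchNetworkEvent(endpoint: RpcEndpoint, event: NetworkEvent): Unit =
  endpoint match {
    case n: NetworkRpcEndpoint => event match {
      case Connected(remote)    => n.onConnected(remote)
      case Disconnected(remote) => n.onDisconnected(remote)
    }
    case _ => // endpoints not interested in network events skip this entirely
  }
```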

{quote}
Thread-safe contract when processing messages (similar to Akka).
{quote}

The thread-safe contract means that at most one method of an RpcEndpoint will 
be called at a time, just like an Actor. Without this property, every 
RpcEndpoint would need a lock to protect its data, and given the complex logic 
of so many RpcEndpoints, that could easily lead to deadlock.
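For example, under this contract a hypothetical endpoint can keep plain mutable state with no synchronization at all:

```scala
// Hypothetical endpoint: because the dispatcher guarantees at most one
// handler invocation at a time, this mutable counter needs no lock.
class CounterEndpoint {
  private var count = 0 // guarded only by the single-invocation contract
  def receive(message: Any): Unit = message match {
    case "inc" => count += 1
    case _     =>
  }
  def value: Int = count
}
```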

{quote}
A simple fault tolerance(if a RpcEndpoint is crashed, restart it or stop it).
{quote}

It means: "Any error thrown by `onStart`, `receive` and `onStop` will be sent 
to `onError`. If `onError` throws an error, it will force the RpcEndpoint to 
restart by creating a new one."

"Restart" may not be the proper policy. But an `onError` hook that handles all 
errors is better than requiring that an RpcEndpoint never have uncaught 
exceptions (which would mean writing a lot of try-catch code).
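A sketch of that policy (the `deliver` helper and the restart mechanism here are illustrative, not the proposed implementation):

```scala
// Illustrative sketch: errors thrown by receive are routed to onError; if
// onError itself throws, the endpoint is replaced by a freshly created one.
trait RpcEndpoint {
  def receive(message: Any): Unit
  def onError(cause: Throwable): Unit = throw cause // default: escalate
}

def deliver(endpoint: RpcEndpoint, message: Any,
            restart: () => RpcEndpoint): RpcEndpoint =
  try {
    endpoint.receive(message)
    endpoint
  } catch {
    case receiveError: Throwable =>
      try { endpoint.onError(receiveError); endpoint }
      catch { case _: Throwable => restart() } // onError failed: restart
  }
```

The point is that the try-catch lives once in the dispatcher instead of being duplicated inside every endpoint's handlers.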

> Standardize internal RPC interface
> ----------------------------------
>
>                 Key: SPARK-5124
>                 URL: https://issues.apache.org/jira/browse/SPARK-5124
>             Project: Spark
>          Issue Type: Improvement
>          Components: Spark Core
>            Reporter: Reynold Xin
>            Assignee: Shixiong Zhu
>         Attachments: Pluggable RPC - draft 1.pdf
>
>
> In Spark we use Akka as the RPC layer. It would be great if we can 
> standardize the internal RPC interface to facilitate testing. This will also 
> provide the foundation to try other RPC implementations in the future.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
