[
https://issues.apache.org/jira/browse/SYSTEMML-2349?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16495614#comment-16495614
]
LI Guobao commented on SYSTEMML-2349:
-------------------------------------
[~mboehm7], do we need to handle errors thrown by the agg service? If so, I
don't see how to catch such an error outside its thread. If the agg service
goes down, all workers block in the pull method and cannot be stopped.
Conversely, the agg service does not stop until the workers have finished
their work. As a result, we never reach the join on the agg service thread,
because we are still blocked joining the workers.
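
One possible way out of this circular wait, sketched here only as an
illustration and not as the actual SystemML code: submit the agg service and
the workers to one executor, wait on whichever task finishes first, and
interrupt the rest as soon as any task fails. The class FailFastDemo, the
method runFailFast, and the latch standing in for the blocking pull are all
hypothetical names; the sketch also assumes the blocking pull responds to
thread interruption.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorCompletionService;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class FailFastDemo {

    // Hypothetical helper: runs all tasks concurrently and returns the cause
    // of the first failure, or null if every task completes normally.
    // Assumes a non-empty task list (the pool size must be positive).
    static Throwable runFailFast(List<Callable<Void>> tasks)
            throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(tasks.size());
        ExecutorCompletionService<Void> ecs =
                new ExecutorCompletionService<>(pool);
        try {
            for (Callable<Void> t : tasks) {
                ecs.submit(t);
            }
            // take() hands back futures in completion order, so a failed
            // task is seen even while the other tasks are still blocked.
            for (int i = 0; i < tasks.size(); i++) {
                try {
                    ecs.take().get();
                } catch (ExecutionException e) {
                    return e.getCause();
                }
            }
            return null;
        } finally {
            // Interrupt any thread still blocked so that no join can hang.
            pool.shutdownNow();
            pool.awaitTermination(5, TimeUnit.SECONDS);
        }
    }

    public static void main(String[] args) throws Exception {
        // Stand-in for a pull that blocks forever once the agg service dies.
        CountDownLatch never = new CountDownLatch(1);
        List<Callable<Void>> tasks = new ArrayList<>();
        // Stand-in for the agg service failing with an error.
        tasks.add(() -> { throw new IllegalStateException("agg service failed"); });
        // Stand-ins for two workers blocked in pull.
        for (int i = 0; i < 2; i++) {
            tasks.add(() -> { never.await(); return null; });
        }
        Throwable err = runFailFast(tasks);
        System.out.println("first failure: " + err);
    }
}
```

The point of the design is that the caller waits in completion order rather
than joining the workers first, so the error from the failed task surfaces
immediately and the interrupt unblocks everything else.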
> Local worker error handling
> ---------------------------
>
> Key: SYSTEMML-2349
> URL: https://issues.apache.org/jira/browse/SYSTEMML-2349
> Project: SystemML
> Issue Type: Sub-task
> Reporter: LI Guobao
> Assignee: LI Guobao
> Priority: Major
>
> While playing around with the locking scheme of the parameter server, I
> encountered unrelated errors that led to the parameter server hanging. We
> need to make sure all worker errors are correctly propagated so that we can
> guarantee termination.
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)