[
https://issues.apache.org/jira/browse/IGNITE-1267?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14958749#comment-14958749
]
Mark Howard commented on IGNITE-1267:
-------------------------------------
We've also hit this problem. I think it's broader than the title suggests,
though: it affects any node that is not in the task's original topology, not
just newly joined nodes.
In our case, we're using Ignite for relatively long jobs with a very small
fanout: most tasks map to a single job, on a cluster of perhaps 100 nodes. Due
to the topology restrictions in the collision and failover SPIs, these jobs can
never be stolen by either a new or an existing idle node.
The fix is relatively easy for us: comment out the topology checks in the job
stealing collision and failover SPIs. This is valid for us since our initial
load balancing is relatively straightforward, based on node attributes, and the
same node attributes are used in the job stealing configuration. It may not be
entirely generic, though, since it's not as powerful as the original TopologySpi
which was in early versions of GridGain. Without it, though, the collision SPIs
are pretty much useless as they stand in the 1.4 release (unless we've missed
something!).
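To make the difference concrete, here is a minimal, self-contained Java sketch.
It is not Ignite code: StealingEligibility, Node, inTaskTopology and
matchesStealingAttrs are hypothetical names that only model the two kinds of
check discussed above, on the assumption that the topology check boils down to
"is the stealing node in the set captured when the task started" and that our
workaround boils down to "does the stealing node carry the required attributes".

```java
import java.util.Map;
import java.util.Set;

/** Simplified model of the two eligibility checks; NOT Ignite's actual code. */
public class StealingEligibility {
    /** Hypothetical stand-in for a cluster node: an id plus user attributes. */
    public static final class Node {
        final String id;
        final Map<String, String> attrs;

        public Node(String id, Map<String, String> attrs) {
            this.id = id;
            this.attrs = attrs;
        }
    }

    /** Topology-style check: the thief must be in the snapshot taken when the task started. */
    public static boolean inTaskTopology(Set<String> taskTopologyIds, Node thief) {
        return taskTopologyIds.contains(thief.id);
    }

    /** Attribute-style check: the thief must carry every required attribute value. */
    public static boolean matchesStealingAttrs(Map<String, String> required, Node thief) {
        return thief.attrs.entrySet().containsAll(required.entrySet());
    }

    public static void main(String[] args) {
        // Snapshot of node ids at task start; "n3" joins (or goes idle) later.
        Set<String> snapshot = Set.of("n1", "n2");
        Map<String, String> required = Map.of("role", "worker");
        Node lateJoiner = new Node("n3", Map.of("role", "worker", "rack", "b"));

        System.out.println("topology check allows steal:  " + inTaskTopology(snapshot, lateJoiner));
        System.out.println("attribute check allows steal: " + matchesStealingAttrs(required, lateJoiner));
    }
}
```

In this model the snapshot never changes after the task starts, so the first
check rejects n3 no matter how idle it is, while the attribute check accepts
any node that matches the same attributes used for the initial load balancing.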
> JobStealingCollisionSpi never sends jobs to a node that joined after task was
> executed
> --------------------------------------------------------------------------------------
>
> Key: IGNITE-1267
> URL: https://issues.apache.org/jira/browse/IGNITE-1267
> Project: Ignite
> Issue Type: Bug
> Components: compute
> Affects Versions: 1.1.4
> Reporter: Valentin Kulichenko
> Labels: user-request
>
> Corresponding user thread (contains detailed description of the scenario that
> doesn't work):
> http://apache-ignite-users.70518.x6.nabble.com/Dynamic-ComputeTask-distribution-with-new-nodes-td997.html
> Essentially, {{JobStealingCollisionSpi}} always skips jobs that are not in
> the task topology (see line 713). The task topology is static and is created
> when the task is executed, so a newly joined node can't steal jobs. I think it
> should be able to do this if it satisfies the initial cluster group predicate.