[jira] [Updated] (IGNITE-1837) Rebalancing on a big cluster (64 nodes and more)

Denis Magda (JIRA) Mon, 02 Nov 2015 04:59:04 -0800

     [ https://issues.apache.org/jira/browse/IGNITE-1837?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Denis Magda updated IGNITE-1837:
--------------------------------
    Description: 
It seems that Ignite has several rebalancing-related issues that appear when 
a big cluster is started.

By a big cluster I mean:
- cluster of 64 server nodes;
- cluster of 64 server and 64 client nodes.

The issues can be divided into two main use cases.

1) Slow rebalancing on start.

- If the number of partitions for a cache is set to a value greater than the 
default (e.g. 3200 or 6400), rebalancing of such caches may take several 
minutes, even though the caches are empty at that time. As part of this issue, 
let's also document that the number of partitions can't exceed a certain value.

- The exchange message on a NODE_JOINED event keeps timing out for a long time. 
Discussed here: 
http://apache-ignite-users.70518.x6.nabble.com/Help-with-tuning-for-larger-clusters-td1692.html#a1813

2) Slow rebalancing on client nodes shutdown.

If a significant number of client nodes are stopped at the same time, then, for 
some reason, rebalancing again takes several minutes.

  was:
It seems that Ignite has several rebalancing-related issues that appear when 
a big cluster is started.

By a big cluster I mean:
- cluster of 64 server nodes;
- cluster of 64 server and 64 client nodes.

The issues can be divided into three main use cases.

1) Slow rebalancing on start.

If the number of partitions for a cache is set to a value greater than the 
default (e.g. 3200 or 6400), rebalancing of such caches may take several 
minutes, even though the caches are empty at that time.

As part of this issue, let's also document that the number of partitions 
can't exceed a certain value.


2) Slow rebalancing on client nodes shutdown.

If a significant number of client nodes are stopped at the same time, then, for 
some reason, rebalancing again takes several minutes.

3) Periodic rebalancing timeouts while the cluster is idle.

Discussed here: 
http://apache-ignite-users.70518.x6.nabble.com/Help-with-tuning-for-larger-clusters-td1692.html#a1813

The code is probably suboptimal and sends rebalancing-related messages too 
frequently.


> Rebalancing on a big cluster (64 nodes and more)
> ------------------------------------------------
>
>                 Key: IGNITE-1837
>                 URL: https://issues.apache.org/jira/browse/IGNITE-1837
>             Project: Ignite
>          Issue Type: Bug
>          Components: general
>    Affects Versions: ignite-1.4
>            Reporter: Denis Magda
>            Assignee: Alexey Goncharuk
>             Fix For: 1.5
>



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
