[
https://issues.apache.org/jira/browse/IGNITE-12653?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Anton Kalashnikov updated IGNITE-12653:
---------------------------------------
Labels: newbie (was: )
> Add example of baseline auto-adjust feature
> -------------------------------------------
>
> Key: IGNITE-12653
> URL: https://issues.apache.org/jira/browse/IGNITE-12653
> Project: Ignite
> Issue Type: Task
> Components: examples
> Reporter: Anton Kalashnikov
> Priority: Major
> Labels: newbie
>
> Work on Phase II of IEP-4 (Baseline topology) [1] has finished. It makes
> sense to implement some examples of "Baseline auto-adjust" [2].
> The "Baseline auto-adjust" feature automatically adjusts the baseline to
> match the current topology after a node join/left event occurs. It is
> required because when a node leaves the grid and nobody changes the baseline
> manually, data can be lost (if more nodes leave the grid, depending on the
> backup factor), but permanent tracking of the grid is not always
> possible/desirable. In many cases, auto-adjusting the baseline after some
> timeout is very helpful.
> Distributed metastore [3] (already done):
> First of all, the ability to store configuration data consistently and
> cluster-wide is required. Ignite doesn't have any specific API for such
> configurations and we don't want to have many similar implementations of the
> same feature in our code. After some thought, it was proposed to implement it
> as a kind of distributed metastorage that gives the ability to store any
> data in it.
> The first implementation is based on the existing local metastorage API for
> persistent clusters (in-memory clusters will store data in memory).
> Write/remove operations use Discovery SPI to send updates to the cluster; it
> guarantees the order of updates and that all existing (alive) nodes have
> handled the update message. To find out which node has the latest data,
> there is a "version" value of the distributed metastorage, which is
> basically <number of all updates, hash of updates>. The history of updates
> back to some point in the past is stored along with the data, so when an
> outdated node connects to the cluster it will receive all the missing data
> and apply it locally. If there's not enough history stored or the joining
> node is clean, it'll receive a snapshot of the distributed metastorage so
> there won't be inconsistencies.
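
The version and catch-up logic described above could be modelled roughly as follows. This is only a toy sketch, not Ignite's implementation: the class MetastoreSketch, its fields, the rolling hash formula and the bounded history size are all assumptions made for illustration, and discovery messaging is left out entirely.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

/** Toy model of a versioned metastorage; hypothetical, not Ignite's classes. */
class MetastoreSketch {
    /** A single write (value != null) or remove (value == null). */
    static final class Update {
        final String key;
        final String value;

        Update(String key, String value) {
            this.key = key;
            this.value = value;
        }
    }

    final Map<String, String> data = new HashMap<>();
    final Deque<Update> history = new ArrayDeque<>(); // only the newest updates
    final int maxHistory;                             // assumed history bound
    long updateCount; // version, part 1: number of all updates
    long updateHash;  // version, part 2: rolling hash of all updates (assumed formula)

    MetastoreSketch(int maxHistory) {
        this.maxHistory = maxHistory;
    }

    /** Applies one update in order and advances the version. */
    void apply(Update u) {
        if (u.value == null)
            data.remove(u.key);
        else
            data.put(u.key, u.value);

        updateCount++;
        updateHash = 31 * updateHash + Objects.hash(u.key, u.value);

        history.addLast(u);
        if (history.size() > maxHistory)
            history.removeFirst();
    }

    /**
     * Brings a joining node up to date.
     * Returns true if replaying history was enough, false if a snapshot was sent.
     */
    boolean sync(MetastoreSketch joining) {
        long missing = updateCount - joining.updateCount;
        // A clean node (no updates yet) always gets a snapshot.
        boolean canReplay = joining.updateCount > 0 && missing >= 0 && missing <= history.size();

        if (canReplay) {
            // Replay only the tail of the history that the joining node lacks.
            int skip = history.size() - (int) missing;
            for (Update u : history) {
                if (skip-- > 0)
                    continue;
                joining.apply(u);
            }
            return true;
        }

        // Not enough history or a clean node: copy the full state.
        joining.data.clear();
        joining.data.putAll(data);
        joining.history.clear();
        joining.history.addAll(history);
        joining.updateCount = updateCount;
        joining.updateHash = updateHash;
        return false;
    }
}
```

The sketch assumes the joining node's updates are a prefix of the cluster's; comparing both halves of the version after a replay is one way to check that assumption.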
> Baseline auto-adjust:
> Main scenario:
> - There is a grid whose baseline is equal to the current topology
> - A new node joins the grid or some node leaves (fails)
> - The new mechanism detects this event and adds a task for changing the
> baseline to a queue with the configured timeout
> - If a new event happens before the baseline is changed, the task is
> removed from the queue and a new task is added
> - When the timeout expires, the task tries to set a new baseline
> corresponding to the current topology
> First of all we need to add two parameters[4]:
> - baselineAutoAdjustEnabled - enable/disable "Baseline auto-adjust"
> feature.
> - baselineAutoAdjustTimeout - timeout after which baseline should be
> changed.
> These parameters are cluster-wide and can be changed at runtime because they
> are based on the "Distributed metastore".
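
The main scenario and the two parameters can be sketched as a small debounce state machine. This is a simplified illustration under stated assumptions, not the actual Ignite mechanism: the class AutoAdjustSketch and its methods are hypothetical, time is passed in explicitly instead of using a real timer, and the "queue" is reduced to a single pending deadline.

```java
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical model of the auto-adjust scenario; not Ignite code. */
class AutoAdjustSketch {
    private final boolean enabled;  // stands in for baselineAutoAdjustEnabled
    private final long timeoutMs;   // stands in for baselineAutoAdjustTimeout
    private long deadlineMs = -1;   // -1 means no adjustment task is queued
    private Set<String> baseline;
    private final Set<String> topology;

    AutoAdjustSketch(boolean enabled, long timeoutMs, Set<String> nodes) {
        this.enabled = enabled;
        this.timeoutMs = timeoutMs;
        this.baseline = new TreeSet<>(nodes); // baseline starts equal to topology
        this.topology = new TreeSet<>(nodes);
    }

    void nodeJoined(String id, long nowMs) {
        topology.add(id);
        reschedule(nowMs);
    }

    void nodeLeft(String id, long nowMs) {
        topology.remove(id);
        reschedule(nowMs);
    }

    /** Overwriting the deadline drops the old task and queues a new one. */
    private void reschedule(long nowMs) {
        deadlineMs = enabled ? nowMs + timeoutMs : -1;
    }

    /** Called periodically; returns true if the queued task ran at this tick. */
    boolean tick(long nowMs) {
        if (deadlineMs < 0 || nowMs < deadlineMs)
            return false;

        deadlineMs = -1;
        baseline = new TreeSet<>(topology); // set baseline to the current topology
        return true;
    }

    Set<String> baseline() {
        return baseline;
    }
}
```

For example, with a timeout of 1000 ms, a node leaving at t=0 followed by a join at t=500 moves the deadline to t=1500, so the baseline changes once, after the topology has been quiet for the full timeout.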
> Restrictions:
> - This mechanism handles events only on an active grid
> - For in-memory nodes it is enabled by default; for persistent nodes it is
> disabled.
> - If lost partitions are detected, this feature is disabled
> - If the baseline is adjusted manually so that baselineNodes != gridNodes, an
> exception is thrown
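
Two of the restrictions above can be expressed as simple checks. The sketch below is an assumption-laden illustration: the class and method names are hypothetical, the exception type is chosen arbitrarily, and the manual-change rule is read here as applying only while auto-adjust is enabled, which the description does not state explicitly.

```java
import java.util.Set;

/** Hypothetical checks mirroring the listed restrictions; not Ignite code. */
class BaselineRestrictionsSketch {
    /** In-memory clusters: enabled by default. Persistent clusters: disabled. */
    static boolean enabledByDefault(boolean persistenceEnabled) {
        return !persistenceEnabled;
    }

    /**
     * Rejects a manual baseline change that differs from the current grid nodes.
     * Assumption: the check applies only while auto-adjust is enabled.
     */
    static void checkManualChange(boolean autoAdjustEnabled,
                                  Set<String> baselineNodes,
                                  Set<String> gridNodes) {
        if (autoAdjustEnabled && !baselineNodes.equals(gridNodes))
            throw new IllegalStateException(
                "Manual baseline must match the current topology while auto-adjust is enabled");
    }
}
```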
> [1]
> https://cwiki.apache.org/confluence/display/IGNITE/IEP-4+Baseline+topology+for+caches
> [2] https://issues.apache.org/jira/browse/IGNITE-8571
> [3] https://issues.apache.org/jira/browse/IGNITE-10640
> [4] https://issues.apache.org/jira/browse/IGNITE-8573
--
This message was sent by Atlassian Jira
(v8.3.4#803005)