[jira] [Comment Edited] (IGNITE-12297) Detect lost partitions is not happened during cluster activation

Matija Polajnar (Jira) Fri, 29 May 2020 05:05:50 -0700


    [ https://issues.apache.org/jira/browse/IGNITE-12297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16975026#comment-16975026 ]


Matija Polajnar edited comment on IGNITE-12297 at 5/29/20, 12:04 PM:
---------------------------------------------------------------------

For the record, as discussed in IGNITE-10226, this resulted in us getting an
*org.apache.ignite.internal.cluster.ClusterTopologyCheckedException: Cannot run
update query. Node must own all the necessary partitions.* This was happening
on a one-node "cluster".

A sane but sometimes difficult-to-execute workaround was provided by [~jokser]:
{quote}1) Start another node. This is a topology event that will trigger
detection of lost partitions.
2) Stop the started node.
3) If your partition loss policy is not IGNORE, explicitly trigger
`resetLostPartitions`.
This should return the partitions to the OWNING state.
{quote}
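To make the mechanism behind these steps concrete, here is a deliberately simplified Java model. It is hypothetical and not Ignite code: LostPartitionModel, its enums and its methods are made up for illustration. It follows the issue description (activation does not run loss detection, a server join or leave does) and assumes, as step 3 suggests, that with the IGNORE policy detection alone returns the partition to OWNING, while other policies leave it LOST until it is reset.

{code:java}
// Hypothetical toy model, NOT Ignite source: one partition whose copies are
// all MOVING after a persistent cluster is activated.
public class LostPartitionModel {
    enum PartState { MOVING, OWNING, LOST }

    // Subset of loss policies, enough for the illustration.
    enum LossPolicy { IGNORE, READ_WRITE_SAFE }

    enum Event { ACTIVATION, SERVER_JOINED, SERVER_LEFT }

    private final LossPolicy policy;
    private PartState state = PartState.MOVING;

    LostPartitionModel(LossPolicy policy) {
        this.policy = policy;
    }

    // As described in this issue: loss detection runs during the exchange
    // only for server join/leave events, not for activation.
    void onExchange(Event evt) {
        if (evt == Event.SERVER_JOINED || evt == Event.SERVER_LEFT)
            detectLostPartitions();
    }

    // No copy is OWNING, so the partition is treated as lost. Assumption:
    // with IGNORE it goes straight back to OWNING; otherwise it becomes LOST.
    private void detectLostPartitions() {
        if (state == PartState.MOVING)
            state = policy == LossPolicy.IGNORE ? PartState.OWNING : PartState.LOST;
    }

    // Stand-in for an explicit resetLostPartitions call (step 3).
    void resetLostPartitions() {
        if (state == PartState.LOST)
            state = PartState.OWNING;
    }

    PartState state() {
        return state;
    }

    public static void main(String[] args) {
        LostPartitionModel m = new LostPartitionModel(LossPolicy.READ_WRITE_SAFE);
        m.onExchange(Event.ACTIVATION);    // no detection happens here
        System.out.println(m.state());     // prints MOVING
        m.onExchange(Event.SERVER_JOINED); // step 1: start another node
        m.onExchange(Event.SERVER_LEFT);   // step 2: stop it again
        System.out.println(m.state());     // prints LOST
        m.resetLostPartitions();           // step 3
        System.out.println(m.state());     // prints OWNING
    }
}
{code}

In this model the activation event leaves the partition stuck in MOVING, which matches the symptom above; only the extra join/leave pair gets it into a state that a reset (or the IGNORE policy) can resolve.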
It works, but you need to configure another node for the cluster. A dangerous 
and ugly but more practical workaround is to have this reflection-based method 
ready to invoke when you need it:

 
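{code:java}
import java.lang.reflect.Field;

import org.apache.ignite.IgniteSpringBean;
import org.apache.ignite.internal.GridKernalContextImpl;
import org.apache.ignite.internal.IgniteKernal;

// Must be declared in a subclass of IgniteSpringBean: it reads the bean's
// private field "g" (the underlying Ignite instance) from `this` via reflection.
public void resetMovingPartitions() {
    try {
        Field igniteKernalField = IgniteSpringBean.class.getDeclaredField("g");
        igniteKernalField.setAccessible(true);
        IgniteKernal igniteKernal = (IgniteKernal)igniteKernalField.get(this);
        GridKernalContextImpl kernalContext = (GridKernalContextImpl)igniteKernal.context();
        // Internal API: asks the partition exchange manager to resend partition maps.
        kernalContext.cache().context().exchange().scheduleResendPartitions();
    } catch (IllegalAccessException | NoSuchFieldException | ClassCastException e) {
        throw new AssertionError(e);
    }
}
{code}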
It works for us.
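Anyone adapting the method above can check the reflection pattern it relies on in isolation with the small, self-contained sketch below. Base and Sub are hypothetical stand-ins for IgniteSpringBean and a subclass of it, and the private field "g" stands in for the wrapped Ignite instance; nothing here touches Ignite itself.

{code:java}
import java.lang.reflect.Field;

// Hypothetical stand-ins: Base plays IgniteSpringBean, and its private field
// "g" plays the wrapped Ignite instance. Not Ignite code.
public class PrivateFieldDemo {
    static class Base {
        private Object g = "kernal";
    }

    static class Sub extends Base {
        // Same pattern as resetMovingPartitions(): look the field up on the
        // declaring class, make it accessible, then read it from `this`.
        Object readG() {
            try {
                Field f = Base.class.getDeclaredField("g");
                f.setAccessible(true);
                return f.get(this);
            } catch (NoSuchFieldException | IllegalAccessException e) {
                throw new AssertionError(e);
            }
        }
    }

    // getDeclaredField only sees fields declared on that exact class, which is
    // why the lookup targets the superclass rather than getClass().
    static boolean declares(Class<?> type, String name) {
        try {
            type.getDeclaredField(name);
            return true;
        } catch (NoSuchFieldException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(new Sub().readG());         // prints kernal
        System.out.println(declares(Sub.class, "g"));  // prints false
        System.out.println(declares(Base.class, "g")); // prints true
    }
}
{code}

Because the lookup is tied to a private field name, a change to that field in a future IgniteSpringBean version would surface as the NoSuchFieldException that the original method turns into an AssertionError.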

 


> Detect lost partitions is not happened during cluster activation
> ----------------------------------------------------------------
>
>                 Key: IGNITE-12297
>                 URL: https://issues.apache.org/jira/browse/IGNITE-12297
>             Project: Ignite
>          Issue Type: Bug
>          Components: cache
>    Affects Versions: 2.4
>            Reporter: Pavel Kovalenko
>            Priority: Major
>              Labels: newbie
>
> We invoke `detectLostPartitions` during PME (partition map exchange) only
> when a server joins or leaves.
> However, we can activate a persistent cluster in which a partition has
> MOVING status on all nodes. In this case, the partition may stay in the
> MOVING state indefinitely, until some other topology event occurs.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
