I think a better option might be to use a lock for the cluster configuration. 
We could make a request for the cluster config wait until any in-progress 
update to the cluster config has been completely applied. Maybe we already 
have a lock that forces cluster configuration updates to happen one at a time?
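
A minimal Java sketch of that idea, assuming a read-write lock around the 
configuration. LockedClusterConfig and its methods are hypothetical names for 
illustration, not Geode's actual classes: an update holds the write lock for 
its whole duration, and a starting server's config request takes the read lock, 
so it waits until the update is fully applied.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical stand-in for the locator-side cluster configuration (not a Geode class).
class LockedClusterConfig {
  private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
  private final Map<String, String> regions = new HashMap<>();

  // Holds the write lock across the whole update: the distribution to the
  // running servers and the write to the stored configuration.
  void applyUpdate(String regionName, String regionType, Runnable distributeToServers) {
    lock.writeLock().lock();
    try {
      distributeToServers.run();
      regions.put(regionName, regionType);
    } finally {
      lock.writeLock().unlock();
    }
  }

  // A starting server's request blocks here while an update is in progress,
  // then receives a copy that already includes the update.
  Map<String, String> getConfig() {
    lock.readLock().lock();
    try {
      return new HashMap<>(regions);
    } finally {
      lock.readLock().unlock();
    }
  }
}
```

This only covers servers that ask for the configuration after the update has 
taken the lock; whether that is enough depends on when a starting server 
becomes visible in the view.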

-Dan
________________________________
From: Mario Kevo <mario.k...@est.tech>
Sent: Tuesday, October 12, 2021 1:35 AM
To: dev@geode.apache.org <dev@geode.apache.org>
Subject: Re: Region is not created on one of the servers

A new ticket has been opened:
https://issues.apache.org/jira/browse/GEODE-9718

There are two proposals on the ticket, so we need to decide which way to go.

BR,
Mario
________________________________
From: Udo Kohlmeyer <u...@vmware.com>
Sent: Tuesday, October 12, 2021 12:59 AM
To: dev@geode.apache.org <dev@geode.apache.org>
Subject: Re: Region is not created on one of the servers

Hi Mario,

I think your assessment of the problem is correct. Thinking about it, there is 
no simple (correct) way to solve this, given that there are too many variables 
in play, with users making configuration changes while servers are coming up.

Now, that said, I think we should address this problem. I also think your 
assessment is correct that cluster configuration was not written to handle this 
scenario. I think some thought has to go into the algorithm that one would like 
to follow and how we would like to resolve it.

Can you please raise a ticket for this issue?

--Udo

From: Mario Kevo <mario.k...@est.tech>
Date: Monday, October 11, 2021 at 11:27 PM
To: dev@geode.apache.org <dev@geode.apache.org>
Subject: Re: Region is not created on one of the servers

I think there could be a problem if we change the order to first add the region 
to the cluster config and then distribute it to the existing servers.

Currently, when the "create region" command is executed, it gets all servers 
from the view and tells each of them to start creating a region with the 
parameters specified in the command.
Region creation starts on all servers and, after it has finished, the region is 
added to the cluster configuration. If there is a problem with creating the 
region (a wrong parameter or something else), the region is not created on the 
existing servers and nothing is written to the cluster configuration.

If we decide to change the order, the region will be written to the cluster 
config before the command has succeeded, so we would need some kind of backup 
to roll back the cluster configuration.
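
A rough Java sketch of what the changed order with a rollback could look like. 
ConfigFirstCreateRegion, its clusterConfig map, and the createOnServer callback 
are hypothetical names used only for illustration; this is not how Geode 
implements the command.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

// Hypothetical model of "write to cluster config first, then distribute".
class ConfigFirstCreateRegion {
  final Map<String, String> clusterConfig = new HashMap<>();

  // Returns true if every server created the region. On the first failure the
  // previous configuration entry (or its absence) is restored.
  boolean createRegion(String name, String type, List<String> servers,
                       Predicate<String> createOnServer) {
    String previous = clusterConfig.put(name, type); // step 1: config first
    for (String server : servers) {                  // step 2: distribute
      if (!createOnServer.test(server)) {
        if (previous == null) {
          clusterConfig.remove(name);
        } else {
          clusterConfig.put(name, previous);
        }
        // Note: this sketch does not undo the region on servers that already
        // created it; a real rollback would have to handle that as well.
        return false;
      }
    }
    return true;
  }
}
```

The sketch also shows the remaining gap: a server that starts between step 1 
and a later rollback could read a configuration entry that is then removed.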

Also, this can happen with any command that edits the cluster configuration.

It looks like this was not designed for executing commands in parallel with 
starting servers.

BR,
Mario
________________________________
From: Dan Smith <dasm...@vmware.com>
Sent: Friday, October 8, 2021 8:37 PM
To: dev@geode.apache.org <dev@geode.apache.org>
Subject: Re: Region is not created on one of the servers

This seems like something that ought to work, so I would call it a bug if the 
region didn't get created on one server. At first glance, it looks like the 
problem is that we distribute the region to all the servers before adding it to 
cluster config? Seems like we need to do distribution after adding the region 
to cluster config, to make sure that all servers get the region.

-Dan
________________________________
From: Mario Kevo <mario.k...@est.tech>
Sent: Friday, October 8, 2021 5:31 AM
To: dev@geode.apache.org <dev@geode.apache.org>
Subject: Region is not created on one of the servers

Hi geode-dev,

We are using a system with a large number of servers.
While the servers are starting, we create a region through gfsh in parallel.
The problem is that the region is not created on one of the servers.

Here is an example of the problem:

We start the locator and then start the servers, one by one.
In the meantime, we run the "create region" command through gfsh.
All servers that were started before the "create region" command are told to 
create the region, but the problem is with a server that starts after the 
"create region" command has started but before it has finished.
All servers that start after the "create region" command has finished get the 
region from the cluster configuration and create it.

What happened to the one server without the region?
It started after the "create region" command had started, so the locator does 
not tell it to create the region. Also, the cluster configuration doesn't have 
that information yet, so the server cannot read it from the cluster 
configuration it receives.
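
The sequence above can be replayed in a small single-threaded Java model. This 
is only a sketch: CreateRegionRaceDemo and its methods are made-up names, not 
Geode code. It mimics "take the servers from the view, create the region on 
them, then write the cluster configuration".

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical model of the locator, its cluster configuration, and the servers.
class CreateRegionRaceDemo {
  final Set<String> clusterConfig = new HashSet<>();        // regions stored on the locator
  final Map<String, Set<String>> servers = new HashMap<>(); // server name -> regions it hosts

  // A starting server creates whatever the cluster configuration holds right now.
  void startServer(String name) {
    servers.put(name, new HashSet<>(clusterConfig));
  }

  // First half of "create region": take the servers currently in the view.
  List<String> snapshotView() {
    return new ArrayList<>(servers.keySet());
  }

  // Second half: create the region on those servers, then record it in the config.
  void finishCreateRegion(String region, List<String> targets) {
    for (String target : targets) {
      servers.get(target).add(region);
    }
    clusterConfig.add(region);
  }
}
```

Starting server1, calling snapshotView(), starting server2, calling 
finishCreateRegion("orders", targets), and then starting server3 leaves 
server1 and server3 with the region and server2 without it.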

So the question is, is it allowed to run commands in parallel?
If yes, we should do some checks in the code to avoid this issue.
If not, we need to write it somewhere in the documentation.

BR,
Mario
