[ https://issues.apache.org/jira/browse/GEODE-8248?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17261084#comment-17261084 ]

Jack commented on GEODE-8248:
-----------------------------

Is there a shutdown-all gfsh command? According to the documentation, "If you are using 
persistent regions, (members are persisting data to disk), you should use the 
gfsh shutdown command to stop the running system in an orderly fashion. This 
command synchronizes persistent partitioned regions before shutting down, which 
makes the next startup of the distributed system as efficient as possible."
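For reference, the orderly shutdown described in that documentation passage is issued through gfsh's `shutdown` command after connecting to a locator. Below is a minimal Python sketch that builds such an invocation for `subprocess`. It assumes that gfsh is on the `PATH`, that gfsh runs each `-e` argument as a command from the OS command line, and that the locator is the `localhost[10334]` one used in the issue. The helper names `gfsh_shutdown_argv` and `run_shutdown` are hypothetical. Check the `--include-locators` option against `help shutdown` for your Geode version.

```python
from __future__ import annotations

import shlex
import subprocess


def gfsh_shutdown_argv(locator: str = "localhost[10334]",
                       include_locators: bool = False) -> list[str]:
    """Hypothetical helper: argv that connects to a locator, then runs gfsh 'shutdown'.

    Assumes gfsh executes each -e argument as one command, in order.
    """
    shutdown_cmd = "shutdown"
    if include_locators:
        # Also stop the locators, not only the servers (verify with 'help shutdown').
        shutdown_cmd += " --include-locators=true"
    return ["gfsh", "-e", f"connect --locator={locator}", "-e", shutdown_cmd]


def run_shutdown(locator: str = "localhost[10334]") -> None:
    """Run the orderly shutdown; raises CalledProcessError if gfsh exits non-zero."""
    argv = gfsh_shutdown_argv(locator)
    print("Running:", shlex.join(argv))
    subprocess.run(argv, check=True)
```

Calling `run_shutdown()` only makes sense against a running cluster. The argv builder can be checked on its own without gfsh installed.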

> Member hangs waiting for missing disk-stores after gfsh shutdown
> ----------------------------------------------------------------
>
>                 Key: GEODE-8248
>                 URL: https://issues.apache.org/jira/browse/GEODE-8248
>             Project: Geode
>          Issue Type: Bug
>          Components: gfsh, persistence
>            Reporter: Juan Ramos
>            Priority: Major
>         Attachments: temporal.zip
>
>
> Let’s say I have 2 servers with a simple {{REPLICATE_PERSISTENT}} region and 
> I stop both using the {{gfsh shutdown}} command.
> According to the 
> [documentation|https://geode.apache.org/docs/guide/112/managing/disk_storage/starting_system_with_disk_stores.html],
> I should be able to start either of the servers without any problems, as both 
> host the most up-to-date data. However, what actually happens is that the 
> startup hangs with the following:
> {noformat}
> (1) Executing - start server --name=server1 --locators=localhost[10334] --server-port=40401 --cache-xml-file=/temporal/cache.xml
> .........
> Region /TestRegion has potentially stale data. It is waiting for another member to recover the latest data.
> My persistent id:
>   DiskStore ID: 4d1abaf3-677d-4c52-b3f8-681e051f143c
>   Name: server1
>   Location: /temporal/server1/dataStore
> Members with potentially new data:
> [
>   DiskStore ID: 163dfaf7-a680-4154-a278-8cec40d57d80
>   Name: server2
>   Location: /temporal/server2/dataStore
> ]
> "main" #1 prio=5 os_prio=31 tid=0x00007f9b28809000 nid=0x1003 in Object.wait() [0x000070000ab04000]
>    java.lang.Thread.State: TIMED_WAITING (on object monitor)
>       at java.lang.Object.wait(Native Method)
>       at org.apache.geode.internal.cache.persistence.MembershipChangeListener.waitForChange(MembershipChangeListener.java:62)
>       - locked <0x0000000719df55e0> (a org.apache.geode.internal.cache.persistence.MembershipChangeListener)
>       at org.apache.geode.internal.cache.persistence.PersistenceInitialImageAdvisor.waitForMembershipChangeForMissingDiskStores(PersistenceInitialImageAdvisor.java:218)
>       at org.apache.geode.internal.cache.persistence.PersistenceInitialImageAdvisor.getAdvice(PersistenceInitialImageAdvisor.java:118)
>       at org.apache.geode.internal.cache.persistence.PersistenceAdvisorImpl.getInitialImageAdvice(PersistenceAdvisorImpl.java:835)
>       at org.apache.geode.internal.cache.persistence.CreatePersistentRegionProcessor.getInitialImageAdvice(CreatePersistentRegionProcessor.java:52)
>       at org.apache.geode.internal.cache.DistributedRegion.getInitialImageAndRecovery(DistributedRegion.java:1196)
>       at org.apache.geode.internal.cache.DistributedRegion.initialize(DistributedRegion.java:1076)
>       at org.apache.geode.internal.cache.GemFireCacheImpl.createVMRegion(GemFireCacheImpl.java:3043)
>       at org.apache.geode.pdx.internal.PeerTypeRegistration.initialize(PeerTypeRegistration.java:198)
>       at org.apache.geode.pdx.internal.TypeRegistry.initialize(TypeRegistry.java:116)
>       at org.apache.geode.internal.cache.GemFireCacheImpl.initializePdxRegistry(GemFireCacheImpl.java:1449)
>       - locked <0x00000005c0593168> (a org.apache.geode.internal.cache.GemFireCacheImpl)
>       at org.apache.geode.internal.cache.xmlcache.CacheCreation.create(CacheCreation.java:511)
>       at org.apache.geode.internal.cache.xmlcache.CacheXmlParser.create(CacheXmlParser.java:337)
>       at org.apache.geode.internal.cache.GemFireCacheImpl.loadCacheXml(GemFireCacheImpl.java:4272)
>       at org.apache.geode.internal.cache.GemFireCacheImpl.initializeDeclarativeCache(GemFireCacheImpl.java:1388)
>       at org.apache.geode.internal.cache.GemFireCacheImpl.initialize(GemFireCacheImpl.java:1208)
>       at org.apache.geode.internal.cache.InternalCacheBuilder.create(InternalCacheBuilder.java:207)
>       - locked <0x00000005c016a108> (a java.lang.Class for org.apache.geode.internal.cache.GemFireCacheImpl)
>       - locked <0x00000005c0043de0> (a java.lang.Class for org.apache.geode.internal.cache.InternalCacheBuilder)
>       at org.apache.geode.internal.cache.InternalCacheBuilder.create(InternalCacheBuilder.java:164)
>       - locked <0x00000005c0043de0> (a java.lang.Class for org.apache.geode.internal.cache.InternalCacheBuilder)
>       at org.apache.geode.cache.CacheFactory.create(CacheFactory.java:139)
>       at org.apache.geode.distributed.internal.DefaultServerLauncherCacheProvider.createCache(DefaultServerLauncherCacheProvider.java:52)
>       at org.apache.geode.distributed.ServerLauncher.createCache(ServerLauncher.java:869)
>       at org.apache.geode.distributed.ServerLauncher.start(ServerLauncher.java:786)
>       at org.apache.geode.distributed.ServerLauncher.run(ServerLauncher.java:716)
>       at org.apache.geode.distributed.ServerLauncher.main(ServerLauncher.java:236)
> {noformat}
> We should either fix the problem, making sure the members fully synchronise 
> their data during the {{shutdown}} process so they don't have to wait on each 
> other, or, if this is the expected behaviour, update the documentation 
> accordingly.
> The attached {{zip}} file contains a simple script to reproduce the issue; 
> the only thing that needs to be changed after downloading and uncompressing 
> the file is the {{GEMFIRE}} environment variable.
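The startup log quoted in the issue reduces to a wait condition. On startup, a member's persisted metadata names peers that may hold newer data for the region ("Members with potentially new data"). Recovery then blocks while any of those peers is offline. Below is a deliberately simplified Python model of that condition, written only to illustrate the reported behaviour. The class `DiskStoreView` and the functions `missing_newer_members` and `can_recover` are hypothetical and are not Geode's actual persistence implementation. The model contrasts two states. The first is the one in the log, where server1 lists server2 as possibly newer. The second is the one the documentation implies after an orderly `gfsh shutdown`, where no peer is listed and either server could start alone.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class DiskStoreView:
    """Hypothetical, simplified record of what one member persisted about its peers."""
    member: str
    # Peers this member believes may hold newer data for the region.
    possibly_newer: set[str] = field(default_factory=set)


def missing_newer_members(view: DiskStoreView, online: set[str]) -> set[str]:
    """Peers with potentially newer data that are not currently running."""
    return view.possibly_newer - online


def can_recover(view: DiskStoreView, online: set[str]) -> bool:
    """True if the member could recover from its own disk store without waiting."""
    return not missing_newer_members(view, online)


if __name__ == "__main__":
    # State reported in the log: server1 lists server2 as having potentially new data.
    server1 = DiskStoreView("server1", {"server2"})
    print(can_recover(server1, online=set()))        # False: startup waits
    print(can_recover(server1, online={"server2"}))  # True: server2 is back
    # State the documentation implies after an orderly shutdown: no peer listed.
    print(can_recover(DiskStoreView("server1"), online=set()))  # True
```

In this model the bug report amounts to asking why `possibly_newer` is still non-empty for server1 after a `gfsh shutdown` that is documented to synchronise the members.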



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
