[jira] [Commented] (GEODE-4322) Locator fails to start with NPE during join to the distributed system

Bruce Schuchardt (JIRA) Tue, 23 Jan 2018 15:19:15 -0800

    [ 
https://issues.apache.org/jira/browse/GEODE-4322?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16336523#comment-16336523
 ]


Bruce Schuchardt commented on GEODE-4322:
-----------------------------------------

Hi [~vaharonyan],

One of the fixes for GEODE-2542, the one that pushes the cluster key to newly 
joining nodes, is not in 1.3.  It fixes a null pointer exception thrown when 
receiving an encrypted message with id "-150" while locators are being started 
or restarted concurrently.  You are likely to hit this one.  The fix is in the 
1.4 release.

The fix for the issue you are hitting is in the 1.3 release.  The symptom for 
this issue is an NPE where the stack trace includes 
GMSEncrypt.getPeerEncryptor(), as shown in this ticket's description.
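
If you want to check a locator log for this symptom, something like the 
following sketch may help.  It is only an illustration: the function name and 
the "window" parameter are made up for this example, and it assumes the frame 
appears within a few lines of the NullPointerException line, as it does in the 
trace in this ticket.

```python
# Frame named above as the marker for this issue.
SYMPTOM_FRAME = "GMSEncrypt.getPeerEncryptor"
NPE = "java.lang.NullPointerException"


def has_peer_encryptor_npe(log_text, window=4):
    """Return True if an NPE line is followed, within `window` lines,
    by a line containing the GMSEncrypt.getPeerEncryptor frame.

    `window` is an assumption for this sketch; widen it if your log
    wraps stack frames across more lines.
    """
    lines = log_text.splitlines()
    for i, line in enumerate(lines):
        if NPE in line:
            following = lines[i + 1 : i + 1 + window]
            if any(SYMPTOM_FRAME in frame for frame in following):
                return True
    return False
```

Passing the contents of a locator log file to has_peer_encryptor_npe() tells 
you whether the log contains an NPE of this shape.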

Are you shutting down the whole cluster?  If so, you might consider deleting the 
locatorView.dat files before starting the locators.  You can't delete them if 
you're doing a rolling restart or rolling upgrade, but if the whole cluster is 
down it is safe to delete them.  They contain reboot information that lets the 
locator rejoin the cluster.  This information is confusing the 1.2 algorithms, 
which leads to the exception, and deleting the files may clear up the issue for 
you.  These files are in the locator directories and have the locator's port in 
the file name, such as locator10334view.dat.
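
As a rough sketch of that cleanup, assuming the whole cluster is already down, 
the script below finds files named like locator10334view.dat in a list of 
locator directories and deletes them only when asked to.  The function names, 
the dry_run flag and the file-name pattern (locator, digits, view.dat) are 
assumptions made for this example, not part of Geode.

```python
import re
from pathlib import Path

# Assumed naming scheme, based on the example name locator10334view.dat.
VIEW_FILE_PATTERN = re.compile(r"locator\d+view\.dat")


def find_view_files(locator_dirs):
    """Return the view files found directly inside the given directories."""
    found = []
    for directory in locator_dirs:
        for entry in sorted(Path(directory).iterdir()):
            if entry.is_file() and VIEW_FILE_PATTERN.fullmatch(entry.name):
                found.append(entry)
    return found


def delete_view_files(locator_dirs, dry_run=True):
    """List the view files and, unless dry_run is set, delete them.

    Only use this when every member of the cluster is stopped; it is not
    safe during a rolling restart or rolling upgrade.
    """
    files = find_view_files(locator_dirs)
    if not dry_run:
        for view_file in files:
            view_file.unlink()
    return files
```

Calling delete_view_files(dirs) first shows what would be removed; calling it 
again with dry_run=False removes the files.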

 

> Locator fails to start with NPE during join to the distributed system
> ---------------------------------------------------------------------
>
>                 Key: GEODE-4322
>                 URL: https://issues.apache.org/jira/browse/GEODE-4322
>             Project: Geode
>          Issue Type: Bug
>          Components: membership
>    Affects Versions: 1.2.0
>            Reporter: Vahram Aharonyan
>            Assignee: Bruce Schuchardt
>            Priority: Major
>
> Found out that after setting security-udp-dhalgo=AES:128 in properties files 
> the locator sometimes fails to come online with the following exception:
> [severe 2018/01/19 04:22:12.194 PST <locator request thread[20]> tid=0x45] Exception in processing request from 10.144.248.41
> java.lang.RuntimeException: Not found public key for member 16nodedata6(d4b4f5d4-47d2-44b1-a07c-6a7f5755e52d:11493)<ec><v2>:10002
>  at org.apache.geode.distributed.internal.membership.gms.messenger.GMSEncrypt.getPublicKey(GMSEncrypt.java:177)
>  at org.apache.geode.distributed.internal.membership.gms.messenger.JGroupsMessenger.getPublicKey(JGroupsMessenger.java:1365)
>  at org.apache.geode.distributed.internal.membership.gms.locator.GMSLocator.processRequest(GMSLocator.java:271)
>  at org.apache.geode.distributed.internal.InternalLocator$PrimaryHandler.processRequest(InternalLocator.java:1256)
>  at org.apache.geode.distributed.internal.tcpserver.TcpServer.lambda$processRequest$0(TcpServer.java:401)
>  at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>  at java.lang.Thread.run(Thread.java:748)
> Caused by: java.lang.NullPointerException
>  at org.apache.geode.distributed.internal.membership.gms.messenger.GMSEncrypt.getPeerEncryptor(GMSEncrypt.java:258)
>  at org.apache.geode.distributed.internal.membership.gms.messenger.GMSEncrypt.getPublicKey(GMSEncrypt.java:175)
>  ... 7 more
> Please note that this issue is generally hit after a cluster restart. This is 
> important because during poweroff the locator can go offline first, and one of 
> the other members will then become coordinator and update the view file 
> accordingly.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
