[
https://issues.apache.org/jira/browse/GEODE-4322?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16336523#comment-16336523
]
Bruce Schuchardt commented on GEODE-4322:
-----------------------------------------
Hi [~vaharonyan],
One of the fixes for GEODE-2542, the one that pushes the cluster key to newly
joining nodes, is not in 1.3. It fixes a null pointer exception thrown when
receiving an encrypted message with id "-150" when starting/restarting locators
concurrently. This is something you are likely to hit. That fix is in the 1.4
release.
The fix for the issue you are hitting is in the 1.3 release. The symptom for
this issue is an NPE where the stack trace includes
GMSEncrypt.getPeerEncryptor(), as shown in this ticket's description.
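To check a locator log for this symptom, something like the following minimal
sketch may help. It assumes that each log entry starts with "[" as in the trace
quoted in this ticket, and the function name has_peer_encryptor_npe is
hypothetical (it is not part of Geode).

```python
def has_peer_encryptor_npe(log_text):
    """Return True if an NPE cause is followed by a GMSEncrypt.getPeerEncryptor frame."""
    in_npe_cause = False
    for raw in log_text.splitlines():
        line = raw.strip()
        if line.startswith("["):
            # Assumption: a line starting with "[" begins a new log entry.
            in_npe_cause = False
        elif line.startswith("Caused by:"):
            # Only frames under a NullPointerException cause are of interest.
            in_npe_cause = "java.lang.NullPointerException" in line
        elif in_npe_cause and "GMSEncrypt.getPeerEncryptor(" in line:
            return True
    return False
```

Pass it the text of a locator log, for example
has_peer_encryptor_npe(open(path).read()), where path is wherever your locator
writes its log.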
Are you shutting down the whole cluster? If so you might consider deleting the
locatorView.dat files before starting the locators. You can't delete them if
you're doing a rolling restart or rolling upgrade but if the whole cluster is
down it is safe to delete them. They contain reboot information that lets the
locator rejoin the cluster. This information is causing confusion in the 1.2
algorithms that leads to the exception and deleting the files may clear up the
issue for you. These files are in the locator directories and have the
locator's port in the file name, such as locator10334view.dat.
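If you have several locators, the cleanup could be scripted roughly as below.
This is only a sketch: the file name pattern is inferred from the example name
above, the function names are hypothetical, and the script cannot verify that
the whole cluster is down, so that check stays with you. It defaults to a dry
run that only lists what it would delete.

```python
import re
from pathlib import Path

# Assumed pattern, inferred from the example name "locator10334view.dat".
VIEW_FILE_RE = re.compile(r"^locator(\d+)view\.dat$")


def find_view_files(locator_dir):
    """Return the view files directly inside one locator directory, sorted."""
    return sorted(
        p for p in Path(locator_dir).iterdir()
        if p.is_file() and VIEW_FILE_RE.match(p.name)
    )


def delete_view_files(locator_dirs, dry_run=True):
    """List the view files in the given directories and delete them unless dry_run."""
    found = []
    for locator_dir in locator_dirs:
        for path in find_view_files(locator_dir):
            found.append(path)
            if not dry_run:
                path.unlink()
    return found
```

Run delete_view_files(dirs) first to review the list, then
delete_view_files(dirs, dry_run=False) once every locator and server is
stopped.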
> Locator fails to start with NPE during join to the distributed system
> ---------------------------------------------------------------------
>
> Key: GEODE-4322
> URL: https://issues.apache.org/jira/browse/GEODE-4322
> Project: Geode
> Issue Type: Bug
> Components: membership
> Affects Versions: 1.2.0
> Reporter: Vahram Aharonyan
> Assignee: Bruce Schuchardt
> Priority: Major
>
> Found out that after setting security-udp-dhalgo=AES:128 in properties files
> sometimes locator is failing to come online with the following Exception:
> [severe 2018/01/19 04:22:12.194 PST <locator request thread[20]> tid=0x45] Exception in processing request from 10.144.248.41
> java.lang.RuntimeException: Not found public key for member 16nodedata6(d4b4f5d4-47d2-44b1-a07c-6a7f5755e52d:11493)<ec><v2>:10002
> at org.apache.geode.distributed.internal.membership.gms.messenger.GMSEncrypt.getPublicKey(GMSEncrypt.java:177)
> at org.apache.geode.distributed.internal.membership.gms.messenger.JGroupsMessenger.getPublicKey(JGroupsMessenger.java:1365)
> at org.apache.geode.distributed.internal.membership.gms.locator.GMSLocator.processRequest(GMSLocator.java:271)
> at org.apache.geode.distributed.internal.InternalLocator$PrimaryHandler.processRequest(InternalLocator.java:1256)
> at org.apache.geode.distributed.internal.tcpserver.TcpServer.lambda$processRequest$0(TcpServer.java:401)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at java.lang.Thread.run(Thread.java:748)
> Caused by: java.lang.NullPointerException
> at org.apache.geode.distributed.internal.membership.gms.messenger.GMSEncrypt.getPeerEncryptor(GMSEncrypt.java:258)
> at org.apache.geode.distributed.internal.membership.gms.messenger.GMSEncrypt.getPublicKey(GMSEncrypt.java:175)
> ... 7 more
> Please note that this issue is generally hit after a cluster restart. This is
> important, as during poweroff a locator can go offline first, and one of the
> other members will then become coordinator and update the view file accordingly.
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)