[
https://issues.apache.org/jira/browse/GEODE-3709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16219786#comment-16219786
]
Barry Oglesby commented on GEODE-3709:
--------------------------------------
The client logs correspond with what you're saying.
In the 18:03 UTC minute (which is the minute the issue started), the
dca-prd-rstc12 client received 42458 invalidate messages from the server
between 18:03:00.000 and 18:03:38.799, all of which were processed immediately.
Thats an average of 1061 invalidates / second.
For example, the 18:03:00 second received and processed 2106 invalidate
messages.
Processing log messages look like:
{noformat}
[fine 2017/10/20 18:03:00.000 UTC <Cache Client Updater Thread on
10.55.13.77(DCA-PRD-GDCS11:9074)<v94>:1024 port 40404> tid=0xa9] Received
invalidate message of length (141 bytes)
[fine 2017/10/20 18:03:00.000 UTC <Cache Client Updater Thread on
10.55.13.77(DCA-PRD-GDCS11:9074)<v94>:1024 port 40404> tid=0xa9] Invalidating
entry for region: /CircularStoreMap key:
19991|5057|offerCode|2_W_0000787828_20171018_5 callbackArgument: null
withInterest=true withCQs=false version=null
[fine 2017/10/20 18:03:00.000 UTC <Cache Client Updater Thread on
10.55.13.77(DCA-PRD-GDCS11:9074)<v94>:1024 port 40404> tid=0xa9] Invalidated
entry for region: /CircularStoreMap key:
19991|5057|offerCode|2_W_0000787828_20171018_5 callbackArgument: null
{noformat}
>From 18:03:39 on, the client received ~2 invalidate messages a second (roughly
>a message every 400 ms).
So, the client goes from receiving ~1000 messages / sec to ~2 messages / second.
Here are the log messages from 18:03:39 UTC to 18:03:45 UTC:
{noformat}
[fine 2017/10/20 18:03:39.396 UTC <Cache Client Updater Thread on
10.55.13.77(DCA-PRD-GDCS11:9074)<v94>:1024 port 40404> tid=0xa9] Received
invalidate message of length (142 bytes)
[fine 2017/10/20 18:03:39.396 UTC <Cache Client Updater Thread on
10.55.13.77(DCA-PRD-GDCS11:9074)<v94>:1024 port 40404> tid=0xa9] Invalidating
entry for region: /CircularStoreMap key:
20001|1620|b0ac2670-3fd5-408e-8da3-43a399215645 callbackArgument: null
withInterest=true withCQs=false version=null
[fine 2017/10/20 18:03:39.396 UTC <Cache Client Updater Thread on
10.55.13.77(DCA-PRD-GDCS11:9074)<v94>:1024 port 40404> tid=0xa9] Invalidated
entry for region: /CircularStoreMap key:
20001|1620|b0ac2670-3fd5-408e-8da3-43a399215645 callbackArgument: null
[fine 2017/10/20 18:03:39.804 UTC <Cache Client Updater Thread on
10.55.13.77(DCA-PRD-GDCS11:9074)<v94>:1024 port 40404> tid=0xa9] Received
invalidate message of length (142 bytes)
[fine 2017/10/20 18:03:39.804 UTC <Cache Client Updater Thread on
10.55.13.77(DCA-PRD-GDCS11:9074)<v94>:1024 port 40404> tid=0xa9] Invalidating
entry for region: /CircularStoreMap key:
20001|1620|20836a0a-7435-4e97-a4be-cdef4dc31213 callbackArgument: null
withInterest=true withCQs=false version=null
[fine 2017/10/20 18:03:39.804 UTC <Cache Client Updater Thread on
10.55.13.77(DCA-PRD-GDCS11:9074)<v94>:1024 port 40404> tid=0xa9] Invalidated
entry for region: /CircularStoreMap key:
20001|1620|20836a0a-7435-4e97-a4be-cdef4dc31213 callbackArgument: null
[fine 2017/10/20 18:03:40.416 UTC <Cache Client Updater Thread on
10.55.13.77(DCA-PRD-GDCS11:9074)<v94>:1024 port 40404> tid=0xa9] Received
invalidate message of length (141 bytes)
[fine 2017/10/20 18:03:40.416 UTC <Cache Client Updater Thread on
10.55.13.77(DCA-PRD-GDCS11:9074)<v94>:1024 port 40404> tid=0xa9] Invalidating
entry for region: /CircularStoreMap key:
19991|5085|offerCode|2_W_0000296797_20171018_0 callbackArgument: null
withInterest=true withCQs=false version=null
[fine 2017/10/20 18:03:40.416 UTC <Cache Client Updater Thread on
10.55.13.77(DCA-PRD-GDCS11:9074)<v94>:1024 port 40404> tid=0xa9] Invalidated
entry for region: /CircularStoreMap key:
19991|5085|offerCode|2_W_0000296797_20171018_0 callbackArgument: null
[fine 2017/10/20 18:03:40.824 UTC <Cache Client Updater Thread on
10.55.13.77(DCA-PRD-GDCS11:9074)<v94>:1024 port 40404> tid=0xa9] Received
invalidate message of length (142 bytes)
[fine 2017/10/20 18:03:40.824 UTC <Cache Client Updater Thread on
10.55.13.77(DCA-PRD-GDCS11:9074)<v94>:1024 port 40404> tid=0xa9] Invalidating
entry for region: /CircularStoreMap key:
20001|1620|f371045a-f0a2-4fcd-92f5-10b1088e70fc callbackArgument: null
withInterest=true withCQs=false version=null
[fine 2017/10/20 18:03:40.824 UTC <Cache Client Updater Thread on
10.55.13.77(DCA-PRD-GDCS11:9074)<v94>:1024 port 40404> tid=0xa9] Invalidated
entry for region: /CircularStoreMap key:
20001|1620|f371045a-f0a2-4fcd-92f5-10b1088e70fc callbackArgument: null
[fine 2017/10/20 18:03:41.232 UTC <Cache Client Updater Thread on
10.55.13.77(DCA-PRD-GDCS11:9074)<v94>:1024 port 40404> tid=0xa9] Received
invalidate message of length (141 bytes)
[fine 2017/10/20 18:03:41.232 UTC <Cache Client Updater Thread on
10.55.13.77(DCA-PRD-GDCS11:9074)<v94>:1024 port 40404> tid=0xa9] Invalidating
entry for region: /CircularStoreMap key:
19991|5085|offerCode|2_W_0000576827_20171018_0 callbackArgument: null
withInterest=true withCQs=false version=null
[fine 2017/10/20 18:03:41.232 UTC <Cache Client Updater Thread on
10.55.13.77(DCA-PRD-GDCS11:9074)<v94>:1024 port 40404> tid=0xa9] Invalidated
entry for region: /CircularStoreMap key:
19991|5085|offerCode|2_W_0000576827_20171018_0 callbackArgument: null
[fine 2017/10/20 18:03:41.844 UTC <Cache Client Updater Thread on
10.55.13.77(DCA-PRD-GDCS11:9074)<v94>:1024 port 40404> tid=0xa9] Received
invalidate message of length (142 bytes)
[fine 2017/10/20 18:03:41.844 UTC <Cache Client Updater Thread on
10.55.13.77(DCA-PRD-GDCS11:9074)<v94>:1024 port 40404> tid=0xa9] Invalidating
entry for region: /CircularStoreMap key:
20001|1620|45e86530-cb06-4d8b-a501-206a2be47395 callbackArgument: null
withInterest=true withCQs=false version=null
[fine 2017/10/20 18:03:41.844 UTC <Cache Client Updater Thread on
10.55.13.77(DCA-PRD-GDCS11:9074)<v94>:1024 port 40404> tid=0xa9] Invalidated
entry for region: /CircularStoreMap key:
20001|1620|45e86530-cb06-4d8b-a501-206a2be47395 callbackArgument: null
[fine 2017/10/20 18:03:42.252 UTC <Cache Client Updater Thread on
10.55.13.77(DCA-PRD-GDCS11:9074)<v94>:1024 port 40404> tid=0xa9] Received
invalidate message of length (142 bytes)
[fine 2017/10/20 18:03:42.252 UTC <Cache Client Updater Thread on
10.55.13.77(DCA-PRD-GDCS11:9074)<v94>:1024 port 40404> tid=0xa9] Invalidating
entry for region: /CircularStoreMap key:
20001|1620|d5424b01-7115-41cb-89e9-71f3b949c6d9 callbackArgument: null
withInterest=true withCQs=false version=null
[fine 2017/10/20 18:03:42.252 UTC <Cache Client Updater Thread on
10.55.13.77(DCA-PRD-GDCS11:9074)<v94>:1024 port 40404> tid=0xa9] Invalidated
entry for region: /CircularStoreMap key:
20001|1620|d5424b01-7115-41cb-89e9-71f3b949c6d9 callbackArgument: null
[fine 2017/10/20 18:03:42.660 UTC <Cache Client Updater Thread on
10.55.13.77(DCA-PRD-GDCS11:9074)<v94>:1024 port 40404> tid=0xa9] Received
invalidate message of length (141 bytes)
[fine 2017/10/20 18:03:42.660 UTC <Cache Client Updater Thread on
10.55.13.77(DCA-PRD-GDCS11:9074)<v94>:1024 port 40404> tid=0xa9] Invalidating
entry for region: /CircularStoreMap key:
19991|5085|offerCode|2_W_0000935202_20171018_0 callbackArgument: null
withInterest=true withCQs=false version=null
[fine 2017/10/20 18:03:42.660 UTC <Cache Client Updater Thread on
10.55.13.77(DCA-PRD-GDCS11:9074)<v94>:1024 port 40404> tid=0xa9] Invalidated
entry for region: /CircularStoreMap key:
19991|5085|offerCode|2_W_0000935202_20171018_0 callbackArgument: null
[fine 2017/10/20 18:03:43.272 UTC <Cache Client Updater Thread on
10.55.13.77(DCA-PRD-GDCS11:9074)<v94>:1024 port 40404> tid=0xa9] Received
invalidate message of length (142 bytes)
[fine 2017/10/20 18:03:43.272 UTC <Cache Client Updater Thread on
10.55.13.77(DCA-PRD-GDCS11:9074)<v94>:1024 port 40404> tid=0xa9] Invalidating
entry for region: /CircularStoreMap key:
20001|1620|7d91518b-bd61-4c55-81d1-a0b8fc293062 callbackArgument: null
withInterest=true withCQs=false version=null
[fine 2017/10/20 18:03:43.272 UTC <Cache Client Updater Thread on
10.55.13.77(DCA-PRD-GDCS11:9074)<v94>:1024 port 40404> tid=0xa9] Invalidated
entry for region: /CircularStoreMap key:
20001|1620|7d91518b-bd61-4c55-81d1-a0b8fc293062 callbackArgument: null
[fine 2017/10/20 18:03:43.680 UTC <Cache Client Updater Thread on
10.55.13.77(DCA-PRD-GDCS11:9074)<v94>:1024 port 40404> tid=0xa9] Received
invalidate message of length (141 bytes)
[fine 2017/10/20 18:03:43.680 UTC <Cache Client Updater Thread on
10.55.13.77(DCA-PRD-GDCS11:9074)<v94>:1024 port 40404> tid=0xa9] Invalidating
entry for region: /CircularStoreMap key:
19991|5085|offerCode|2_W_0000446976_20171018_0 callbackArgument: null
withInterest=true withCQs=false version=null
[fine 2017/10/20 18:03:43.680 UTC <Cache Client Updater Thread on
10.55.13.77(DCA-PRD-GDCS11:9074)<v94>:1024 port 40404> tid=0xa9] Invalidated
entry for region: /CircularStoreMap key:
19991|5085|offerCode|2_W_0000446976_20171018_0 callbackArgument: null
[fine 2017/10/20 18:03:44.292 UTC <Cache Client Updater Thread on
10.55.13.77(DCA-PRD-GDCS11:9074)<v94>:1024 port 40404> tid=0xa9] Received
invalidate message of length (142 bytes)
[fine 2017/10/20 18:03:44.292 UTC <Cache Client Updater Thread on
10.55.13.77(DCA-PRD-GDCS11:9074)<v94>:1024 port 40404> tid=0xa9] Invalidating
entry for region: /CircularStoreMap key:
20001|1620|7c825d65-7af7-4952-9f52-ba2a28b5b852 callbackArgument: null
withInterest=true withCQs=false version=null
[fine 2017/10/20 18:03:44.292 UTC <Cache Client Updater Thread on
10.55.13.77(DCA-PRD-GDCS11:9074)<v94>:1024 port 40404> tid=0xa9] Invalidated
entry for region: /CircularStoreMap key:
20001|1620|7c825d65-7af7-4952-9f52-ba2a28b5b852 callbackArgument: null
[fine 2017/10/20 18:03:44.700 UTC <Cache Client Updater Thread on
10.55.13.77(DCA-PRD-GDCS11:9074)<v94>:1024 port 40404> tid=0xa9] Received
invalidate message of length (142 bytes)
[fine 2017/10/20 18:03:44.700 UTC <Cache Client Updater Thread on
10.55.13.77(DCA-PRD-GDCS11:9074)<v94>:1024 port 40404> tid=0xa9] Invalidating
entry for region: /CircularStoreMap key:
20001|1620|8f87d525-2a08-4c68-8c0f-80137d39f2ad callbackArgument: null
withInterest=true withCQs=false version=null
[fine 2017/10/20 18:03:44.700 UTC <Cache Client Updater Thread on
10.55.13.77(DCA-PRD-GDCS11:9074)<v94>:1024 port 40404> tid=0xa9] Invalidated
entry for region: /CircularStoreMap key:
20001|1620|8f87d525-2a08-4c68-8c0f-80137d39f2ad callbackArgument: null
{noformat}
The 'Received invalidate message' message is logged right after the Cache
Client Updater Thread reads a message from the socket and determines the
message type.
The way that server -> client messaging works is:
On the server:
- the Client Message Dispatcher peeks a message from queue
- the Client Message Dispatcher writes the message to the socket
- repeat
On the client:
- the Cache Client Updater reads a message from the socket
- the Cache Client Updater processes that message
- repeat
There isn't an explicit ACK from the client to the server for each message, and
the server doesn't wait for an ACK from the client.
The acknowledgements from the client to the server are asynchronous every 100
ms by default. The client sends the last message it has seen which causes the
server to remove those messages from the queue.
The ACKs in the TCP dump are at the TCP level not the Geode level.
Based on the thread dumps, I don't see that Geode is doing anything other than
trying to write to the socket. If the Geode server were doing something for
400ms between message sends, the thread dumps would show it, but they don't.
All they show is the {{Client Message Dispatcher}} attempting to write to the
socket.
Unfortunately, there isn't a whole lot of logging on the server in this area,
so I'm not sure server fine logs will tell us much.
> Geode Version: 1.1.1 In one of the project we a...
> -----------------------------------------------------
>
> Key: GEODE-3709
> URL: https://issues.apache.org/jira/browse/GEODE-3709
> Project: Geode
> Issue Type: Improvement
> Components: client queues
> Reporter: Gregory Chase
> Attachments: 20171006-logs-stats-tds.zip, 20171020.zip,
> CacheClientProxyStats_sentBytes.gif,
> DistributionStats_receivedBytes_CacheClientProxyStats_sentBytes.gif,
> gf-rest-stats-12-05.gfs, myStatisticsArchiveFile-04-01.gfs
>
>
> Geode Version: 1.1.1
> In one of the project we are using Geode. Here is a summary of how we use it.
> - Geode servers have multiple regions.
> - Clients subscribe to the data from these regions.
> - Clients subscribe interest in all the entries, therefore they get updates
> about all the entries from creation to modification to deletion.
> - One of the regions usually has 5-10 million entries with a TTL of 24 hours.
> Most entries are added in an hour's span one after other. So when TTL kicks
> in, they are often destroyed in an hour.
> Problem:
> Every now and then we observe following message:
> Client queue for
> _gfe_non_durable_client_with_id_x.x.x.x(14229:loner):42754:e4266fc4_2_queue
> client is full.
> This seems to happen when the TTL kicks in on the region with 5-10 million
> entries. Entries start getting evicted (deleted); the updates (destroys) now
> must be sent to clients. We see that the updates do happen for a while but
> suddenly the updates stop and the queue size starts growing. This is becoming
> a major issue for smooth functioning of our production setup. Any help will
> be much appreciated.
> I did some ground work by downloading and looking at the code. I see
> reference to 2 issues #37581, #51400. But I am unable to view actual JIRA
> tickets (needs login credentials) Hopefully, it helps someone looking at the
> issue.
> Here is the pertinent code:
> @Override
> @edu.umd.cs.findbugs.annotations.SuppressWarnings("TLW_TWO_LOCK_WAIT")
> void checkQueueSizeConstraint() throws InterruptedException {
> if (this.haContainer instanceof HAContainerMap && isPrimary()) { // Fix
> for bug 39413
> if (Thread.interrupted())
> throw new InterruptedException();
> synchronized (this.putGuard) {
> if (putPermits <= 0) {
> synchronized (this.permitMon) {
> if (reconcilePutPermits() <= 0) {
> if
> (region.getSystem().getConfig().getRemoveUnresponsiveClient()) {
> isClientSlowReciever = true;
> } else {
> try {
> long logFrequency =
> CacheClientNotifier.DEFAULT_LOG_FREQUENCY;
> CacheClientNotifier ccn =
> CacheClientNotifier.getInstance();
> if (ccn != null) { // check needed for junit tests
> logFrequency = ccn.getLogFrequency();
> }
> if ((this.maxQueueSizeHitCount % logFrequency) == 0) {
> logger.warn(LocalizedMessage.create(
>
> LocalizedStrings.HARegionQueue_CLIENT_QUEUE_FOR_0_IS_FULL,
> new Object[] {region.getName()}));
> this.maxQueueSizeHitCount = 0;
> }
> ++this.maxQueueSizeHitCount;
> this.region.checkReadiness(); // fix for bug 37581
> // TODO: wait called while holding two locks
>
> this.permitMon.wait(CacheClientNotifier.eventEnqueueWaitTime);
> this.region.checkReadiness(); // fix for bug 37581
> // Fix for #51400. Allow the queue to grow beyond its
> // capacity/maxQueueSize, if it is taking a long time to
> // drain the queue, either due to a slower client or the
> // deadlock scenario mentioned in the ticket.
> reconcilePutPermits();
> if ((this.maxQueueSizeHitCount % logFrequency) == 1) {
> logger.info(LocalizedMessage
>
> .create(LocalizedStrings.HARegionQueue_RESUMING_WITH_PROCESSING_PUTS));
> }
> } catch (InterruptedException ex) {
> // TODO: The line below is meaningless. Comment it out
> later
> this.permitMon.notifyAll();
> throw ex;
> }
> }
> }
> } // synchronized (this.permitMon)
> } // if (putPermits <= 0)
> --putPermits;
> } // synchronized (this.putGuard)
> }
> }
> *Reporter*: Mangesh Deshmukh
> *E-mail*: [mailto:[email protected]]
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)