[
https://issues.apache.org/jira/browse/GEODE-3709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16219856#comment-16219856
]
Mangesh Deshmukh commented on GEODE-3709:
-----------------------------------------
I concur that there isn't sufficient information (from the logs and
tcpdump) to tell us where the problem really lies. I had also noticed the
window-size degradation, which seems to be a common theme in this issue (I saw
it across multiple runs).
To troubleshoot further, however, we could do a custom build with some log
statements. Please let me know where it would be appropriate to put them.
Is this the right place?
{code:java}
// Message.java
void flushBuffer() throws IOException {
  final ByteBuffer cb = getCommBuffer();
  if (this.socketChannel != null) {
    cb.flip();
    do {
      this.socketChannel.write(cb); // <-- proposed location for a log statement
    } while (cb.remaining() > 0);
  } else {
    this.outputStream.write(cb.array(), 0, cb.position());
  }
  if (this.messageStats != null) {
    this.messageStats.incSentBytes(cb.position());
  }
  cb.clear();
}
{code}
This should give us the exact time at which each message is written, from the
application's point of view.
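For example, here is a minimal sketch of the instrumentation I have in mind
(the message format, and the assumption that the class's existing log4j2
{{logger}} is usable here, are mine; this is not a final patch):
{code:java}
// Message.java -- flushBuffer() with a timed log statement around each
// socket write; 'logger' is assumed to be the class's existing log4j2 Logger.
void flushBuffer() throws IOException {
  final ByteBuffer cb = getCommBuffer();
  if (this.socketChannel != null) {
    cb.flip();
    do {
      final long start = System.currentTimeMillis();
      final int written = this.socketChannel.write(cb);
      logger.info("flushBuffer: wrote {} bytes in {} ms, {} bytes remaining",
          written, System.currentTimeMillis() - start, cb.remaining());
    } while (cb.remaining() > 0);
  } else {
    this.outputStream.write(cb.array(), 0, cb.position());
  }
  if (this.messageStats != null) {
    this.messageStats.incSentBytes(cb.position());
  }
  cb.clear();
}
{code}
If the client's TCP window is collapsing, the elapsed time reported for
individual writes should grow, which would show up directly in these log lines.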
> Geode Version: 1.1.1 In one of the project we a...
> -----------------------------------------------------
>
> Key: GEODE-3709
> URL: https://issues.apache.org/jira/browse/GEODE-3709
> Project: Geode
> Issue Type: Improvement
> Components: client queues
> Reporter: Gregory Chase
> Attachments: 20171006-logs-stats-tds.zip, 20171020.zip,
> CacheClientProxyStats_sentBytes.gif,
> DistributionStats_receivedBytes_CacheClientProxyStats_sentBytes.gif,
> gf-rest-stats-12-05.gfs, myStatisticsArchiveFile-04-01.gfs
>
>
> Geode Version: 1.1.1
> In one of our projects we are using Geode. Here is a summary of how we use it.
> - Geode servers host multiple regions.
> - Clients subscribe to the data in these regions.
> - Clients register interest in all the entries, so they receive updates about
> every entry from creation through modification to destruction.
> - One of the regions usually holds 5-10 million entries with a TTL of 24
> hours. Most entries are added one after another within an hour's span, so when
> the TTL kicks in, they are often all destroyed within an hour as well. (A
> sketch of this setup appears after the list.)
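> For reference, a minimal sketch of this setup (the region name, locator
> address, and exact region shortcuts are illustrative assumptions, not our
> actual production configuration):
> import org.apache.geode.cache.*;
> import org.apache.geode.cache.client.*;
> // Server side: region whose entries are destroyed 24 hours after creation.
> // Statistics must be enabled on the region for expiration to work.
> Cache cache = new CacheFactory().create();
> RegionFactory<String, Object> factory =
>     cache.createRegionFactory(RegionShortcut.REPLICATE);
> factory.setStatisticsEnabled(true);
> factory.setEntryTimeToLive(
>     new ExpirationAttributes(24 * 60 * 60, ExpirationAction.DESTROY));
> Region<String, Object> region = factory.create("exampleRegion");
> // Client side: enable subscriptions and register interest in all keys, so
> // creates, updates, and the TTL-driven destroys all arrive as queued events.
> ClientCache clientCache = new ClientCacheFactory()
>     .addPoolLocator("locator-host", 10334)
>     .setPoolSubscriptionEnabled(true)
>     .create();
> Region<String, Object> clientRegion = clientCache
>     .<String, Object>createClientRegionFactory(ClientRegionShortcut.CACHING_PROXY)
>     .create("exampleRegion");
> clientRegion.registerInterest("ALL_KEYS");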
> Problem:
> Every now and then we observe the following message:
> Client queue for
> _gfe_non_durable_client_with_id_x.x.x.x(14229:loner):42754:e4266fc4_2_queue
> client is full.
> This seems to happen when the TTL kicks in on the region with 5-10 million
> entries. Entries start getting evicted (destroyed), and the resulting destroy
> events must be sent to the clients. We see the updates flow for a while, but
> then they suddenly stop and the queue size starts growing. This has become a
> major obstacle to the smooth functioning of our production setup. Any help
> would be much appreciated.
> I did some groundwork by downloading and reading the code. I see references
> to two issues, #37581 and #51400, but I am unable to view the actual JIRA
> tickets (they require login credentials). Hopefully this helps someone
> looking at the issue.
> Here is the pertinent code:
> @Override
> @edu.umd.cs.findbugs.annotations.SuppressWarnings("TLW_TWO_LOCK_WAIT")
> void checkQueueSizeConstraint() throws InterruptedException {
>   if (this.haContainer instanceof HAContainerMap && isPrimary()) { // Fix for bug 39413
>     if (Thread.interrupted())
>       throw new InterruptedException();
>     synchronized (this.putGuard) {
>       if (putPermits <= 0) {
>         synchronized (this.permitMon) {
>           if (reconcilePutPermits() <= 0) {
>             if (region.getSystem().getConfig().getRemoveUnresponsiveClient()) {
>               isClientSlowReciever = true;
>             } else {
>               try {
>                 long logFrequency = CacheClientNotifier.DEFAULT_LOG_FREQUENCY;
>                 CacheClientNotifier ccn = CacheClientNotifier.getInstance();
>                 if (ccn != null) { // check needed for junit tests
>                   logFrequency = ccn.getLogFrequency();
>                 }
>                 if ((this.maxQueueSizeHitCount % logFrequency) == 0) {
>                   logger.warn(LocalizedMessage.create(
>                       LocalizedStrings.HARegionQueue_CLIENT_QUEUE_FOR_0_IS_FULL,
>                       new Object[] {region.getName()}));
>                   this.maxQueueSizeHitCount = 0;
>                 }
>                 ++this.maxQueueSizeHitCount;
>                 this.region.checkReadiness(); // fix for bug 37581
>                 // TODO: wait called while holding two locks
>                 this.permitMon.wait(CacheClientNotifier.eventEnqueueWaitTime);
>                 this.region.checkReadiness(); // fix for bug 37581
>                 // Fix for #51400. Allow the queue to grow beyond its
>                 // capacity/maxQueueSize, if it is taking a long time to
>                 // drain the queue, either due to a slower client or the
>                 // deadlock scenario mentioned in the ticket.
>                 reconcilePutPermits();
>                 if ((this.maxQueueSizeHitCount % logFrequency) == 1) {
>                   logger.info(LocalizedMessage
>                       .create(LocalizedStrings.HARegionQueue_RESUMING_WITH_PROCESSING_PUTS));
>                 }
>               } catch (InterruptedException ex) {
>                 // TODO: The line below is meaningless. Comment it out later
>                 this.permitMon.notifyAll();
>                 throw ex;
>               }
>             }
>           }
>         } // synchronized (this.permitMon)
>       } // if (putPermits <= 0)
>       --putPermits;
>     } // synchronized (this.putGuard)
>   }
> }
> *Reporter*: Mangesh Deshmukh
> *E-mail*: [mailto:[email protected]]