[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2770?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16263413#comment-16263413
 ] 

ASF GitHub Bot commented on ZOOKEEPER-2770:
-------------------------------------------

Github user karanmehta93 commented on the issue:

    https://github.com/apache/zookeeper/pull/307
  
    >  Can we at least make it so that the process of "push a metric to a 
buffer, have a thread that wakes up periodically and flushes information out of 
that buffer" is usable by multiple parts of the system, instead of coupling it 
to the one metric of request time?
    
    I am ready to put in the effort of making this framework more generic, if 
you feel its worth the time and effort to put into it. I would also like to 
hear others opinions on this. 
    The inspiration for this JIRA came from HBase, where it has been used to 
log slow requests at the basic level. I feel that it can be useful to keep 
track of latency spikes that are seen in production. @apurtell can suggest 
use-cases.


> ZooKeeper slow operation log
> ----------------------------
>
>                 Key: ZOOKEEPER-2770
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2770
>             Project: ZooKeeper
>          Issue Type: Improvement
>            Reporter: Karan Mehta
>            Assignee: Karan Mehta
>         Attachments: ZOOKEEPER-2770.001.patch, ZOOKEEPER-2770.002.patch, 
> ZOOKEEPER-2770.003.patch
>
>
> ZooKeeper is a complex distributed application. There are many reasons why 
> any given read or write operation may become slow: a software bug, a protocol 
> problem, a hardware issue with the commit log(s), a network issue. If the 
> problem is constant it is trivial to come to an understanding of the cause. 
> However in order to diagnose intermittent problems we often don't know where, 
> or when, to begin looking. We need some sort of timestamped indication of the 
> problem. Although ZooKeeper is not a datastore, it does persist data, and can 
> suffer intermittent performance degradation, and should consider implementing 
> a 'slow query' log, a feature very common to services which persist 
> information on behalf of clients which may be sensitive to latency while 
> waiting for confirmation of successful persistence.
> Log the client and request details if the server discovers, when finally 
> processing the request, that the current time minus arrival time of the 
> request is beyond a configured threshold. 
> Look at the HBase {{responseTooSlow}} feature for inspiration. 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

Reply via email to