Hi, This looks cool :) I have a suggestion. It would be nice if we could add the current size of the heap or (% of heap used) in the log entry whenever sleep threshold had exceeded a lot. It could be helpful.
On Wed, May 9, 2018 at 11:26 AM, Patrick Hunt <[email protected]> wrote: > On Wed, May 9, 2018 at 11:11 AM, Norbert Kalmar <[email protected]> > wrote: > > > Thanks Patrick, great question. > > My understanding is that this tool not only shows if JVM spends too much > > time in GC, but if, for any other reason, there is a JVM pause (The tool > > only differentiates GC pause from all other pause). This could be slow > > fsync (although we do have logs for that) or even server/OS related. > > > > But again, this is just my interpretation. I will ask the source of the > > idea, what extra benefits this gives them over java GC log. > > > > I checked ZK, I don't see it enabled by default, but GC logging can be > set > > with JVM parameters easily, so that shouldn't be a key factor anyway. > > > > > I think that would be a useful change regardless - to make it on by default > I mean. Also some docs wrt our recommendations, how to troubleshoot, etc... > Adding a feature is useful, but ensuring people know about it and can use > it effectively is even more so. > > Regards, > > Patrick > > > > Regards, > > Norbert > > > > On Wed, May 9, 2018 at 7:57 PM Patrick Hunt <[email protected]> wrote: > > > > > Do you know why they did this rather than just enabling GC logging by > > > default? Why re-invent the wheel? > > > > > > I seem to remember seeing a push do enable GC logging by default a few > > > years ago. In particular around the time when the JVM added GC log > > rolling > > > as a feature. Here's an example: > > > > > > https://batmat.net/2016/10/17/always-enable-gc-logs-and-how- > > to-enable-logs-rotation-with-hotspot/ > > > My understanding is that the overhead is so low that it's feasible to > do > > > this. > > > > > > Good improvement though regardless which way we go. > > > > > > Regards, > > > > > > Patrick > > > > > > On Wed, May 9, 2018 at 9:36 AM, Andor Molnar <[email protected]> > wrote: > > > > > > > +1 cool! > > > > > > > > > > > > On Wed, May 9, 2018 at 7:59 AM, Norbert Kalmar <[email protected] > > > > > > wrote: > > > > > > > > > Okay, thanks Ed, I created the Jira, will look into it soon :) > > > > > https://issues.apache.org/jira/browse/ZOOKEEPER-3037 > > > > > > > > > > Regards, > > > > > Norbert > > > > > > > > > > On Wed, May 9, 2018 at 4:44 PM Edward Ribeiro < > > > [email protected]> > > > > > wrote: > > > > > > > > > > > +1. Sounds really nice to have feature. Let's open a ticket and > > open > > > a > > > > > PR. > > > > > > :) > > > > > > > > > > > > Ed > > > > > > > > > > > > Em qua, 9 de mai de 2018 11:15, Norbert Kalmar < > > [email protected] > > > > > > > > > > escreveu: > > > > > > > > > > > > > Hi, > > > > > > > > > > > > > > I just got a tip that we could improve on the logging in > > ZooKeeper. > > > > > > After a > > > > > > > ZK crash, or client timeout sometimes it's hard to determine > from > > > the > > > > > > logs > > > > > > > what happened. Knowing if ZK was responsive at the time would > > help > > > a > > > > > lot. > > > > > > > For example, ZK might spend a lot of time waiting on GC (there > is > > > > still > > > > > > > some misconception that ZK is a storage). > > > > > > > > > > > > > > To help detect this, HADOOP already has a great tool called JVM > > > Pause > > > > > > > Monitor. (As the name suggest, it can be also used for > > monitoring, > > > > but > > > > > it > > > > > > > also helps post-mortem in a lot of cases). Basically it has a > > > daemon > > > > > that > > > > > > > sleeps for one second, and if the sleep time exceeds the 1s by > > more > > > > > than > > > > > > > the threshold (1s: INFO, 10s: WARN by default - this can be > > > > > configurable > > > > > > in > > > > > > > our case, see below), it will alert/make a log entry. It can > also > > > > > monitor > > > > > > > the time GC took. > > > > > > > > > > > > > > Now, this class is in the HADOOP-common. I wouldn't want to > > depend > > > on > > > > > > > Hadoop-common because of this one feature/class (it is > actually a > > > > > single > > > > > > > class). Since this is a straightforward implementation, and in > > the > > > > past > > > > > > > five years the few commits it had is nothing really serious, I > > > think > > > > we > > > > > > > could just copy this class in ZooKeeper, and introduce it as a > > > > > > configurable > > > > > > > feature, by default it can be off. > > > > > > > > > > > > > > The class: > > > > > > > > > > > > > > > > > > > > https://github.com/apache/hadoop/blob/trunk/hadoop- > > > > > common-project/hadoop-common/src/main/java/org/apache/ > > > > > hadoop/util/JvmPauseMonitor.java > > > > > > > > > > > > > > What do You think? > > > > > > > > > > > > > > Regards, > > > > > > > Norbert > > > > > > > > > > > > > > > > > > > > > > > > > > > >
