[ https://issues.apache.org/jira/browse/HBASE-18764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16158789#comment-16158789 ]
Josh Elser commented on HBASE-18764:
------------------------------------

Thanks for the reply! This is definitely one of those tricky situations. To turn your own argument back on you, RegionServer logging would still not give us a definitive clue as to where the problem lies, just a hint of where to look. When a RegionServer is processing 15+ RPCs at one point in time, it takes a bit of human effort to figure out which handler thread was actually tied to the user whose request missed the SLA. That's the crux of why I suggested that looking at HDFS metrics would be a better approach.

To that point, if you want to figure out why some specific request was slow, and specifically what part of the request was slow, we'd instead want to use something like HTrace to aggregate this. In practice, however, this too is difficult due to the overhead of tracing as well as the intermittent slowness that typically comes with these kinds of problems.

bq. When we talk about hdfs metrics, we will see 95th, 90th, mean, median etc

Yep, that is the downside of using aggregations like percentiles :). In most cases, we know that we don't want to look at the tail of operation times because of "expected" slowness (e.g. JVM garbage collection or compactions); however, there are certainly cases when looking at all of the metrics is necessary to identify a culprit. Specifically, the case you have identified, when a small percentage of requests to HDFS or, even worse, a small percentage of requests to a single DataNode fall outside the bounds of normalcy.

At the end of the day, I'm not trying to levy a veto on this kind of change, but rather to outline some cases in which it will just leave you blocked on the next question :). Since I've already spent this much time thinking about the subject, let me look closer at your patch too:

{noformat}
+    this.conf = new HBaseConfiguration();
{noformat}

The no-args constructor is deprecated.
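For reference, here is roughly what I mean. This is only a sketch and needs hbase-common on the classpath; {{HBaseConfiguration.create()}} is the supported factory, while the property name {{hbase.regionserver.hfile.slowread.ms}} is a suggestion for a new read-specific key, not an existing HBase property:

```java
// Sketch only: assumes the standard HBase client API (hbase-common on
// the classpath). HBaseConfiguration.create() is the non-deprecated way
// to obtain a Configuration preloaded with hbase-default.xml and
// hbase-site.xml, replacing `new HBaseConfiguration()`.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;

public class SlowReadConf {
  public static void main(String[] args) {
    Configuration conf = HBaseConfiguration.create();
    // Use a dedicated read-side property instead of reusing the slowsync
    // key. "hbase.regionserver.hfile.slowread.ms" is a suggested name,
    // not an existing property; 100 ms is an arbitrary default.
    long slowReadNs = 1_000_000L
        * conf.getInt("hbase.regionserver.hfile.slowread.ms", 100);
    System.out.println(slowReadNs);
  }
}
```

This also keeps the read and sync thresholds independently tunable, which matters since acceptable latency for the two paths can differ.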
{noformat}
+    this.slowReadNs =
+        1000000 * conf.getInt("hbase.regionserver.hlog.slowsync.ms",
+            DEFAULT_SLOW_READ_TIME_MS);
{noformat}

Create a new configuration property; don't reuse the slowsync key.

{noformat}
+    long timeInNanos = System.nanoTime() - startNano;
+    if (timeInNanos > this.slowReadNs) {
{noformat}

Nit: combine these two lines.

{noformat}
+    DFSInputStream dfsInputStream = (DFSInputStream) is.getWrappedStream();
{noformat}

You need to check whether this is an instance of {{DFSInputStream}} rather than blindly casting it.

{noformat}
+    } catch (Exception e) {
+      e.printStackTrace();
+    }
{noformat}

We do not print to stdout/stderr directly; you use a logger for that. In this case, since this is not a critical codepath, I'd suggest you make a {{debug}} log call, with a message making clear that the exception was caught while trying to collect this slow-read information. We want to make sure that if/when this block is hit, an admin/user would not interpret the exception as something they need to take action on.

> add slow read block log entry to alert on slow DatanodeInfo when reading a block
> is slow
> -------------------------------------------------------------------------------------
>
>                 Key: HBASE-18764
>                 URL: https://issues.apache.org/jira/browse/HBASE-18764
>             Project: HBase
>          Issue Type: Improvement
>          Components: HFile
>    Affects Versions: 1.1.2
>            Reporter: Wang, Xinglong
>            Priority: Minor
>        Attachments: HBASE-18764.rev1.1.2.patch
>
> HBase sits on top of HDFS, and both are distributed systems. HBase will also be
> impacted when there is a straggler datanode caused by a network/disk/CPU issue:
> all HBase reads/scans against that datanode will slow down. It's not easy for an
> HBase admin to find the straggler datanode in such a case.
> Meanwhile, we have a log entry known as slow sync. One such entry looks like the
> following.
> It helps the HBase admin quickly identify the slow datanode in the
> pipeline in case of a network/disk/CPU issue with one of the 3 datanodes in
> the pipeline.
> {noformat}
> 2017-07-08 19:11:30,538 INFO [sync.3] wal.FSHLog: Slow sync cost: 490189 ms,
> current pipeline:
> [DatanodeInfoWithStorage[xx.xx.xx.xx:50010,DS-c391299a-aa9f-4146-ac7e-a493ae536bff,DISK],
> DatanodeInfoWithStorage[xx.xx.xx.xx:50010,DS-21a85f8b-f389-4f9e-95a8-b711945fd210,DISK],
> DatanodeInfoWithStorage[xx.xx.xx.xx:50010,DS-aa48cef2-3554-482f-b49d-be4763f4d8b8,DISK]]
> {noformat}
> Inspired by the slow sync log entry, I think it would also be beneficial to
> print such an entry when we encounter a slow read, so that the slow datanode
> is easy to identify.

-- This message was sent by Atlassian JIRA (v6.4.14#64029)