[ https://issues.apache.org/jira/browse/HBASE-4145?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13401683#comment-13401683 ]
Jean-Daniel Cryans commented on HBASE-4145:
-------------------------------------------

I just stumbled upon this code, and it seems there's an issue in {{TableRecordReaderImpl}}. Calling restart() does this:

{code}
public void restart(byte[] firstRow) throws IOException {
  currentScan = new Scan(scan);
{code}

Which by itself is fine since the metrics will be copied from *scan* to *currentScan*, except that it's *currentScan* that has the updated metrics, not *scan*. In other words, *currentScan* is the object that is used for scanning, so it contains the metrics. If restart() is called, that object is overwritten by the original definition of the {{Scan}}, and the metrics collected so far are lost.

I think to fix this we could grab the metrics from *currentScan* first, then set them back on the new object.

> Provide metrics for hbase client
> --------------------------------
>
>                 Key: HBASE-4145
>                 URL: https://issues.apache.org/jira/browse/HBASE-4145
>             Project: HBase
>          Issue Type: Improvement
>            Reporter: Ming Ma
>            Assignee: Ming Ma
>             Fix For: 0.94.0
>
>         Attachments: HBaseClientSideMetrics.jpg
>
>
> Sometimes it is useful to get some metrics from the hbase client's point of
> view. This will help understand the metrics for the scan/TableInputFormat
> map job scenario.
> What to capture, for example, for each ResultScanner object:
> 1. The number of RPC calls to RSs.
> 2. The delta time between consecutive RPC calls in the current serialized
> scan implementation.
> 3. The number of RPC retries to RSs.
> 4. The number of NotServingRegionExceptions received.
> 5. The number of remote RPC calls. This excludes calls that the hbase client
> makes to an RS on the same machine.
> 6. The number of regions accessed.
> How to capture:
> 1. The metrics framework works for a fixed number of metrics. It doesn't fit
> this scenario.
> 2. Use some TBD solution in HBase to capture such dynamic metrics. If we
> assume there is a solution in HBase that the HBase client can use to log
> such metrics, TableInputFormat can pass in the mapreduce task ID as an
> application scan ID to the HBase client as a small addition to the existing
> scan API, and the HBase client can log metrics accordingly with that ID.
> That will allow later query and analysis of the metrics data for a specific
> map reduce job.
> 3. Expose via MapReduce counters. This lacks certain features: for example,
> there is no good way to access the metrics per map instance, and the
> MapReduce framework only sums the counter values, so it is tricky to find
> the max of a given metric across all mapper instances. However, it might be
> good enough for now. With this approach, the metrics values will be
> available via MapReduce counters.
> a) Have ResultScanner return a new ResultScannerMetrics interface.
> b) TableInputFormat will access data from ResultScannerMetrics and populate
> MapReduce counters accordingly.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
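
The fix suggested in the comment (grab the metrics from *currentScan* before it is overwritten, then set them back on the new object) could look roughly like the sketch below. This is only an illustration under stated assumptions: {{SimpleScan}}, {{Reader}}, {{METRICS_KEY}}, {{recordMetrics()}} and {{metrics()}} are hypothetical stand-ins made up for this example, not HBase classes or constants, and the real {{TableRecordReaderImpl}} may store and carry over its metrics differently.

```java
import java.util.HashMap;
import java.util.Map;

class RestartMetricsSketch {

    // Hypothetical stand-in for a scan definition: a start row plus attributes.
    static class SimpleScan {
        // Illustrative attribute key, not an HBase constant.
        static final String METRICS_KEY = "metrics";

        byte[] startRow;
        final Map<String, byte[]> attributes = new HashMap<>();

        SimpleScan() {}

        // Copy constructor, playing the role of new Scan(scan) in the comment.
        SimpleScan(SimpleScan other) {
            this.startRow = other.startRow;
            this.attributes.putAll(other.attributes);
        }
    }

    // Hypothetical stand-in for the record reader.
    static class Reader {
        private final SimpleScan scan;  // original definition, never updated
        private SimpleScan currentScan; // used for scanning, so it holds the metrics

        Reader(SimpleScan scan) {
            this.scan = scan;
            this.currentScan = new SimpleScan(scan);
        }

        // Simulates the scanner updating metrics on the scan that is in use.
        void recordMetrics(byte[] data) {
            currentScan.attributes.put(SimpleScan.METRICS_KEY, data);
        }

        byte[] metrics() {
            return currentScan.attributes.get(SimpleScan.METRICS_KEY);
        }

        void restart(byte[] firstRow) {
            // Grab the metrics from currentScan before it is overwritten.
            byte[] saved = currentScan.attributes.get(SimpleScan.METRICS_KEY);

            // Rebuild from the original definition, as the existing code does.
            currentScan = new SimpleScan(scan);
            currentScan.startRow = firstRow;

            // Set the saved metrics back on the new object.
            if (saved != null) {
                currentScan.attributes.put(SimpleScan.METRICS_KEY, saved);
            }
        }
    }

    public static void main(String[] args) {
        Reader reader = new Reader(new SimpleScan());
        reader.recordMetrics(new byte[] {7});
        reader.restart(new byte[] {1});
        System.out.println("metrics kept after restart: " + (reader.metrics() != null));
    }
}
```

Without the save and restore steps in restart(), the metrics recorded on *currentScan* would be dropped, because the copy is made from *scan*, which never saw them.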