[ https://issues.apache.org/jira/browse/HBASE-4145?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13401683#comment-13401683 ]

Jean-Daniel Cryans commented on HBASE-4145:
-------------------------------------------

I just stumbled upon this code and it seems there's an issue in 
{{TableRecordReaderImpl}}. Calling restart() does this:

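{code}
public void restart(byte[] firstRow) throws IOException {
  // Rebuilds currentScan from the original definition (scan); any metrics
  // accumulated on the previous currentScan are discarded at this point.
  currentScan = new Scan(scan);
  // ... (remainder of the method elided)
}
{code}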

This by itself would be fine, since the metrics are copied from *scan* to 
*currentScan*, except that it's *currentScan* that holds the updated metrics, not 
*scan*.

In other words, *currentScan* is the object actually used for scanning, so it's 
the one that accumulates the metrics. If restart() is called, that object is 
overwritten with the original definition of the {{Scan}} and the metrics are 
lost. I think to fix this we could grab the metrics from *currentScan* first, 
then set them back on the new object.
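To make the proposed fix concrete, here is a minimal, self-contained sketch. SimpleScan and RecordReaderSketch are hypothetical stand-ins, not the real Scan and TableRecordReaderImpl, and the metrics are modeled as a plain map that the copy constructor copies by value (an assumption made to mirror the behavior described above). restartBuggy() reproduces the current behavior; restartFixed() grabs the metrics from currentScan before rebuilding it and sets them back afterwards.

```java
import java.util.HashMap;
import java.util.Map;

/** Sketch only: SimpleScan and RecordReaderSketch are hypothetical, not HBase classes. */
public class RestartMetricsSketch {

  /** Hypothetical stand-in for Scan: a definition plus per-scan metrics. */
  static class SimpleScan {
    byte[] startRow = new byte[0];
    final Map<String, Long> metrics = new HashMap<>();

    SimpleScan() {}

    /** Copies the definition and a snapshot of the metrics, like new Scan(scan). */
    SimpleScan(SimpleScan other) {
      this.startRow = other.startRow.clone();
      this.metrics.putAll(other.metrics);
    }
  }

  /** Hypothetical stand-in for the record reader holding scan and currentScan. */
  static class RecordReaderSketch {
    private final SimpleScan scan;   // original definition, never scanned with directly
    private SimpleScan currentScan;  // the scan actually used, so it accumulates metrics

    RecordReaderSketch(SimpleScan scan) {
      this.scan = scan;
      this.currentScan = new SimpleScan(scan);
    }

    /** Simulates one RPC issued while scanning with currentScan. */
    void recordRpc() {
      currentScan.metrics.merge("RPC_CALLS", 1L, Long::sum);
    }

    /** Current behavior: metrics accumulated on currentScan are thrown away. */
    void restartBuggy(byte[] firstRow) {
      currentScan = new SimpleScan(scan);
      currentScan.startRow = firstRow.clone();
    }

    /** Proposed fix: save currentScan's metrics, rebuild, then set them back. */
    void restartFixed(byte[] firstRow) {
      Map<String, Long> saved = new HashMap<>(currentScan.metrics);
      currentScan = new SimpleScan(scan);
      currentScan.startRow = firstRow.clone();
      currentScan.metrics.putAll(saved);
    }

    long rpcCalls() {
      return currentScan.metrics.getOrDefault("RPC_CALLS", 0L);
    }
  }

  public static void main(String[] args) {
    RecordReaderSketch buggy = new RecordReaderSketch(new SimpleScan());
    buggy.recordRpc();
    buggy.recordRpc();
    buggy.restartBuggy(new byte[] {1});
    System.out.println("after buggy restart: " + buggy.rpcCalls()); // 0

    RecordReaderSketch fixed = new RecordReaderSketch(new SimpleScan());
    fixed.recordRpc();
    fixed.recordRpc();
    fixed.restartFixed(new byte[] {1});
    System.out.println("after fixed restart: " + fixed.rpcCalls()); // 2
  }
}
```

Setting the saved values back with putAll() keeps whatever was copied from *scan* plus everything accumulated since; how the real {{Scan}} stores its metrics would determine exactly what an actual patch needs to copy.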
                
> Provide metrics for hbase client
> --------------------------------
>
>                 Key: HBASE-4145
>                 URL: https://issues.apache.org/jira/browse/HBASE-4145
>             Project: HBase
>          Issue Type: Improvement
>            Reporter: Ming Ma
>            Assignee: Ming Ma
>             Fix For: 0.94.0
>
>         Attachments: HBaseClientSideMetrics.jpg
>
>
> Sometimes it is useful to get metrics from the HBase client's point of view. 
> This will help in understanding the metrics for the scan/TableInputFormat 
> map job scenario.
> What to capture, for example, for each ResultScanner object:
> 1. The number of RPC calls to RSs.
> 2. The delta time between consecutive RPC calls in the current serialized 
> scan implementation.
> 3. The number of RPC retries to RSs.
> 4. The number of NotServingRegionExceptions received.
> 5. The number of remote RPC calls. This excludes calls that the HBase client 
> makes to an RS on the same machine.
> 6. The number of regions accessed.
> How to capture:
> 1. The metrics framework works for a fixed number of metrics, so it doesn't 
> fit this scenario.
> 2. Use some (TBD) solution in HBase to capture such dynamic metrics. 
> Assuming HBase had a facility the client could use to log these metrics, 
> TableInputFormat could pass the MapReduce task ID to the HBase client as an 
> application scan ID, a small addition to the existing scan API, and the 
> client could log the metrics tagged with that ID. That would allow later 
> querying and analysis of the metrics data for a specific MapReduce job.
> 3. Expose via MapReduce counters. This lacks certain features: for example, 
> there is no good way to access the metrics per map instance, and the 
> MapReduce framework only sums counter values, so it is tricky to find the 
> max of a given metric across all mapper instances. However, it might be good 
> enough for now. With this approach, the metric values will be available via 
> MapReduce counters.
> a) Have ResultScanner return a new ResultScannerMetrics interface.
> b) TableInputFormat will access data from ResultScannerMetrics and populate 
> MapReduce counters accordingly.
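Option 3 could look roughly like the following sketch. The ResultScannerMetrics method names, the counter names and the FixedMetrics class are all hypothetical, and MapReduce counters are modeled as a Map<String, Long> that only supports summing, so the example runs without Hadoop. It covers the count-style metrics 1, 3, 4, 5 and 6; the inter-RPC delta time (metric 2) is left out because it is not a simple count.

```java
import java.util.Map;
import java.util.TreeMap;

/** Sketch of option 3a/3b; every name here is illustrative, not an HBase or Hadoop API. */
public class ScannerMetricsCounterSketch {

  /** Hypothetical shape of the proposed ResultScannerMetrics interface (3a). */
  interface ResultScannerMetrics {
    long getRpcCalls();                    // metric 1
    long getRpcRetries();                  // metric 3
    long getNotServingRegionExceptions();  // metric 4
    long getRemoteRpcCalls();              // metric 5
    long getRegionsAccessed();             // metric 6
  }

  /** Fixed-value implementation standing in for one mapper's scanner. */
  static final class FixedMetrics implements ResultScannerMetrics {
    private final long rpc, retries, nsre, remote, regions;

    FixedMetrics(long rpc, long retries, long nsre, long remote, long regions) {
      this.rpc = rpc;
      this.retries = retries;
      this.nsre = nsre;
      this.remote = remote;
      this.regions = regions;
    }

    public long getRpcCalls() { return rpc; }
    public long getRpcRetries() { return retries; }
    public long getNotServingRegionExceptions() { return nsre; }
    public long getRemoteRpcCalls() { return remote; }
    public long getRegionsAccessed() { return regions; }
  }

  /** Option 3b: add one scanner's metrics into sum-only "counters". */
  static void populateCounters(ResultScannerMetrics m, Map<String, Long> counters) {
    counters.merge("RPC_CALLS", m.getRpcCalls(), Long::sum);
    counters.merge("RPC_RETRIES", m.getRpcRetries(), Long::sum);
    counters.merge("NOT_SERVING_REGION_EXCEPTION",
        m.getNotServingRegionExceptions(), Long::sum);
    counters.merge("REMOTE_RPC_CALLS", m.getRemoteRpcCalls(), Long::sum);
    counters.merge("REGIONS_SCANNED", m.getRegionsAccessed(), Long::sum);
  }

  public static void main(String[] args) {
    Map<String, Long> jobCounters = new TreeMap<>();
    // Two mappers with very different RPC counts.
    populateCounters(new FixedMetrics(10, 0, 0, 4, 1), jobCounters);
    populateCounters(new FixedMetrics(90, 2, 1, 90, 1), jobCounters);
    // RPC_CALLS is the sum (100); the per-mapper max (90) cannot be recovered.
    System.out.println(jobCounters);
  }
}
```

With two mappers that made 10 and 90 RPC calls, the job-level RPC_CALLS counter reads 100, and nothing in the summed counters shows that one mapper made 90 of them, which is the max-across-mappers limitation described in option 3.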

--
This message is automatically generated by JIRA.

        
