[ 
https://issues.apache.org/jira/browse/HBASE-10642?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13926303#comment-13926303
 ] 

Enis Soztutar commented on HBASE-10642:
---------------------------------------

bq. (The existing 0.94 patch picked up the distribution from the table, not the 
snapshot, I am not sure the HFileLinks influence this and whether even the 
trunk patch does the right thing - does it follow HFileLinks? If not, how does 
it find the real file distribution?).
From my reading of StoreFileInfo.computeHDFSBlocksDistribution(), it does the 
right thing, but I have not checked this personally. 
bq. Also, in the trunk version I notice that we update the counters after each 
record, is that by design? Seems CPU heavy.
We don't have to increment the AtomicLong every time; we can accumulate a local 
sum and then update the counter occasionally. 
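
A minimal sketch of that idea, not taken from the patch: BatchedCounter, 
flushEvery and the method names below are hypothetical, and the sketch assumes 
each instance is used by a single reader thread while the AtomicLong is shared.

```java
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical helper (not part of the patch): batches updates to a shared AtomicLong. */
final class BatchedCounter {
    private final AtomicLong shared;  // counter that other threads read
    private final long flushEvery;    // number of add() calls between flushes
    private long pending;             // local sum not yet published
    private long callsSinceFlush;     // add() calls since the last flush

    BatchedCounter(AtomicLong shared, long flushEvery) {
        if (flushEvery <= 0) {
            throw new IllegalArgumentException("flushEvery must be positive");
        }
        this.shared = shared;
        this.flushEvery = flushEvery;
    }

    /** Accumulates locally; touches the AtomicLong only once per flushEvery calls. */
    void add(long delta) {
        pending += delta;
        if (++callsSinceFlush >= flushEvery) {
            flush();
        }
    }

    /** Publishes whatever has accumulated; call this when the reader is closed. */
    void flush() {
        if (pending != 0) {
            shared.addAndGet(pending);
            pending = 0;
        }
        callsSinceFlush = 0;
    }
}
```

The trade-off is that the shared counter lags by up to flushEvery records, and 
a final flush() is needed on close so the tail is not lost.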
bq. Maybe we should report the data locality index that HBase calculates as a 
metric to M/R?
Makes sense.
I've checked the locality computations. v4 patch looks good. 

> Add M/R over snapshots to 0.94
> ------------------------------
>
>                 Key: HBASE-10642
>                 URL: https://issues.apache.org/jira/browse/HBASE-10642
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Lars Hofhansl
>             Fix For: 0.94.18
>
>         Attachments: 10642-0.94-v2.txt, 10642-0.94-v3.txt, 10642-0.94-v4.txt, 
> 10642-0.94.txt, SnapshotInputFormat.java
>
>
> I think we want to drive towards all (or most) M/R over HBase to be against 
> snapshots and HDFS directly.
> Adopting a simple input format (even if just as a sample) as part of HBase 
> will allow us to direct users this way.



--
This message was sent by Atlassian JIRA
(v6.2#6252)
