[
https://issues.apache.org/jira/browse/HBASE-10642?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13926303#comment-13926303
]
Enis Soztutar commented on HBASE-10642:
---------------------------------------
bq. (The existing 0.94 patch picked up the distribution from the table, not the
snapshot, I am not sure the HFileLinks influence this and whether even the
trunk patch does the right thing - does it follow HFileLinks? If not, how does
it find the real file distribution?).
From my reading of StoreFileInfo.computeHDFSBlocksDistribution(), it does the
right thing, but I have not checked this personally.
bq. Also, in the trunk version I notice that we update the counters after each
record, is that by design? Seems CPU heavy.
We don't have to increment the AtomicLong every time; we can accumulate a sum
locally and update the counter only occasionally.
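A rough sketch of what that batching could look like, not taken from any of the attached patches. The class name BatchedCounter, the flushEvery parameter and the flush() method are made up for illustration; it assumes each record reader owns its own instance, so the pending sum needs no synchronization.

```java
import java.util.concurrent.atomic.AtomicLong;

/**
 * Hypothetical helper (not part of the HBASE-10642 patches): accumulates
 * increments in a plain long and pushes them to a shared AtomicLong only
 * once every flushEvery calls, instead of on every record.
 */
final class BatchedCounter {
    private final AtomicLong shared;  // counter read by other threads / metrics
    private final long flushEvery;    // number of increment() calls between flushes
    private long pending;             // locally accumulated, not yet published
    private long callsSinceFlush;     // increment() calls since the last flush

    BatchedCounter(AtomicLong shared, long flushEvery) {
        if (flushEvery <= 0) {
            throw new IllegalArgumentException("flushEvery must be positive");
        }
        this.shared = shared;
        this.flushEvery = flushEvery;
    }

    /** Adds delta locally; touches the AtomicLong only when the batch is full. */
    void increment(long delta) {
        pending += delta;
        if (++callsSinceFlush >= flushEvery) {
            flush();
        }
    }

    /** Publishes whatever is pending; call this when the reader is closed. */
    void flush() {
        if (pending != 0) {
            shared.addAndGet(pending);
            pending = 0;
        }
        callsSinceFlush = 0;
    }
}
```

The trade-off is that the shared counter lags by up to flushEvery records until flush() is called, which seems acceptable for progress-style metrics.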
bq. Maybe we should report the data locality index that HBase calculates as
metric to M/R?
Makes sense.
I've checked the locality computations. v4 patch looks good.
> Add M/R over snapshots to 0.94
> ------------------------------
>
> Key: HBASE-10642
> URL: https://issues.apache.org/jira/browse/HBASE-10642
> Project: HBase
> Issue Type: Bug
> Reporter: Lars Hofhansl
> Fix For: 0.94.18
>
> Attachments: 10642-0.94-v2.txt, 10642-0.94-v3.txt, 10642-0.94-v4.txt,
> 10642-0.94.txt, SnapshotInputFormat.java
>
>
> I think we want to drive towards all (or most) M/R over HBase to be against
> snapshots and HDFS directly.
> Adopting a simple input format (even if just as a sample) as part of HBase
> will allow us to direct users this way.