[
https://issues.apache.org/jira/browse/HBASE-10413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13890550#comment-13890550
]
Lukas Nalezenec commented on HBASE-10413:
-----------------------------------------
Hi,
I know it is hacky. It is my first HBase commit and I was not sure how to do it, so
I asked three people and then published a first draft as soon as possible. Everybody
was fine with the solution :( .
The hacky solution is good enough for us - I already deployed it
yesterday. I can't spend much more time on this; I need to close it by tomorrow.
How about this solution? I am not sure it is the best way - it does not work
with Scan ranges.
ToDos:
We need to filter regions by table.
It would be nice if we could filter size by column family.
https://github.com/apache/hbase/pull/8/files#diff-46ff60f1e27e3d77131acb7873050990R68
// sizeMap (region name -> size in bytes) is assumed to be declared by the caller.
HBaseAdmin admin = new HBaseAdmin(configuration);
ClusterStatus clusterStatus = admin.getClusterStatus();
Collection<ServerName> servers = clusterStatus.getServers();
for (ServerName serverName : servers) {
  ServerLoad serverLoad = clusterStatus.getLoad(serverName);
  for (Map.Entry<byte[], RegionLoad> regionEntry :
      serverLoad.getRegionsLoad().entrySet()) {
    byte[] regionId = regionEntry.getKey();
    RegionLoad regionLoad = regionEntry.getValue();
    // Use long arithmetic: the *MB getters return int, so plain int math
    // would overflow for regions of 2 GB or more.
    long regionSize = 1024L * 1024L * (regionLoad.getMemStoreSizeMB() +
        regionLoad.getStorefileSizeMB());
    sizeMap.put(regionId, regionSize);
  }
}
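For the first ToDo (filtering regions by table), here is a minimal sketch of one possible approach. It is plain Java with no HBase dependency, so it only illustrates the idea. The class and method names (RegionSizeSketch, belongsToTable, sizeInBytes) are hypothetical and not part of the patch, and it assumes that a region name starts with the table name followed by a comma.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of HBase or of the patch.
final class RegionSizeSketch {

    private RegionSizeSketch() {
    }

    /**
     * Converts memstore and storefile sizes given in MB to bytes.
     * The sum is done in long arithmetic so it cannot overflow int.
     */
    static long sizeInBytes(int memStoreSizeMB, int storefileSizeMB) {
        return 1024L * 1024L * ((long) memStoreSizeMB + storefileSizeMB);
    }

    /**
     * Returns true if regionName starts with "<tableName>,".
     * Assumption: region names have the table name as a comma-terminated prefix.
     */
    static boolean belongsToTable(byte[] regionName, String tableName) {
        byte[] prefix = (tableName + ",").getBytes(StandardCharsets.UTF_8);
        if (regionName.length < prefix.length) {
            return false;
        }
        for (int i = 0; i < prefix.length; i++) {
            if (regionName[i] != prefix[i]) {
                return false;
            }
        }
        return true;
    }
}
```

In the loop above, the sizeMap.put(...) call could then be guarded with belongsToTable(regionId, tableName), where tableName is whatever table the job is scanning. Filtering by column family is not covered by this sketch.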
> Tablesplit.getLength returns 0
> ------------------------------
>
> Key: HBASE-10413
> URL: https://issues.apache.org/jira/browse/HBASE-10413
> Project: HBase
> Issue Type: Bug
> Components: Client, mapreduce
> Affects Versions: 0.96.1.1
> Reporter: Lukas Nalezenec
> Assignee: Lukas Nalezenec
>
> InputSplits should be sorted by length, but TableSplit does not contain a real
> getLength implementation:
> @Override
> public long getLength() {
>   // Not clear how to obtain this... seems to be used only for sorting splits
>   return 0;
> }
> This is causing us problems with scheduling - we have jobs that are
> supposed to finish in a limited time, but they often get stuck in the last mapper
> working on a large region.
> Can we implement this method?
> What is the best way?
> We were thinking about estimating the size from the size of the files on HDFS.
> We would like to get the Scanner from the TableSplit and use startRow, stopRow and
> column families to find the corresponding region, then compute the size on HDFS for
> the given region and column family.
> Update:
> This ticket was about a production issue - I talked with the guy who worked on it
> and he said our production issue was probably not directly caused by
> getLength() returning 0.