[ https://issues.apache.org/jira/browse/HBASE-10413?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Lukas Nalezenec updated HBASE-10413:
------------------------------------

    Description: 
InputSplits should be sorted by length, but TableSplit does not contain a real
getLength implementation:

  @Override
  public long getLength() {
    // Not clear how to obtain this... seems to be used only for sorting splits
    return 0;
  }

This is causing us problems with scheduling: we have jobs that are supposed
to finish within a limited time, but they often get stuck in the last mapper,
which is working on a large region.
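To see why a constant 0 matters, here is a minimal, self-contained sketch (plain Java, no Hadoop dependency) of a largest-first split ordering. It assumes, as the description says, that the framework orders splits by getLength() with the biggest first. With real lengths the large region is handed out first; with every length 0 the comparator only sees ties, so the splits keep whatever order they arrived in and a large region can end up last. Split and scheduleOrder are hypothetical names used only for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class SplitOrderDemo {

    // Hypothetical stand-in for an InputSplit: just a region name and a length.
    record Split(String region, long length) {}

    // Largest-first ordering (ASSUMPTION: mirrors how split lengths are used).
    static final Comparator<Split> LARGEST_FIRST =
            Comparator.<Split>comparingLong(Split::length).reversed();

    static List<String> scheduleOrder(List<Split> splits) {
        Split[] array = splits.toArray(new Split[0]);
        // Arrays.sort on objects is stable, so equal lengths keep their input order.
        Arrays.sort(array, LARGEST_FIRST);
        List<String> order = new ArrayList<>();
        for (Split s : array) {
            order.add(s.region());
        }
        return order;
    }

    public static void main(String[] args) {
        // Real sizes: the large region "r3" is scheduled first.
        List<Split> sized = List.of(
                new Split("r1", 10), new Split("r2", 20), new Split("r3", 500));
        System.out.println(scheduleOrder(sized));  // [r3, r2, r1]

        // getLength() == 0 everywhere: nothing moves, so "r3" stays last.
        List<Split> zero = List.of(
                new Split("r1", 0), new Split("r2", 0), new Split("r3", 0));
        System.out.println(scheduleOrder(zero));   // [r1, r2, r3]
    }
}
```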

Can we implement this method?
What is the best way?
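One possible shape for an implementation, sketched under assumptions: the split carries a size estimate computed up front (for example when the splits are created) and serializes it along with the row range, so a deserialized copy still reports it. SizedSplit is a hypothetical class, not HBase's TableSplit; the real class also carries the table name and region location, which are omitted here. The write/readFields pair only mirrors the shape of Hadoop's Writable methods and does not implement that interface, which keeps the sketch free of Hadoop dependencies.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

// Hypothetical split that carries a precomputed size estimate.
public class SizedSplit {
    private byte[] startRow = new byte[0];
    private byte[] endRow = new byte[0];
    private long length;  // estimated bytes; 0 means "unknown"

    public SizedSplit() {}  // no-arg constructor, used before readFields()

    public SizedSplit(byte[] startRow, byte[] endRow, long length) {
        this.startRow = startRow;
        this.endRow = endRow;
        this.length = length;
    }

    public long getLength() { return length; }  // estimate instead of a constant 0
    public byte[] getStartRow() { return startRow; }
    public byte[] getEndRow() { return endRow; }

    // Same shape as Writable.write/readFields; the length is serialized too.
    public void write(DataOutput out) throws IOException {
        writeBytes(out, startRow);
        writeBytes(out, endRow);
        out.writeLong(length);
    }

    public void readFields(DataInput in) throws IOException {
        startRow = readBytes(in);
        endRow = readBytes(in);
        length = in.readLong();
    }

    private static void writeBytes(DataOutput out, byte[] b) throws IOException {
        out.writeInt(b.length);  // length prefix, then the raw bytes
        out.write(b);
    }

    private static byte[] readBytes(DataInput in) throws IOException {
        byte[] b = new byte[in.readInt()];
        in.readFully(b);
        return b;
    }

    public static void main(String[] args) throws IOException {
        SizedSplit original = new SizedSplit(
                "row-a".getBytes(StandardCharsets.UTF_8),
                "row-m".getBytes(StandardCharsets.UTF_8),
                1_073_741_824L);
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        original.write(new DataOutputStream(buf));

        // Round trip: the copy keeps the size estimate.
        SizedSplit copy = new SizedSplit();
        copy.readFields(new DataInputStream(new ByteArrayInputStream(buf.toByteArray())));
        System.out.println(copy.getLength());  // 1073741824
    }
}
```

Whether the estimate comes from HDFS file sizes or from some other source is left open here; the sketch only shows where such a number would live.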

We were thinking about estimating the size from the size of the region's files
on HDFS: get the Scanner from the TableSplit, use its startRow, stopRow and
column families to find the corresponding region, and then compute the HDFS
size for that region and those column families.
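The HDFS-size idea can be sketched as "sum the sizes of the files under each requested column family's directory of the region". The sketch below uses java.nio on a local directory tree so it runs without a cluster. It ASSUMES one subdirectory per column family under the region directory; on a real cluster the same walk would go through Hadoop's FileSystem API instead (for example FileSystem.getContentSummary(path).getLength()). estimateBytes is a hypothetical helper. The result is only an estimate: it ignores data still in the memstore, compression, and the case where a split's row range covers only part of a region.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Stream;

public class RegionSizeEstimate {

    // Sums regular-file sizes under <regionDir>/<family> for each requested family.
    // ASSUMPTION: one subdirectory per column family inside the region directory.
    static long estimateBytes(Path regionDir, List<String> families) throws IOException {
        long total = 0;
        for (String family : families) {
            Path familyDir = regionDir.resolve(family);
            if (!Files.isDirectory(familyDir)) {
                continue;  // family has no files in this region: contributes 0
            }
            try (Stream<Path> files = Files.walk(familyDir)) {
                total += files.filter(Files::isRegularFile)
                        .mapToLong(RegionSizeEstimate::sizeOf)
                        .sum();
            }
        }
        return total;
    }

    private static long sizeOf(Path p) {
        try {
            return Files.size(p);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        // Throwaway tree mimicking one region with two column families.
        Path region = Files.createTempDirectory("region");
        Files.createDirectories(region.resolve("cf1"));
        Files.createDirectories(region.resolve("cf2"));
        Files.write(region.resolve("cf1").resolve("hfile-a"), new byte[100]);
        Files.write(region.resolve("cf1").resolve("hfile-b"), new byte[50]);
        Files.write(region.resolve("cf2").resolve("hfile-c"), new byte[1000]);

        System.out.println(estimateBytes(region, List.of("cf1")));         // 150
        System.out.println(estimateBytes(region, List.of("cf1", "cf2")));  // 1150
    }
}
```

Restricting the walk to the scanned column families matters because a job that reads only one family should not be weighted by the size of the others.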


Update:
This ticket originally described a production issue. I talked with the
colleague who worked on it, and he said our production issue was probably not
directly caused by getLength() returning 0.

  was:
We had a serious issue in production today.

InputSplits should be sorted by length, but TableSplit does not contain a real
getLength implementation:

  @Override
  public long getLength() {
    // Not clear how to obtain this... seems to be used only for sorting splits
    return 0;
  }

Can we implement this method?
What is the best way?

        Summary: Tablesplit.getLength returns 0  (was: TableSplits are not sorted by size.)

> Tablesplit.getLength returns 0
> ------------------------------
>
>                 Key: HBASE-10413
>                 URL: https://issues.apache.org/jira/browse/HBASE-10413
>             Project: HBase
>          Issue Type: Bug
>          Components: Client, mapreduce
>    Affects Versions: 0.96.1.1
>            Reporter: Lukas Nalezenec



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
