Hi, thanks.
I was planning to attach the patch to JIRA; I opened the pull request for code
review.
The configuration option is in RegionSizeCalculator.java, line 63:
https://github.com/lukasnalezenec/hbase/blob/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/util/RegionSizeCalculator.java#L63
Lukas
On 3.2.2014 18:20, Ted Yu wrote:
You can take a look at the following method in HBaseTestingUtility:
public HTable createTable(byte[] tableName, byte[][] families,
int numVersions, byte[] startKey, byte[] endKey, int numRegions)
throws IOException {
I saw you issued a git pull request - please generate a patch based on trunk
and attach it to JIRA. The HBase source repo is currently in Subversion.
In https://github.com/apache/hbase/pull/8/files , I can't seem to find the
new config parameter that turns this feature on/off.
Regards
On Mon, Feb 3, 2014 at 8:37 AM, Lukas Nalezenec <
[email protected]> wrote:
I have done some changes - see
https://issues.apache.org/jira/browse/HBASE-10413 for more discussion.
I need help with a unit test. Is there a simple unit-test helper/utility
I can use? I need to create a table with some regions and then work with
their sizes. It should run locally, with some level of abstraction.
The code works well, but there are outliers - one map with a 1.6 GB region and
250 MB of "Map output bytes" takes 1 hour (it should take a few minutes). Do
you have any idea why this happens?
2014-02-03 14:28:43,052 INFO org.apache.hadoop.mapred.MapTask: io.sort.mb = 200
2014-02-03 14:28:43,183 INFO org.apache.hadoop.mapred.MapTask: data buffer = 159383552/199229440
2014-02-03 14:28:43,183 INFO org.apache.hadoop.mapred.MapTask: record buffer = 524288/655360
2014-02-03 15:00:16,085 INFO org.apache.hadoop.mapred.MapTask: Spilling map output: record full = true
2014-02-03 15:00:16,085 INFO org.apache.hadoop.mapred.MapTask: bufstart = 0; bufend = 118888312; bufvoid = 199229440
2014-02-03 15:00:16,085 INFO org.apache.hadoop.mapred.MapTask: kvstart = 0; kvend = 524288; length = 655360
2014-02-03 15:00:24,993 INFO org.apache.hadoop.io.compress.CodecPool: Got brand-new compressor [.snappy]
2014-02-03 15:00:25,831 INFO org.apache.hadoop.mapred.MapTask: Finished spill 0
2014-02-03 15:35:27,521 INFO org.apache.hadoop.mapred.MapTask: Spilling map output: record full = true
2014-02-03 15:35:27,521 INFO org.apache.hadoop.mapred.MapTask: bufstart = 118888312; bufend = 27185690; bufvoid = 199229433
2014-02-03 15:35:27,521 INFO org.apache.hadoop.mapred.MapTask: kvstart = 524288; kvend = 393215; length = 655360
2014-02-03 15:35:30,517 INFO org.apache.hadoop.mapred.MapTask: Finished spill 1
2014-02-03 15:39:03,759 INFO org.apache.hadoop.mapred.MapTask: Starting flush of map output
2014-02-03 15:39:04,884 INFO org.apache.hadoop.mapred.MapTask: Finished spill 2
2014-02-03 15:39:04,895 INFO org.apache.hadoop.mapred.Merger: Merging 3 sorted segments
2014-02-03 15:39:04,904 INFO org.apache.hadoop.io.compress.CodecPool: Got brand-new decompressor [.snappy]
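[Editor's note: a back-of-the-envelope read of the first spill's numbers, not part of the original thread. The record buffer, not the byte buffer, triggered the spill: 524288 of 655360 record slots is exactly the default io.sort.spill.percent of 0.80, while the byte buffer was only about 60% full. Also notable: each spill itself finishes in seconds, so the half-hour gaps between spill events suggest the time is spent producing the map output (e.g. in the scanner), not in sorting. The arithmetic:]

```java
// Back-of-the-envelope check of the first spill in the log above.
// The numeric inputs are copied from the MapTask log lines; the
// interpretation (record slots hit io.sort.spill.percent first) is
// an assumption, not something the log states directly.
public class SpillCheck {
    public static void main(String[] args) {
        long bufvoid = 199229440L;     // byte buffer capacity
        long bufend = 118888312L;      // bytes used when "record full = true"
        long kvCapacity = 655360L;     // record slots available
        long kvUsed = 524288L;         // record slots used at spill time

        double byteFill = (double) bufend / bufvoid;       // ~0.60
        double recordFill = (double) kvUsed / kvCapacity;  // exactly 0.80
        long avgRecordBytes = bufend / kvUsed;             // ~226 bytes/record

        System.out.printf("byte fill=%.2f record fill=%.2f avg record=%dB%n",
                byteFill, recordFill, avgRecordBytes);
    }
}
```

Small records like these fill the record-slot array long before the byte buffer, which is the classic reason for raising io.sort.record.percent.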
Lukas
On 31.1.2014 16:35, Ted Yu wrote:
+ public void setLength(long length) {
This method in TableSplit can be package private.
+ final Log LOG = LogFactory.getLog(MultiTableInputFormatBase.class);
Name of class is wrong.
+ makeFamilyFilter(families);
The return value is ignored.
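[Editor's note: for readers following along, the three review points could be addressed roughly like this - an illustrative sketch with placeholder fields, not the actual patch:]

```java
// Sketch of the three review comments above; fields and bodies are
// placeholders, not the real HBase code.
class TableSplit {
    private long length;

    // (1) package-private: only the input format in the same package
    //     needs to set the split length, so no public modifier.
    void setLength(long length) {
        this.length = length;
    }

    long getLength() {
        return length;
    }
}

class SomeInputFormat {
    // (2) LogFactory.getLog(...) must be passed the class that declares
    //     the LOG field; the reviewed code passed
    //     MultiTableInputFormatBase.class from a different class.

    // (3) a filter-building helper returns a value that must be kept:
    //         Filter familyFilter = makeFamilyFilter(families);
    //     rather than calling makeFamilyFilter(families) bare and
    //     discarding the result.
}
```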
Can you make a patch for trunk and attach it to JIRA?
Thanks
On Fri, Jan 31, 2014 at 6:55 AM, Lukas Nalezenec <
[email protected]> wrote:
Hi,
I have written a first draft:
https://github.com/lukasnalezenec/hbase/commit/bf560b3c19b15cefb114132ac86664ffc44dad32
Can you please review it and let me know if it is a feasible solution?
Lukas
On 30.1.2014 18:14, Nick Dimiduk wrote:
Sounds good, I'll watch for your patch!
On Thursday, January 30, 2014, Lukas Nalezenec <
[email protected]> wrote:
I talked with the guy who worked on this and he said our production issue
was probably not directly caused by getLength() returning 0.
Anyway, we are interested in fixing it; estimating the length from files is
a good idea.
Lukas
InputSplit.getLength() and RecordReader.getProgress() are important for the
MR framework to be able to show progress etc. It would be good to return
raw data sizes in getLength(), computed from the region's total store file
size, with progress calculated from the amount of raw data the scanner has
seen.
Enis
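[Editor's note: the mechanism Enis outlines - a split reporting its region's store-file size as its length - is the idea behind the RegionSizeCalculator linked at the top of the thread. A minimal, self-contained sketch under that assumption; class and method names here are illustrative, not the HBase API:]

```java
import java.util.HashMap;
import java.util.Map;

// Minimal sketch of the HBASE-10413 idea: a split's getLength() returns
// the raw data size of its region, taken from the total size of that
// region's store files. Illustrative names only.
public class RegionSizeEstimator {
    private final Map<String, Long> sizeMap = new HashMap<>();

    // In the real code the sizes would come from store-file metrics;
    // here the caller supplies them directly.
    public void setRegionSize(String regionName, long storeFileBytes) {
        sizeMap.put(regionName, storeFileBytes);
    }

    // Length reported to the MR framework for a split over this region;
    // unknown regions fall back to 0, the pre-patch behaviour.
    public long getSplitLength(String regionName) {
        return sizeMap.getOrDefault(regionName, 0L);
    }
}
```

With non-zero lengths like these, RecordReader.getProgress() can then be derived as raw bytes seen by the scanner divided by the split length.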