I have made some changes; see https://issues.apache.org/jira/browse/HBASE-10413 for more discussion.

I need help with the unit test. Is there a simple unit test helper or utility I can use? I need to create a table with some regions and then work with their sizes. It should run locally and provide some level of abstraction.

The code works well, but there are outliers: one map task over a 1.6 GB region with 250 MB of "Map output bytes" takes 1 hour, when it should take a few minutes. Do you have any idea why this happens?

2014-02-03 14:28:43,052 INFO org.apache.hadoop.mapred.MapTask: io.sort.mb = 200
2014-02-03 14:28:43,183 INFO org.apache.hadoop.mapred.MapTask: data buffer = 159383552/199229440
2014-02-03 14:28:43,183 INFO org.apache.hadoop.mapred.MapTask: record buffer = 524288/655360
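The buffer sizes in these first lines can be reproduced from io.sort.mb = 200. The sketch below assumes the Hadoop 1.x MapTask defaults io.sort.record.percent = 0.05 and io.sort.spill.percent = 0.80, plus 16 bytes of accounting per record; none of these values appear in the log, they are assumptions that happen to reproduce it. Under those assumptions about 5% of the 200 MB buffer holds the per-record index (655,360 slots), the rest holds serialized key/value data, and a spill starts when either part reaches 80%.

```java
// Sketch only: reproduces the "data buffer" / "record buffer" log lines.
// ASSUMED (not in the log): io.sort.record.percent = 0.05,
// io.sort.spill.percent = 0.80, 16 bytes of index accounting per record.
public class SortBufferLayout {
    static final int RECSIZE = 16; // assumed accounting bytes per record

    /** Returns {softDataLimit, dataBytes, softRecordLimit, recordCapacity}. */
    public static long[] layout(int sortMb, double recordPercent, double spillPercent) {
        int maxMemUsage = sortMb << 20;                     // whole sort buffer in bytes
        int recordBytes = (int) (maxMemUsage * recordPercent);
        recordBytes -= recordBytes % RECSIZE;               // keep whole record slots only
        int dataBytes = maxMemUsage - recordBytes;          // room for serialized key/values
        int recordCapacity = recordBytes / RECSIZE;         // number of record slots
        int softDataLimit = (int) (dataBytes * spillPercent);
        int softRecordLimit = (int) (recordCapacity * spillPercent);
        return new long[] {softDataLimit, dataBytes, softRecordLimit, recordCapacity};
    }

    public static void main(String[] args) {
        long[] l = layout(200, 0.05, 0.80);
        System.out.println("data buffer = " + l[0] + "/" + l[1]);
        System.out.println("record buffer = " + l[2] + "/" + l[3]);
        // Average serialized record size at spill 0: bufend / kvend from the log
        System.out.println("avg bytes/record at spill 0 = " + (118888312L / 524288L));
    }
}
```

With roughly 226 bytes per record, the 524,288-record soft limit is hit at bufend = 118888312 bytes, well below the 159383552-byte data limit, which is consistent with `record full = true` in the spill lines that follow.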

2014-02-03 15:00:16,085 INFO org.apache.hadoop.mapred.MapTask: Spilling map output: record full = true
2014-02-03 15:00:16,085 INFO org.apache.hadoop.mapred.MapTask: bufstart = 0; bufend = 118888312; bufvoid = 199229440
2014-02-03 15:00:16,085 INFO org.apache.hadoop.mapred.MapTask: kvstart = 0; kvend = 524288; length = 655360
2014-02-03 15:00:24,993 INFO org.apache.hadoop.io.compress.CodecPool: Got brand-new compressor [.snappy]
2014-02-03 15:00:25,831 INFO org.apache.hadoop.mapred.MapTask: Finished spill 0
2014-02-03 15:35:27,521 INFO org.apache.hadoop.mapred.MapTask: Spilling map output: record full = true
2014-02-03 15:35:27,521 INFO org.apache.hadoop.mapred.MapTask: bufstart = 118888312; bufend = 27185690; bufvoid = 199229433
2014-02-03 15:35:27,521 INFO org.apache.hadoop.mapred.MapTask: kvstart = 524288; kvend = 393215; length = 655360
2014-02-03 15:35:30,517 INFO org.apache.hadoop.mapred.MapTask: Finished spill 1
2014-02-03 15:39:03,759 INFO org.apache.hadoop.mapred.MapTask: Starting flush of map output
2014-02-03 15:39:04,884 INFO org.apache.hadoop.mapred.MapTask: Finished spill 2
2014-02-03 15:39:04,895 INFO org.apache.hadoop.mapred.Merger: Merging 3 sorted segments
2014-02-03 15:39:04,904 INFO org.apache.hadoop.io.compress.CodecPool: Got brand-new decompressor [.snappy]
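Reading the same log as a timeline helps localize where the hour goes. The sketch below only subtracts timestamps quoted above; the phase labels are my reading of the MapTask messages, not something the log states explicitly.

```java
import java.time.Duration;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

// Sketch: measures the gaps between MapTask log events quoted above.
public class SpillTimeline {
    static final DateTimeFormatter FMT = DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss,SSS");

    /** Milliseconds between two log timestamps. */
    static long millis(String from, String to) {
        return Duration.between(LocalDateTime.parse(from, FMT), LocalDateTime.parse(to, FMT)).toMillis();
    }

    public static void main(String[] args) {
        // {label, start, end}; timestamps copied from the log
        String[][] phases = {
            {"collect until spill 0", "2014-02-03 14:28:43,183", "2014-02-03 15:00:16,085"},
            {"spill 0",               "2014-02-03 15:00:16,085", "2014-02-03 15:00:25,831"},
            {"collect until spill 1", "2014-02-03 15:00:25,831", "2014-02-03 15:35:27,521"},
            {"spill 1",               "2014-02-03 15:35:27,521", "2014-02-03 15:35:30,517"},
            {"collect until flush",   "2014-02-03 15:35:30,517", "2014-02-03 15:39:03,759"},
            {"flush + spill 2",       "2014-02-03 15:39:03,759", "2014-02-03 15:39:04,884"},
        };
        for (String[] p : phases) {
            System.out.printf("%-22s %8.1f s%n", p[0], millis(p[1], p[2]) / 1000.0);
        }
    }
}
```

The spills themselves finish in about 10 s and 3 s, while the gaps before them are about 31 and 35 minutes. That suggests the time is spent producing map output (reading rows from the region and running map()), not in sorting, compressing or spilling it, so the scan side looks like the more likely place to investigate.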


Lukas


On 31.1.2014 16:35, Ted Yu wrote:
+  public void setLength(long length) {

This method in TableSplit can be package private.
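To illustrate the suggestion: dropping `public` keeps the setter callable from classes in the same package as TableSplit (`org.apache.hadoop.hbase.mapreduce`), such as the input format that computes region sizes, while hiding it from client code. `SizedSplit` below is a hypothetical, heavily simplified stand-in, not the real TableSplit.

```java
// Hypothetical stand-in for TableSplit; only the length handling is shown.
public class SizedSplit {
    private long length; // estimated region size in bytes

    public long getLength() {
        return length;
    }

    // Package-private (no modifier): only same-package classes, e.g. the
    // input format that estimates region sizes, can change the length.
    void setLength(long length) {
        this.length = length;
    }
}
```

An alternative with the same effect would be to pass the length to the constructor and drop the setter entirely, which keeps the split immutable.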

+  final Log LOG = LogFactory.getLog(MultiTableInputFormatBase.class);


The class name is wrong.


+    makeFamilyFilter(families);


The return value is ignored.



Can you make a patch for trunk and attach it to the JIRA?


Thanks


On Fri, Jan 31, 2014 at 6:55 AM, Lukas Nalezenec wrote:

Hi,
I have written a first draft: https://github.com/lukasnalezenec/hbase/commit/bf560b3c19b15cefb114132ac86664ffc44dad32
Can you please review it and let me know whether it is a feasible solution?
Lukas


On 30.1.2014 18:14, Nick Dimiduk wrote:

Sounds good, I'll watch for your patch!

On Thursday, January 30, 2014, Lukas Nalezenec wrote:

I talked with the person who worked on this, and he said our production issue was
probably not directly caused by getLength() returning 0.
Anyway, we are interested in fixing it; estimating the length from the store
files is a good idea.

Lukas

InputSplit.getLength() and RecordReader.getProgress() are important for the
MR framework to be able to show progress, etc. It would be good to return
raw data sizes in getLength(), computed from the total size of the region's
store files, and to calculate progress from the amount of raw data the
scanner has seen.

  Enis
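A minimal sketch of Enis's suggestion without HBase dependencies. The class name `ProgressEstimator` and the `recordRow` hook are hypothetical; the store file sizes and per-row raw sizes are simply passed in as numbers, whereas a real patch would take them from the region's store files and from the cells the scanner returns.

```java
import java.util.List;

// Hypothetical sketch: split length = total store file size of the region,
// progress = raw bytes returned by the scanner so far / that total.
public class ProgressEstimator {
    private final long totalBytes; // sum of the region's store file sizes
    private long bytesSeen;        // raw bytes returned by the scanner so far

    public ProgressEstimator(List<Long> storeFileSizes) {
        long sum = 0;
        for (long size : storeFileSizes) {
            sum += size;
        }
        this.totalBytes = sum;
    }

    /** Value a split's getLength() could return. */
    public long getLength() {
        return totalBytes;
    }

    /** Call once per row with the raw size of the cells the scanner returned. */
    public void recordRow(long rawRowBytes) {
        bytesSeen += rawRowBytes;
    }

    /** Value a RecordReader's getProgress() could return, clamped to [0, 1]. */
    public float getProgress() {
        if (totalBytes <= 0) {
            return 0.0f; // size unknown: report no progress instead of dividing by zero
        }
        return Math.min(1.0f, (float) bytesSeen / totalBytes);
    }
}
```

The clamp matters because store files can be compressed, so the raw bytes a scanner returns may exceed the on-disk total and would otherwise report more than 100% progress.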
