I have made some changes; see
https://issues.apache.org/jira/browse/HBASE-10413 for more discussion.
I need help with the unit test. Is there a simple unit test
helper/utility I can use? I need to create a table with some regions and
then work with their sizes. It should be local, and there should be some
level of abstraction.
The code works well, but there are outliers: one map with a 1.6G region
and 250MB of "Map output bytes" takes 1 hour (it should take a few minutes).
Do you have any idea why this happens?
2014-02-03 14:28:43,052 INFO org.apache.hadoop.mapred.MapTask: io.sort.mb = 200
2014-02-03 14:28:43,183 INFO org.apache.hadoop.mapred.MapTask: data buffer = 159383552/199229440
2014-02-03 14:28:43,183 INFO org.apache.hadoop.mapred.MapTask: record buffer = 524288/655360
2014-02-03 15:00:16,085 INFO org.apache.hadoop.mapred.MapTask: Spilling map output: record full = true
2014-02-03 15:00:16,085 INFO org.apache.hadoop.mapred.MapTask: bufstart = 0; bufend = 118888312; bufvoid = 199229440
2014-02-03 15:00:16,085 INFO org.apache.hadoop.mapred.MapTask: kvstart = 0; kvend = 524288; length = 655360
2014-02-03 15:00:24,993 INFO org.apache.hadoop.io.compress.CodecPool: Got brand-new compressor [.snappy]
2014-02-03 15:00:25,831 INFO org.apache.hadoop.mapred.MapTask: Finished spill 0
2014-02-03 15:35:27,521 INFO org.apache.hadoop.mapred.MapTask: Spilling map output: record full = true
2014-02-03 15:35:27,521 INFO org.apache.hadoop.mapred.MapTask: bufstart = 118888312; bufend = 27185690; bufvoid = 199229433
2014-02-03 15:35:27,521 INFO org.apache.hadoop.mapred.MapTask: kvstart = 524288; kvend = 393215; length = 655360
2014-02-03 15:35:30,517 INFO org.apache.hadoop.mapred.MapTask: Finished spill 1
2014-02-03 15:39:03,759 INFO org.apache.hadoop.mapred.MapTask: Starting flush of map output
2014-02-03 15:39:04,884 INFO org.apache.hadoop.mapred.MapTask: Finished spill 2
2014-02-03 15:39:04,895 INFO org.apache.hadoop.mapred.Merger: Merging 3 sorted segments
2014-02-03 15:39:04,904 INFO org.apache.hadoop.io.compress.CodecPool: Got brand-new decompressor [.snappy]
Lukas
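A rough reading of the timestamps in the log above, as a sketch rather than a diagnosis: the task appears to spend its time between spills, not in them. The snippet below only does the arithmetic on values copied from the log. The class and method names are made up for illustration, and it assumes the map started emitting records at about the time the buffers were allocated.

```java
import java.time.Duration;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

/** Illustrative helper (not part of Hadoop or HBase) for timing gaps in MapTask logs. */
final class SpillTimingSketch {
    // Timestamp layout used by the log lines quoted above, e.g. "2014-02-03 14:28:43,183".
    private static final DateTimeFormatter LOG_TIME =
            DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss,SSS");

    private SpillTimingSketch() {}

    /** Elapsed time between two log timestamps. */
    static Duration between(String start, String end) {
        return Duration.between(LocalDateTime.parse(start, LOG_TIME),
                                LocalDateTime.parse(end, LOG_TIME));
    }

    /** Average number of records produced per second over the given interval. */
    static double recordsPerSecond(long records, Duration elapsed) {
        return records / (elapsed.toMillis() / 1000.0);
    }

    public static void main(String[] args) {
        // Buffers allocated -> first "Spilling map output: record full = true".
        Duration fill = between("2014-02-03 14:28:43,183", "2014-02-03 15:00:16,085");
        // First spill started -> "Finished spill 0".
        Duration spill = between("2014-02-03 15:00:16,085", "2014-02-03 15:00:25,831");

        System.out.println("filling the record buffer: " + fill);
        System.out.println("writing spill 0:           " + spill);
        // kvend = 524288 records were buffered when the first spill was triggered.
        System.out.printf("map output rate: %.0f records/s%n",
                recordsPerSecond(524_288L, fill));
    }
}
```

With these numbers, filling the record buffer takes about 31.5 minutes (roughly 277 records per second), while sorting and writing spill 0 takes under 10 seconds. If the assumption holds, the slowness is more likely in reading the region or in the map function than in the sort/spill path.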
On 31.1.2014 16:35, Ted Yu wrote:
+ public void setLength(long length) {
This method in TableSplit can be package private.
+ final Log LOG = LogFactory.getLog(MultiTableInputFormatBase.class);
Name of class is wrong.
+ makeFamilyFilter(families);
The return value is ignored.
Can you make a patch for trunk and attach it to the JIRA?
Thanks
On Fri, Jan 31, 2014 at 6:55 AM, Lukas Nalezenec wrote:
Hi,
I have written a first draft:
https://github.com/lukasnalezenec/hbase/commit/bf560b3c19b15cefb114132ac86664ffc44dad32
Can you please review it and let me know whether it is a feasible solution?
Lukas
On 30.1.2014 18:14, Nick Dimiduk wrote:
Sounds good, I'll watch for your patch!
On Thursday, January 30, 2014, Lukas Nalezenec wrote:
I talked with the guy who worked on this, and he said our production issue was
probably not directly caused by getLength() returning 0.
Anyway, we are interested in fixing that; estimating the length from files is
a good idea.
Lukas
InputSplit.getLength() and RecordReader.getProgress() are important for the
MR framework to be able to show progress etc. It would be good to return
raw data sizes in getLength(), computed from the total size of the region's
store files, and to calculate progress from the amount of raw data the
scanner has seen.
Enis
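The idea in this message can be sketched in a few lines of plain Java. This is only an illustration of the arithmetic, not the patch from the JIRA: the class and method names are hypothetical, the store file sizes are passed in as plain numbers instead of being read from HBase, and "bytes seen" stands for whatever raw-size counter the scanner would expose.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch of size-based split length and progress; not an HBase class. */
final class RegionSizeSketch {
    private RegionSizeSketch() {}

    /** Estimated split length: the sum of the region's store file sizes, in bytes. */
    static long estimateLength(List<Long> storeFileSizes) {
        long total = 0L;
        for (long size : storeFileSizes) {
            total += size;
        }
        return total;
    }

    /** Fraction of the estimated length already scanned, clamped to [0, 1]. */
    static float estimateProgress(long bytesSeen, long estimatedLength) {
        if (estimatedLength <= 0L) {
            return 0f; // Size unknown (e.g. empty region): report no progress.
        }
        double fraction = (double) bytesSeen / estimatedLength;
        return (float) Math.max(0.0, Math.min(1.0, fraction));
    }

    public static void main(String[] args) {
        // Made-up store file sizes for one region.
        long length = estimateLength(Arrays.asList(1_000L, 2_500L, 500L));
        System.out.println("estimated length: " + length);
        System.out.println("progress after 1000 bytes: " + estimateProgress(1_000L, length));
    }
}
```

The clamp matters because store file sizes are only an estimate of what the scanner will return: compression, filters, and data still in the memstore can all make the bytes seen differ from the on-disk total.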