Re: Tablesplit.getLength returns 0

Ted Yu Mon, 03 Feb 2014 10:38:41 -0800

The new config is checked inside the RegionSizeCalculator constructor.
Instantiation of RegionSizeCalculator can be skipped altogether when the
config says the feature is disabled, right?
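
To make the suggestion concrete, here is a minimal sketch in plain Java with no HBase on the classpath. Settings, SizeCalculator, SplitLengthEstimator and the key name are hypothetical stand-ins invented for illustration; only the getBoolean(key, default) shape is borrowed from Hadoop's Configuration. The point is only that the flag is read by the caller, so the calculator is never constructed when the feature is off.

```java
// Hypothetical stand-in for Hadoop's Configuration; only getBoolean is modeled.
interface Settings {
    boolean getBoolean(String key, boolean defaultValue);
}

// Hypothetical stand-in for RegionSizeCalculator.
class SizeCalculator {
    static int instances = 0;   // counts constructions, for illustration only

    SizeCalculator() {
        instances++;            // a real constructor would do the expensive lookup here
    }

    long getRegionSize(String regionName) {
        return 1024L;           // placeholder value, not a real measurement
    }
}

class SplitLengthEstimator {
    // Hypothetical key name; the thread does not spell out the actual key.
    static final String ENABLE_KEY = "example.regionsizecalculator.enable";

    private final SizeCalculator calculator;  // null when the feature is disabled

    SplitLengthEstimator(Settings conf) {
        // Read the flag first so the calculator is never built when disabled.
        this.calculator = conf.getBoolean(ENABLE_KEY, true) ? new SizeCalculator() : null;
    }

    long lengthFor(String regionName) {
        // Fall back to 0 when disabled, the behavior named in the subject line.
        return calculator == null ? 0L : calculator.getRegionSize(regionName);
    }
}
```

With the flag off, lengthFor() returns 0 and no SizeCalculator is ever created; with the flag on (or unset), one instance is created per estimator.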



On Mon, Feb 3, 2014 at 9:41 AM, Lukas Nalezenec <
[email protected]> wrote:

> Hi, thanks.
> I was planning to attach the patch to JIRA. I opened the pull request for code
> review.
>
> The configuration option is in RegionSizeCalculator.java, line 63:
>
> https://github.com/lukasnalezenec/hbase/blob/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/util/RegionSizeCalculator.java#L63
>
> Lukas
>
>
> On 3.2.2014 18:20, Ted Yu wrote:
>
>> You can take a look at the following method in HBaseTestingUtility:
>>
>>    public HTable createTable(byte[] tableName, byte[][] families,
>>        int numVersions, byte[] startKey, byte[] endKey, int numRegions)
>>        throws IOException {
>>
>> I saw you issued a git pull request - please generate a patch based on trunk
>> and attach it to JIRA. The HBase source repo is currently in subversion.
>> In https://github.com/apache/hbase/pull/8/files , I don't seem to find the
>> new config parameter which turns this feature on/off.
>>
>> Regards
>>
>> On Mon, Feb 3, 2014 at 8:37 AM, Lukas Nalezenec <
>> [email protected]> wrote:
>>
>>> I have made some changes - see
>>> https://issues.apache.org/jira/browse/HBASE-10413 for more discussion.
>>>
>>> I need help with the unit test. Is there a simple unit test helper/utility
>>> I can use? I need to create a table with some regions and then work with
>>> their sizes. It should be local, and there should be some level of
>>> abstraction.
>>>
>>> The code works well but there are outliers - one map with a 1.6G region
>>> and 250MB "Map output bytes" takes 1 hour (it should take a few minutes).
>>> Do you have any idea why this happens?
>>>
>>> 2014-02-03 14:28:43,052 INFO org.apache.hadoop.mapred.MapTask: io.sort.mb = 200
>>> 2014-02-03 14:28:43,183 INFO org.apache.hadoop.mapred.MapTask: data buffer = 159383552/199229440
>>> 2014-02-03 14:28:43,183 INFO org.apache.hadoop.mapred.MapTask: record buffer = 524288/655360
>>>
>>> 2014-02-03 15:00:16,085 INFO org.apache.hadoop.mapred.MapTask: Spilling map output: record full = true
>>> 2014-02-03 15:00:16,085 INFO org.apache.hadoop.mapred.MapTask: bufstart = 0; bufend = 118888312; bufvoid = 199229440
>>> 2014-02-03 15:00:16,085 INFO org.apache.hadoop.mapred.MapTask: kvstart = 0; kvend = 524288; length = 655360
>>> 2014-02-03 15:00:24,993 INFO org.apache.hadoop.io.compress.CodecPool: Got brand-new compressor [.snappy]
>>> 2014-02-03 15:00:25,831 INFO org.apache.hadoop.mapred.MapTask: Finished spill 0
>>> 2014-02-03 15:35:27,521 INFO org.apache.hadoop.mapred.MapTask: Spilling map output: record full = true
>>> 2014-02-03 15:35:27,521 INFO org.apache.hadoop.mapred.MapTask: bufstart = 118888312; bufend = 27185690; bufvoid = 199229433
>>> 2014-02-03 15:35:27,521 INFO org.apache.hadoop.mapred.MapTask: kvstart = 524288; kvend = 393215; length = 655360
>>> 2014-02-03 15:35:30,517 INFO org.apache.hadoop.mapred.MapTask: Finished spill 1
>>> 2014-02-03 15:39:03,759 INFO org.apache.hadoop.mapred.MapTask: Starting flush of map output
>>> 2014-02-03 15:39:04,884 INFO org.apache.hadoop.mapred.MapTask: Finished spill 2
>>> 2014-02-03 15:39:04,895 INFO org.apache.hadoop.mapred.Merger: Merging 3 sorted segments
>>> 2014-02-03 15:39:04,904 INFO org.apache.hadoop.io.compress.CodecPool: Got brand-new decompressor [.snappy]
>>>
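
For reading the log above: the "data buffer" and "record buffer" lines can be reproduced from io.sort.mb = 200 with a little arithmetic. The sketch below assumes a 5% share for record bookkeeping, an 80% spill threshold and 16 bytes of bookkeeping per record. Those three values are not stated anywhere in this thread; they are assumed here because they are the usual Hadoop 1.x defaults and they happen to match the logged numbers. The class and method names are made up for illustration.

```java
class SortBufferMath {
    // Bytes of bookkeeping kept per map output record (an assumption, see above).
    static final int RECORD_ACCOUNTING_BYTES = 16;

    // Bytes set aside for record bookkeeping, rounded down to whole records.
    static int accountingBytes(int sortMb, double recordPercent) {
        int total = sortMb << 20;                        // megabytes to bytes
        int bytes = (int) (total * recordPercent);
        return bytes - bytes % RECORD_ACCOUNTING_BYTES;
    }

    // Bytes left over for the serialized keys and values.
    static int dataBufferBytes(int sortMb, double recordPercent) {
        return (sortMb << 20) - accountingBytes(sortMb, recordPercent);
    }

    // Number of records the bookkeeping area can track.
    static int recordCapacity(int sortMb, double recordPercent) {
        return accountingBytes(sortMb, recordPercent) / RECORD_ACCOUNTING_BYTES;
    }

    // Fill level at which a spill is started.
    static int softLimit(int capacity, double spillPercent) {
        return (int) (capacity * spillPercent);
    }

    public static void main(String[] args) {
        int data = dataBufferBytes(200, 0.05);
        int records = recordCapacity(200, 0.05);
        // prints "data buffer = 159383552/199229440"
        System.out.println("data buffer = " + softLimit(data, 0.80) + "/" + data);
        // prints "record buffer = 524288/655360"
        System.out.println("record buffer = " + softLimit(records, 0.80) + "/" + records);
    }
}
```

Read this way, "record full = true" with kvend = 524288 means the first spill was triggered by the record count reaching its limit while the data buffer held only 118888312 bytes, about 227 bytes per record. The log also shows that spill 0 itself finished within about ten seconds, whereas roughly half an hour passed between task start and that first spill.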
>>>
>>> Lukas
>>>
>>>
>>>
>>>
>>> On 31.1.2014 16:35, Ted Yu wrote:
>>>
>>>> +  public void setLength(long length) {
>>>>
>>>> This method in TableSplit can be package private.
>>>>
>>>> +  final Log LOG = LogFactory.getLog(MultiTableInputFormatBase.class);
>>>>
>>>> Name of class is wrong.
>>>>
>>>> +    makeFamilyFilter(families);
>>>>
>>>> The return value is ignored.
>>>>
>>>> Can you make a patch for trunk and attach to JIRA?
>>>>
>>>>
>>>> Thanks
>>>>
>>>>
>>>> On Fri, Jan 31, 2014 at 6:55 AM, Lukas Nalezenec <
>>>> [email protected]> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> I have written a first draft:
>>>>> https://github.com/lukasnalezenec/hbase/commit/bf560b3c19b15cefb114132ac86664ffc44dad32
>>>>> Can you please review it and let me know whether it is a feasible solution?
>>>>> Lukas
>>>>>
>>>>>
>>>>> On 30.1.2014 18:14, Nick Dimiduk wrote:
>>>>>
>>>>>> Sounds good, I'll watch for your patch!
>>>>>>
>>>>>> On Thursday, January 30, 2014, Lukas Nalezenec <
>>>>>> [email protected]> wrote:
>>>>>>
>>>>>>> I talked with the guy who worked on this and he said our production
>>>>>>> issue was probably not directly caused by getLength() returning 0.
>>>>>>> Anyway, we are interested in fixing that; estimating the length from
>>>>>>> files is a good idea.
>>>>>>>
>>>>>>> Lukas
>>>>>>>
>>>>>>>> InputSplit.getLength() and RecordReader.getProgress() are important
>>>>>>>> for the MR framework to be able to show progress etc. It would be
>>>>>>>> good to return raw data sizes in getLength(), computed from the
>>>>>>>> region's total size of store files, with progress being calculated
>>>>>>>> from the scanner's amount of raw data seen.
>>>>>>>>
>>>>>>>>    Enis
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>
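
As a rough illustration of the proposal in the oldest message quoted above (split length taken from the region's store file sizes, progress taken from the raw data the scanner has seen), here is a self-contained sketch. It is not the code from the patch under review; RegionProgressSketch and both method names are hypothetical, and the store file sizes are passed in as a plain array instead of being read from a cluster.

```java
class RegionProgressSketch {
    // Split length: the sum of the region's store file sizes, in bytes.
    static long splitLength(long[] storeFileSizes) {
        long total = 0L;
        for (long size : storeFileSizes) {
            total += size;
        }
        return total;
    }

    // Progress in [0, 1]: raw bytes seen by the scanner over the split length.
    static float progress(long bytesSeen, long splitLength) {
        if (splitLength <= 0) {
            return 0f;          // unknown length, so no meaningful progress
        }
        return Math.min(1.0f, (float) bytesSeen / splitLength);
    }
}
```

The clamp to 1.0 is a design choice for the sketch: the bytes a scanner reports need not line up exactly with on-disk file sizes, so the ratio could otherwise exceed 1.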
