The new config is checked inside the RegionSizeCalculator ctor. Instantiation of RegionSizeCalculator can be skipped altogether if the config says the feature is disabled, right?
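To make the question concrete, here is a minimal sketch of what I mean. It does not use the real HBase classes: `SizeCalculator` is a stand-in for RegionSizeCalculator, the plain `Map` stands in for the Hadoop Configuration, and the key name `example.regionsizecalculator.enable` is a placeholder I made up, not the key from the patch. The point is only that the caller can test the flag first and never construct the calculator when the feature is off.

```java
import java.util.HashMap;
import java.util.Map;

public class RegionSizeSketch {
    // Placeholder key name (an assumption, not the key used in the patch).
    static final String ENABLE_KEY = "example.regionsizecalculator.enable";

    /** Stand-in for RegionSizeCalculator; the real class queries the cluster. */
    static final class SizeCalculator {
        static int instances = 0; // counts constructions, for illustration only
        private final Map<String, Long> sizes;

        SizeCalculator(Map<String, Long> regionSizes) {
            instances++;
            this.sizes = new HashMap<>(regionSizes);
        }

        long getRegionSize(String regionName) {
            return sizes.getOrDefault(regionName, 0L);
        }
    }

    /**
     * Returns the estimated split length for a region, or 0 when the feature
     * is disabled. The calculator is only constructed when the flag is on.
     */
    static long splitLength(Map<String, String> conf,
                            Map<String, Long> regionSizes,
                            String regionName) {
        boolean enabled =
            Boolean.parseBoolean(conf.getOrDefault(ENABLE_KEY, "true"));
        if (!enabled) {
            return 0L; // skip instantiation entirely
        }
        return new SizeCalculator(regionSizes).getRegionSize(regionName);
    }

    public static void main(String[] args) {
        Map<String, Long> regionSizes = new HashMap<>();
        regionSizes.put("region-a", 1024L);

        Map<String, String> conf = new HashMap<>();
        conf.put(ENABLE_KEY, "false");
        System.out.println(splitLength(conf, regionSizes, "region-a"));

        conf.put(ENABLE_KEY, "true");
        System.out.println(splitLength(conf, regionSizes, "region-a"));
    }
}
```

With the check in the caller, the ctor would no longer need to look at the flag at all; whether that is cleaner than keeping the check inside the ctor is a judgment call.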
On Mon, Feb 3, 2014 at 9:41 AM, Lukas Nalezenec <[email protected]> wrote:

> Hi, thanks.
> I planned to attach the patch to Jira. I opened the pull request for
> code review.
>
> The configuration option is in RegionSizeCalculator.java line 63.
>
> https://github.com/lukasnalezenec/hbase/blob/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/util/RegionSizeCalculator.java#L63
>
> Lukas
>
> On 3.2.2014 18:20, Ted Yu wrote:
>
>> You can take a look at the following method in HBaseTestingUtility:
>>
>>   public HTable createTable(byte[] tableName, byte[][] families,
>>       int numVersions, byte[] startKey, byte[] endKey, int numRegions)
>>       throws IOException {
>>
>> I saw you issue a git pull request - please generate a patch based on
>> trunk and attach it to JIRA. The HBase source repo is currently in
>> subversion.
>> In https://github.com/apache/hbase/pull/8/files , I don't seem to find
>> the new config parameter which turns this feature on/off.
>>
>> Regards
>>
>> On Mon, Feb 3, 2014 at 8:37 AM, Lukas Nalezenec <[email protected]> wrote:
>>
>>> I have done some changes - see
>>> https://issues.apache.org/jira/browse/HBASE-10413 for more discussion.
>>>
>>> I need help with the unit test. Is there some simple unit test
>>> helper/utility I can use? I need to create a table with some regions
>>> and then work with their sizes. It should be local, there should be
>>> some level of abstraction.
>>>
>>> The code works well but there are outliers - one map with a 1.6G
>>> region and 250MB "Map output bytes" takes 1 hour (it should take a few
>>> minutes). Do you have any idea why this happens?
>>>
>>> 2014-02-03 14:28:43,052 INFO org.apache.hadoop.mapred.MapTask: io.sort.mb = 200
>>> 2014-02-03 14:28:43,183 INFO org.apache.hadoop.mapred.MapTask: data buffer = 159383552/199229440
>>> 2014-02-03 14:28:43,183 INFO org.apache.hadoop.mapred.MapTask: record buffer = 524288/655360
>>>
>>> 2014-02-03 15:00:16,085 INFO org.apache.hadoop.mapred.MapTask: Spilling map output: record full = true
>>> 2014-02-03 15:00:16,085 INFO org.apache.hadoop.mapred.MapTask: bufstart = 0; bufend = 118888312; bufvoid = 199229440
>>> 2014-02-03 15:00:16,085 INFO org.apache.hadoop.mapred.MapTask: kvstart = 0; kvend = 524288; length = 655360
>>> 2014-02-03 15:00:24,993 INFO org.apache.hadoop.io.compress.CodecPool: Got brand-new compressor [.snappy]
>>> 2014-02-03 15:00:25,831 INFO org.apache.hadoop.mapred.MapTask: Finished spill 0
>>> 2014-02-03 15:35:27,521 INFO org.apache.hadoop.mapred.MapTask: Spilling map output: record full = true
>>> 2014-02-03 15:35:27,521 INFO org.apache.hadoop.mapred.MapTask: bufstart = 118888312; bufend = 27185690; bufvoid = 199229433
>>> 2014-02-03 15:35:27,521 INFO org.apache.hadoop.mapred.MapTask: kvstart = 524288; kvend = 393215; length = 655360
>>> 2014-02-03 15:35:30,517 INFO org.apache.hadoop.mapred.MapTask: Finished spill 1
>>> 2014-02-03 15:39:03,759 INFO org.apache.hadoop.mapred.MapTask: Starting flush of map output
>>> 2014-02-03 15:39:04,884 INFO org.apache.hadoop.mapred.MapTask: Finished spill 2
>>> 2014-02-03 15:39:04,895 INFO org.apache.hadoop.mapred.Merger: Merging 3 sorted segments
>>> 2014-02-03 15:39:04,904 INFO org.apache.hadoop.io.compress.CodecPool: Got brand-new decompressor [.snappy]
>>>
>>> Lukas
>>>
>>> On 31.1.2014 16:35, Ted Yu wrote:
>>>
>>>> + public void setLength(long length) {
>>>>
>>>> This method in TableSplit can be package private.
>>>>
>>>> + final Log LOG = LogFactory.getLog(MultiTableInputFormatBase.class);
>>>>
>>>> Name of class is wrong.
>>>>
>>>> + makeFamilyFilter(families);
>>>>
>>>> The return value is ignored.
>>>>
>>>> Can you make a patch for trunk and attach it to JIRA?
>>>>
>>>> Thanks
>>>>
>>>> On Fri, Jan 31, 2014 at 6:55 AM, Lukas Nalezenec <[email protected]> wrote:
>>>>
>>>>> Hi,
>>>>> I have written a first draft:
>>>>> https://github.com/lukasnalezenec/hbase/commit/bf560b3c19b15cefb114132ac86664ffc44dad32
>>>>> Can you please review it and let me know whether it is a feasible
>>>>> solution?
>>>>> Lukas
>>>>>
>>>>> On 30.1.2014 18:14, Nick Dimiduk wrote:
>>>>>
>>>>>> Sounds good, I'll watch for your patch!
>>>>>>
>>>>>> On Thursday, January 30, 2014, Lukas Nalezenec <[email protected]> wrote:
>>>>>>
>>>>>>> I talked with the guy who worked on this and he said our
>>>>>>> production issue was probably not directly caused by getLength()
>>>>>>> returning 0.
>>>>>>> Anyway, we are interested in fixing that; estimating length from
>>>>>>> files is a good idea.
>>>>>>>
>>>>>>> Lukas
>>>>>>>
>>>>>>>> InputSplit.getLength() and RecordReader.getProgress() are
>>>>>>>> important for the MR framework to be able to show progress etc.
>>>>>>>> It would be good to return raw data sizes in getLength(),
>>>>>>>> computed from the region's total size of store files, with
>>>>>>>> progress being calculated from the scanner's amount of raw data
>>>>>>>> seen.
>>>>>>>>
>>>>>>>> Enis
