+1, although a bit late since it's already done. Still, great to see this 5+ year old issue finally resolved.
On Mon, Jan 16, 2017 at 9:24 PM, Stack <[email protected]> wrote:

> On Sat, Jan 14, 2017 at 9:50 PM, Lars George <[email protected]> wrote:
>
> > I think that makes sense. The tool with its custom code dates back to
> > where we had no built in version. I am all for removing all of the tools
> > and leave the API call only. That is the same for an admin then compared
> > to calling flush or split.
> >
> > No?
>
> Sounds good to me.
> St.Ack
>
> > Lars
> >
> > Sent from my iPhone
> >
> > On 15 Jan 2017, at 04:25, Stephen Jiang <[email protected]> wrote:
> >
> > >> If you remove the util.Merge tool, how then does an operator ask for a
> > >> merge in its absence?
> > >
> > > We have a shell command to merge region. In the past, it calls the same
> > > RS side code. I don't think there is a need to have util.Merge (even if
> > > we really want, we can ask this utility to call HBaseAdmin.mergeRegions,
> > > which is the same path from the merge command through 'hbase shell').
> > >
> > > Thanks
> > > Stephen
> > >
> > >> On Fri, Jan 13, 2017 at 11:29 PM, Stack <[email protected]> wrote:
> > >>
> > >> On Fri, Jan 13, 2017 at 7:16 PM, Stephen Jiang <[email protected]> wrote:
> > >>
> > >>> Revive this thread
> > >>>
> > >>> I am in the process of removing Region Server side merge (and split)
> > >>> transaction code in master branch; as now we have merge (and split)
> > >>> procedure(s) from master doing the same thing.
> > >>
> > >> Good (Issue?)
> > >>
> > >>> The Merge tool depends on RS-side merge code. I'd like to use this
> > >>> chance to remove the util.Merge tool. This is for 2.0 and up releases
> > >>> only. Deprecation does not work here; as keeping the RS-side merge code
> > >>> would have duplicate logic in source code and make the new Assignment
> > >>> manager code more complicated.
> > >>
> > >> Could util.Merge be changed to ask the Master run the merge (via AMv2)?
> > >>
> > >> If you remove the util.Merge tool, how then does an operator ask for a
> > >> merge in its absence?
> > >>
> > >> Thanks Stephen
> > >>
> > >> S
> > >>
> > >>> Please let me know whether you have objection.
> > >>>
> > >>> Thanks
> > >>> Stephen
> > >>>
> > >>> PS. I could deprecated HMerge code if anyone is really using it. It has
> > >>> its own logic and standalone (supposed to dangerously work offline and
> > >>> merge more than 2 regions - the util.Merge and shell not support these
> > >>> functionality for now).
> > >>>
> > >>> On Wed, Nov 16, 2016 at 11:04 AM, Enis Söztutar <[email protected]> wrote:
> > >>>
> > >>>> @Appy what is not clear from above?
> > >>>>
> > >>>> I think we should get rid of both Merge and HMerge.
> > >>>>
> > >>>> We should not have any tool which will work in offline mode by going
> > >>>> over the HDFS data. Seems very brittle to be broken when things get
> > >>>> changed. Only use case I can think of is that somehow you end up with
> > >>>> a lot of regions and you cannot bring the cluster back up because of
> > >>>> OOMs, etc and you have to reduce the number of regions in offline
> > >>>> mode. However, we did not see this kind of thing in any of our
> > >>>> customers for the last couple of years so far.
> > >>>>
> > >>>> I think we should seriously look into improving normalizer and
> > >>>> enabling that by default for all the tables. Ideally, normalizer
> > >>>> should be running much more frequently, and should be configured with
> > >>>> higher-level goals and heuristics. Like on average how many regions
> > >>>> per node, etc and should be looking at the global state (like the
> > >>>> balancer) to decide on split / merge points.
> > >>>>
> > >>>> Enis
> > >>>>
> > >>>> On Wed, Nov 16, 2016 at 1:17 AM, Apekshit Sharma <[email protected]> wrote:
> > >>>>
> > >>>>> bq. HMerge can merge multiple regions by going over the list of
> > >>>>> regions and checking their sizes.
> > >>>>> bq. But both of these tools (Merge and HMerge) are very dangerous
> > >>>>>
> > >>>>> I came across HMerge and it looks like dead code. Isn't referenced
> > >>>>> from anywhere except one test. (This is what lars also pointed out in
> > >>>>> the first email too).
> > >>>>> It would make perfect sense if it was a tool or was being referenced
> > >>>>> from somewhere, but with lack of either of that, am a bit confused
> > >>>>> here.
> > >>>>> @Enis, you seem to know everything about them, please educate me.
> > >>>>> Thanks
> > >>>>> - Appy
> > >>>>>
> > >>>>> On Thu, Sep 29, 2016 at 12:43 AM, Enis Söztutar <[email protected]> wrote:
> > >>>>>
> > >>>>>> Merge has very limited usability singe it can do a single merge and
> > >>>>>> can only run when HBase is offline.
> > >>>>>> HMerge can merge multiple regions by going over the list of regions
> > >>>>>> and checking their sizes.
> > >>>>>> And of course we have the "supported" online merge which is the
> > >>>>>> shell command.
> > >>>>>>
> > >>>>>> But both of these tools (Merge and HMerge) are very dangerous I
> > >>>>>> think. I would say we should deprecate both to be replaced by the
> > >>>>>> online merger tool. We should not allow offline merge at all. I fail
> > >>>>>> to see the usecase that you have to use an offline merge.
> > >>>>>>
> > >>>>>> Enis
> > >>>>>>
> > >>>>>> On Wed, Sep 28, 2016 at 7:32 AM, Lars George <[email protected]> wrote:
> > >>>>>>
> > >>>>>>> Hey,
> > >>>>>>>
> > >>>>>>> Sorry to resurrect this old thread, but working on the book update,
> > >>>>>>> I came across the same today, i.e. we have Merge and HMerge. I
> > >>>>>>> tried and Merge works fine now. It is also the only one of the two
> > >>>>>>> flagged as being a tool. Should HMerge be removed? At least
> > >>>>>>> deprecated?
> > >>>>>>>
> > >>>>>>> Cheers,
> > >>>>>>> Lars
> > >>>>>>>
> > >>>>>>> On Thu, Jul 7, 2011 at 2:03 AM, Ted Yu <[email protected]> wrote:
> > >>>>>>>>>> there is already an issue to do this but not revamp of these Merge classes
> > >>>>>>>> I guess the issue is HBASE-1621
> > >>>>>>>>
> > >>>>>>>> On Wed, Jul 6, 2011 at 2:28 PM, Stack <[email protected]> wrote:
> > >>>>>>>>
> > >>>>>>>>> Yeah, can you file an issue Lars. This stuff is ancient and needs
> > >>>>>>>>> to be redone AND redone so we can do merging while table is
> > >>>>>>>>> online (there is already an issue to do this but not revamp of
> > >>>>>>>>> these Merge classes). The unit tests for Merge are also all
> > >>>>>>>>> junit3 and do whacky stuff to put up multiple regions. This
> > >>>>>>>>> should be redone too (they are often first thing broke when major
> > >>>>>>>>> change and putting them back together is a headache since they do
> > >>>>>>>>> not follow the usual pattern).
> > >>>>>>>>>
> > >>>>>>>>> St.Ack
> > >>>>>>>>>
> > >>>>>>>>> On Sun, Jul 3, 2011 at 12:38 AM, Lars George <[email protected]> wrote:
> > >>>>>>>>>> Hi Ted,
> > >>>>>>>>>>
> > >>>>>>>>>> The log is from an earlier attempt, I tried this a few times.
> > >>>>>>>>>> This is all local, after rm'ing the /hbase. So the files are all
> > >>>>>>>>>> pretty empty, but since I put data in I was assuming it should
> > >>>>>>>>>> work. Once you gotten into this state, you also get funny error
> > >>>>>>>>>> messages in the shell:
> > >>>>>>>>>>
> > >>>>>>>>>> hbase(main):001:0> list
> > >>>>>>>>>> TABLE
> > >>>>>>>>>> 11/07/03 09:36:21 INFO ipc.HBaseRPC: Using org.apache.hadoop.hbase.ipc.WritableRpcEngine for org.apache.hadoop.hbase.ipc.HMasterInterface
> > >>>>>>>>>>
> > >>>>>>>>>> ERROR: undefined method `map' for nil:NilClass
> > >>>>>>>>>>
> > >>>>>>>>>> Here is some help for this command:
> > >>>>>>>>>> List all tables in hbase. Optional regular expression parameter could
> > >>>>>>>>>> be used to filter the output. Examples:
> > >>>>>>>>>>
> > >>>>>>>>>>   hbase> list
> > >>>>>>>>>>   hbase> list 'abc.*'
> > >>>>>>>>>>
> > >>>>>>>>>>
> > >>>>>>>>>> hbase(main):002:0>
> > >>>>>>>>>>
> > >>>>>>>>>> I am assuming this is collateral, but why? The UI works but the
> > >>>>>>>>>> table is gone too.
> > >>>>>>>>>>
> > >>>>>>>>>> Lars
> > >>>>>>>>>>
> > >>>>>>>>>>> On Jul 2, 2011, at 10:55 PM, Ted Yu wrote:
> > >>>>>>>>>>>
> > >>>>>>>>>>> There is TestMergeTool which tests Merge.
> > >>>>>>>>>>>
> > >>>>>>>>>>> From the log you provided, I got a little confused as why
> > >>>>>>>>>>> 'testtable,row-20,1309613053987.23a35ac696bdf4a8023dcc4c5b8419e0.'
> > >>>>>>>>>>> didn't appear in your command line or the output from .META.
> > >>>>>>>>>>> scanning.
> > >>>>>>>>>>>
> > >>>>>>>>>>> On Sat, Jul 2, 2011 at 10:36 AM, Lars George <[email protected]> wrote:
> > >>>>>>>>>>>
> > >>>>>>>>>>>> Hi,
> > >>>>>>>>>>>>
> > >>>>>>>>>>>> These two seem both in a bit of a weird state: HMerge is
> > >>>>>>>>>>>> scoped package local, therefore no one but the package can
> > >>>>>>>>>>>> call the merge() functions... and no one does that but the
> > >>>>>>>>>>>> unit test. But it would be good to have this on the CLI and
> > >>>>>>>>>>>> shell as a command (and in the shell maybe with a confirmation
> > >>>>>>>>>>>> message?), but it is not available AFAIK.
> > >>>>>>>>>>>>
> > >>>>>>>>>>>> HMerge can merge regions of tables that are disabled. It also
> > >>>>>>>>>>>> merges all that qualify, i.e. where the merged region is less
> > >>>>>>>>>>>> than or equal of half the configured max file size.
> > >>>>>>>>>>>>
> > >>>>>>>>>>>> Merge on the other hand does have a main(), so can be invoked:
> > >>>>>>>>>>>>
> > >>>>>>>>>>>> $ hbase org.apache.hadoop.hbase.util.Merge
> > >>>>>>>>>>>> Usage: bin/hbase merge <table-name> <region-1> <region-2>
> > >>>>>>>>>>>>
> > >>>>>>>>>>>> Note how the help insinuates that you can use it as a tool,
> > >>>>>>>>>>>> but that is not correct. Also, it only merges two given
> > >>>>>>>>>>>> regions, and the cluster must be shut down (only the HBase
> > >>>>>>>>>>>> daemons). So that is a step back.
> > >>>>>>>>>>>>
> > >>>>>>>>>>>> What is worse is that I cannot get it to work. I tried in the
> > >>>>>>>>>>>> shell:
> > >>>>>>>>>>>>
> > >>>>>>>>>>>> hbase(main):001:0> create 'testtable', 'colfam1', {SPLITS => ['row-10','row-20','row-30','row-40','row-50']}
> > >>>>>>>>>>>> 0 row(s) in 0.2640 seconds
> > >>>>>>>>>>>>
> > >>>>>>>>>>>> hbase(main):002:0> for i in '0'..'9' do for j in '0'..'9' do put 'testtable', "row-#{i}#{j}", "colfam1:#{j}", "#{j}" end end
> > >>>>>>>>>>>> 0 row(s) in 1.0450 seconds
> > >>>>>>>>>>>>
> > >>>>>>>>>>>> hbase(main):003:0> flush 'testtable'
> > >>>>>>>>>>>> 0 row(s) in 0.2000 seconds
> > >>>>>>>>>>>>
> > >>>>>>>>>>>> hbase(main):004:0> scan '.META.', { COLUMNS => ['info:regioninfo']}
> > >>>>>>>>>>>> ROW                                  COLUMN+CELL
> > >>>>>>>>>>>>  testtable,,1309614509037.612d1e0112 column=info:regioninfo, timestamp=130...
> > >>>>>>>>>>>>  406e6c2bb482eeaec57322.             STARTKEY => '', ENDKEY => 'row-10'
> > >>>>>>>>>>>>  testtable,row-10,1309614509040.2fba column=info:regioninfo, timestamp=130...
> > >>>>>>>>>>>>  fcc9bc6afac94c465ce5dcabc5d1.       STARTKEY => 'row-10', ENDKEY => 'row-20'
> > >>>>>>>>>>>>  testtable,row-20,1309614509041.e7c1 column=info:regioninfo, timestamp=130...
> > >>>>>>>>>>>>  6267eb30e147e5d988c63d40f982.       STARTKEY => 'row-20', ENDKEY => 'row-30'
> > >>>>>>>>>>>>  testtable,row-30,1309614509041.a9cd column=info:regioninfo, timestamp=130...
> > >>>>>>>>>>>>  e1cbc7d1a21b1aca2ac7fda30ad8.       STARTKEY => 'row-30', ENDKEY => 'row-40'
> > >>>>>>>>>>>>  testtable,row-40,1309614509041.d458 column=info:regioninfo, timestamp=130...
> > >>>>>>>>>>>>  236feae097efcf33477e7acc51d4.       STARTKEY => 'row-40', ENDKEY => 'row-50'
> > >>>>>>>>>>>>  testtable,row-50,1309614509041.74a5 column=info:regioninfo, timestamp=130...
> > >>>>>>>>>>>>  7dc7e3e9602d9229b15d4c0357d1.       STARTKEY => 'row-50', ENDKEY => ''
> > >>>>>>>>>>>> 6 row(s) in 0.0440 seconds
> > >>>>>>>>>>>>
> > >>>>>>>>>>>> hbase(main):005:0> exit
> > >>>>>>>>>>>>
> > >>>>>>>>>>>> $ ./bin/stop-hbase.sh
> > >>>>>>>>>>>>
> > >>>>>>>>>>>> $ hbase org.apache.hadoop.hbase.util.Merge testtable \
> > >>>>>>>>>>>>   testtable,row-20,1309614509041.e7c16267eb30e147e5d988c63d40f982. \
> > >>>>>>>>>>>>   testtable,row-30,1309614509041.a9cde1cbc7d1a21b1aca2ac7fda30ad8.
> > >>>>>>>>>>>>
> > >>>>>>>>>>>> But I get consistently errors:
> > >>>>>>>>>>>>
> > >>>>>>>>>>>> 11/07/02 07:20:49 INFO util.Merge: Merging regions testtable,row-20,1309613053987.23a35ac696bdf4a8023dcc4c5b8419e0. and testtable,row-30,1309613053987.3664920956c30ac5ff2a7726e4e6 in table testtable
> > >>>>>>>>>>>> 11/07/02 07:20:49 INFO wal.HLog: HLog configuration: blocksize=32 MB, rollsize=30.4 MB, enabled=true, optionallogflushinternal=1000ms
> > >>>>>>>>>>>> 11/07/02 07:20:49 INFO wal.HLog: New hlog /Volumes/Macintosh-HD/Users/larsgeorge/.logs_1309616449171/hlog.1309616449181
> > >>>>>>>>>>>> 11/07/02 07:20:49 INFO wal.HLog: getNumCurrentReplicas--HDFS-826 not available; hdfs_out=org.apache.hadoop.fs.FSDataOutputStream@25961581, exception=org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSOutputSummer.getNumCurrentReplicas()
> > >>>>>>>>>>>> 11/07/02 07:20:49 INFO regionserver.HRegion: Setting up tabledescriptor config now ...
> > >>>>>>>>>>>> 11/07/02 07:20:49 INFO regionserver.HRegion: Onlined -ROOT-,,0.70236052; next sequenceid=1
> > >>>>>>>>>>>> info: null
> > >>>>>>>>>>>> region1: [B@48fd918a
> > >>>>>>>>>>>> region2: [B@7f5e2075
> > >>>>>>>>>>>> 11/07/02 07:20:49 FATAL util.Merge: Merge failed
> > >>>>>>>>>>>> java.io.IOException: Could not find meta region for testtable,row-20,1309613053987.23a35ac696bdf4a8023dcc4c5b8419e0.
> > >>>>>>>>>>>>     at org.apache.hadoop.hbase.util.Merge.mergeTwoRegions(Merge.java:211)
> > >>>>>>>>>>>>     at org.apache.hadoop.hbase.util.Merge.run(Merge.java:111)
> > >>>>>>>>>>>>     at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
> > >>>>>>>>>>>>     at org.apache.hadoop.hbase.util.Merge.main(Merge.java:386)
> > >>>>>>>>>>>> 11/07/02 07:20:49 INFO regionserver.HRegion: Setting up tabledescriptor config now ...
> > >>>>>>>>>>>> 11/07/02 07:20:49 INFO regionserver.HRegion: Onlined .META.,,1.1028785192; next sequenceid=1
> > >>>>>>>>>>>> 11/07/02 07:20:49 INFO regionserver.HRegion: Closed -ROOT-,,0.70236052
> > >>>>>>>>>>>> 11/07/02 07:20:49 INFO wal.HLog: main.logSyncer exiting
> > >>>>>>>>>>>> 11/07/02 07:20:49 ERROR util.Merge: exiting due to error
> > >>>>>>>>>>>> java.lang.NullPointerException
> > >>>>>>>>>>>>     at org.apache.hadoop.hbase.util.Merge$1.processRow(Merge.java:119)
> > >>>>>>>>>>>>     at org.apache.hadoop.hbase.util.MetaUtils.scanMetaRegion(MetaUtils.java:229)
> > >>>>>>>>>>>>     at org.apache.hadoop.hbase.util.MetaUtils.scanMetaRegion(MetaUtils.java:258)
> > >>>>>>>>>>>>     at org.apache.hadoop.hbase.util.Merge.run(Merge.java:116)
> > >>>>>>>>>>>>     at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
> > >>>>>>>>>>>>     at org.apache.hadoop.hbase.util.Merge.main(Merge.java:386)
> > >>>>>>>>>>>>
> > >>>>>>>>>>>> After which I most of the times have shot .META. with an error
> > >>>>>>>>>>>>
> > >>>>>>>>>>>> 2011-07-02 06:42:10,763 WARN org.apache.hadoop.hbase.master.HMaster: Failed getting all descriptors
> > >>>>>>>>>>>> java.io.FileNotFoundException: No status for hdfs://localhost:8020/hbase/.corrupt
> > >>>>>>>>>>>>     at org.apache.hadoop.hbase.util.FSUtils.getTableInfoModtime(FSUtils.java:888)
> > >>>>>>>>>>>>     at org.apache.hadoop.hbase.util.FSTableDescriptors.get(FSTableDescriptors.java:122)
> > >>>>>>>>>>>>     at org.apache.hadoop.hbase.util.FSTableDescriptors.getAll(FSTableDescriptors.java:149)
> > >>>>>>>>>>>>     at org.apache.hadoop.hbase.master.HMaster.getHTableDescriptors(HMaster.java:1429)
> > >>>>>>>>>>>>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> > >>>>>>>>>>>>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> > >>>>>>>>>>>>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> > >>>>>>>>>>>>     at java.lang.reflect.Method.invoke(Method.java:597)
> > >>>>>>>>>>>>     at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:312)
> > >>>>>>>>>>>>     at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1065)
> > >>>>>>>>>>>>
> > >>>>>>>>>>>> Lars
> > >>>>>
> > >>>>> --
> > >>>>>
> > >>>>> -- Appy

--

-- Appy
