On Fri, Jan 13, 2017 at 7:16 PM, Stephen Jiang <[email protected]>
wrote:

> Revive this thread
>
> I am in the process of removing the Region Server-side merge (and split)
> transaction code in the master branch, as we now have merge (and split)
> procedures from the master doing the same thing.
>
>
Good. (Is there an issue filed?)


> The Merge tool depends on the RS-side merge code.  I'd like to use this
> chance to remove the util.Merge tool.  This is for 2.0 and up releases only.
> Deprecation does not work here, as keeping the RS-side merge code would
> mean duplicate logic in the source code and make the new AssignmentManager
> code more complicated.
>
>
Could util.Merge be changed to ask the Master to run the merge (via AMv2)?

If you remove the util.Merge tool, how then does an operator ask for a
merge in its absence?
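(For reference, the online merge is already reachable from the shell via the merge_region command; a minimal sketch, with the encoded region names borrowed from Lars's testtable example further down this thread:)

```shell
# HBase shell: online merge of two adjacent regions, no downtime needed.
# The arguments are the ENCODED region names (the trailing hash in the
# full region name, as shown in the web UI or a .META. scan).
hbase> merge_region 'e7c16267eb30e147e5d988c63d40f982', 'a9cde1cbc7d1a21b1aca2ac7fda30ad8'
```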

Thanks Stephen

S


> Please let me know whether you have objection.
>
> Thanks
> Stephen
>
> PS.  I could deprecate the HMerge code if anyone is really using it.  It
> has its own logic and is standalone (it is supposed to work offline,
> dangerously, and merge more than 2 regions - util.Merge and the shell do
> not support this functionality for now).
>
> On Wed, Nov 16, 2016 at 11:04 AM, Enis Söztutar <[email protected]>
> wrote:
>
> > @Appy what is not clear from above?
> >
> > I think we should get rid of both Merge and HMerge.
> >
> > We should not have any tool that works in offline mode by going over the
> > HDFS data.  It seems very brittle and likely to break when things change.
> > The only use case I can think of is that you somehow end up with a lot of
> > regions and cannot bring the cluster back up because of OOMs, etc., and
> > you have to reduce the number of regions in offline mode.  However, we
> > have not seen this kind of thing with any of our customers in the last
> > couple of years.
> >
> > I think we should seriously look into improving the normalizer and
> > enabling it by default for all tables.  Ideally, the normalizer should be
> > running much more frequently, and should be configured with higher-level
> > goals and heuristics, like how many regions per node on average, and it
> > should be looking at the global state (like the balancer) to decide on
> > split/merge points.
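(Inline note: the normalizer can already be driven from the shell today; a minimal sketch, assuming the normalizer_switch/normalize commands and the NORMALIZATION_ENABLED table attribute behave as in current branches:)

```shell
# HBase shell: turn the region normalizer on cluster-wide, opt a table in,
# then trigger a normalization pass by hand (it also runs on its own period).
hbase> normalizer_switch true
hbase> alter 'testtable', {NORMALIZATION_ENABLED => 'true'}
hbase> normalize
```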
> >
> > Enis
> >
> > On Wed, Nov 16, 2016 at 1:17 AM, Apekshit Sharma <[email protected]>
> > wrote:
> >
> > > bq. HMerge can merge multiple regions by going over the list of
> > > regions and checking their sizes.
> > > bq. But both of these tools (Merge and HMerge) are very dangerous
> > >
> > > I came across HMerge and it looks like dead code.  It isn't referenced
> > > from anywhere except one test.  (This is what Lars also pointed out in
> > > the first email.)
> > > It would make perfect sense if it were a tool or were being referenced
> > > from somewhere, but lacking either of those, I am a bit confused here.
> > > @Enis, you seem to know everything about them, please educate me.
> > > Thanks
> > > - Appy
> > >
> > >
> > >
> > > On Thu, Sep 29, 2016 at 12:43 AM, Enis Söztutar <[email protected]>
> > > wrote:
> > >
> > > > Merge has very limited usability since it can do only a single merge
> > > > and can only run when HBase is offline.
> > > > HMerge can merge multiple regions by going over the list of regions
> > > > and checking their sizes.
> > > > And of course we have the "supported" online merge, which is the
> > > > shell command.
> > > >
> > > > But both of these tools (Merge and HMerge) are very dangerous, I
> > > > think.  I would say we should deprecate both, to be replaced by the
> > > > online merge tool.  We should not allow offline merge at all.  I fail
> > > > to see the use case where you would have to use an offline merge.
> > > >
> > > > Enis
> > > >
> > > > On Wed, Sep 28, 2016 at 7:32 AM, Lars George <[email protected]>
> > > > wrote:
> > > >
> > > > > Hey,
> > > > >
> > > > > Sorry to resurrect this old thread, but working on the book update,
> > > > > I came across the same issue today, i.e. we have Merge and HMerge.
> > > > > I tried, and Merge works fine now.  It is also the only one of the
> > > > > two flagged as being a tool.  Should HMerge be removed?  At least
> > > > > deprecated?
> > > > >
> > > > > Cheers,
> > > > > Lars
> > > > >
> > > > >
> > > > > > On Thu, Jul 7, 2011 at 2:03 AM, Ted Yu <[email protected]> wrote:
> > > > > >>> there is already an issue to do this but not revamp of these
> > > > > >>> Merge classes
> > > > > > I guess the issue is HBASE-1621
> > > > > >
> > > > > > On Wed, Jul 6, 2011 at 2:28 PM, Stack <[email protected]> wrote:
> > > > > >
> > > > > >> Yeah, can you file an issue Lars.  This stuff is ancient and
> > > > > >> needs to be redone AND redone so we can do merging while the
> > > > > >> table is online (there is already an issue to do this but not a
> > > > > >> revamp of these Merge classes).  The unit tests for Merge are
> > > > > >> also all junit3 and do whacky stuff to put up multiple regions.
> > > > > >> This should be redone too (they are often the first thing broken
> > > > > >> by a major change, and putting them back together is a headache
> > > > > >> since they do not follow the usual pattern).
> > > > > >>
> > > > > >> St.Ack
> > > > > >>
> > > > > >> On Sun, Jul 3, 2011 at 12:38 AM, Lars George <[email protected]>
> > > > > >> wrote:
> > > > > >> > Hi Ted,
> > > > > >> >
> > > > > >> > The log is from an earlier attempt; I tried this a few times.
> > > > > >> > This is all local, after rm'ing /hbase.  So the files are all
> > > > > >> > pretty empty, but since I put data in, I was assuming it should
> > > > > >> > work.  Once you have gotten into this state, you also get funny
> > > > > >> > error messages in the shell:
> > > > > >> >
> > > > > >> > hbase(main):001:0> list
> > > > > >> > TABLE
> > > > > >> > 11/07/03 09:36:21 INFO ipc.HBaseRPC: Using
> > > > > >> > org.apache.hadoop.hbase.ipc.WritableRpcEngine for
> > > > > >> > org.apache.hadoop.hbase.ipc.HMasterInterface
> > > > > >> >
> > > > > >> > ERROR: undefined method `map' for nil:NilClass
> > > > > >> >
> > > > > >> > Here is some help for this command:
> > > > > >> > List all tables in hbase. Optional regular expression parameter
> > > > > >> > could be used to filter the output. Examples:
> > > > > >> >
> > > > > >> >  hbase> list
> > > > > >> >  hbase> list 'abc.*'
> > > > > >> >
> > > > > >> >
> > > > > >> > hbase(main):002:0>
> > > > > >> >
> > > > > >> > I am assuming this is collateral, but why?  The UI works, but
> > > > > >> > the table is gone too.
> > > > > >> >
> > > > > >> > Lars
> > > > > >> >
> > > > > >> > On Jul 2, 2011, at 10:55 PM, Ted Yu wrote:
> > > > > >> >
> > > > > >> >> There is TestMergeTool which tests Merge.
> > > > > >> >>
> > > > > >> >> From the log you provided, I was a little confused as to why
> > > > > >> >> 'testtable,row-20,1309613053987.23a35ac696bdf4a8023dcc4c5b8419e0.'
> > > > > >> >> didn't appear in your command line or in the output of the
> > > > > >> >> .META. scan.
> > > > > >> >>
> > > > > >> >> On Sat, Jul 2, 2011 at 10:36 AM, Lars George <[email protected]>
> > > > > >> >> wrote:
> > > > > >> >>
> > > > > >> >>> Hi,
> > > > > >> >>>
> > > > > >> >>> These two both seem to be in a bit of a weird state: HMerge
> > > > > >> >>> is scoped package-local, therefore no one but the package can
> > > > > >> >>> call the merge() functions... and no one does that but the
> > > > > >> >>> unit test.  But it would be good to have this on the CLI and
> > > > > >> >>> in the shell as a command (and in the shell maybe with a
> > > > > >> >>> confirmation message?), but it is not available AFAIK.
> > > > > >> >>>
> > > > > >> >>> HMerge can merge regions of tables that are disabled.  It
> > > > > >> >>> also merges all that qualify, i.e. where the merged region is
> > > > > >> >>> less than or equal to half the configured max file size.
> > > > > >> >>>
> > > > > >> >>> Merge, on the other hand, does have a main(), so it can be
> > > > > >> >>> invoked:
> > > > > >> >>>
> > > > > >> >>> $ hbase org.apache.hadoop.hbase.util.Merge
> > > > > >> >>> Usage: bin/hbase merge <table-name> <region-1> <region-2>
> > > > > >> >>>
> > > > > >> >>> Note how the help insinuates that you can use it as a tool,
> > > > > >> >>> but that is not correct.  Also, it only merges two given
> > > > > >> >>> regions, and the cluster must be shut down (only the HBase
> > > > > >> >>> daemons).  So that is a step back.
> > > > > >> >>>
> > > > > >> >>> What is worse is that I cannot get it to work.  I tried this
> > > > > >> >>> in the shell:
> > > > > >> >>>
> > > > > >> >>> hbase(main):001:0> create 'testtable', 'colfam1', {SPLITS =>
> > > > > >> >>> ['row-10','row-20','row-30','row-40','row-50']}
> > > > > >> >>> 0 row(s) in 0.2640 seconds
> > > > > >> >>>
> > > > > >> >>> hbase(main):002:0> for i in '0'..'9' do for j in '0'..'9' do
> > > > > >> >>> put 'testtable', "row-#{i}#{j}", "colfam1:#{j}", "#{j}" end end
> > > > > >> >>> 0 row(s) in 1.0450 seconds
> > > > > >> >>>
> > > > > >> >>> hbase(main):003:0> flush 'testtable'
> > > > > >> >>> 0 row(s) in 0.2000 seconds
> > > > > >> >>>
> > > > > >> >>> hbase(main):004:0> scan '.META.', { COLUMNS => ['info:regioninfo']}
> > > > > >> >>> ROW                                  COLUMN+CELL
> > > > > >> >>> testtable,,1309614509037.612d1e0112  column=info:regioninfo, timestamp=130...
> > > > > >> >>> 406e6c2bb482eeaec57322.              STARTKEY => '', ENDKEY => 'row-10'
> > > > > >> >>> testtable,row-10,1309614509040.2fba  column=info:regioninfo, timestamp=130...
> > > > > >> >>> fcc9bc6afac94c465ce5dcabc5d1.        STARTKEY => 'row-10', ENDKEY => 'row-20'
> > > > > >> >>> testtable,row-20,1309614509041.e7c1  column=info:regioninfo, timestamp=130...
> > > > > >> >>> 6267eb30e147e5d988c63d40f982.        STARTKEY => 'row-20', ENDKEY => 'row-30'
> > > > > >> >>> testtable,row-30,1309614509041.a9cd  column=info:regioninfo, timestamp=130...
> > > > > >> >>> e1cbc7d1a21b1aca2ac7fda30ad8.        STARTKEY => 'row-30', ENDKEY => 'row-40'
> > > > > >> >>> testtable,row-40,1309614509041.d458  column=info:regioninfo, timestamp=130...
> > > > > >> >>> 236feae097efcf33477e7acc51d4.        STARTKEY => 'row-40', ENDKEY => 'row-50'
> > > > > >> >>> testtable,row-50,1309614509041.74a5  column=info:regioninfo, timestamp=130...
> > > > > >> >>> 7dc7e3e9602d9229b15d4c0357d1.        STARTKEY => 'row-50', ENDKEY => ''
> > > > > >> >>> 6 row(s) in 0.0440 seconds
> > > > > >> >>>
> > > > > >> >>> hbase(main):005:0> exit
> > > > > >> >>>
> > > > > >> >>> $ ./bin/stop-hbase.sh
> > > > > >> >>>
> > > > > >> >>> $ hbase org.apache.hadoop.hbase.util.Merge testtable \
> > > > > >> >>> testtable,row-20,1309614509041.e7c16267eb30e147e5d988c63d40f982. \
> > > > > >> >>> testtable,row-30,1309614509041.a9cde1cbc7d1a21b1aca2ac7fda30ad8.
> > > > > >> >>>
> > > > > >> >>> But I consistently get errors:
> > > > > >> >>>
> > > > > >> >>> 11/07/02 07:20:49 INFO util.Merge: Merging regions
> > > > > >> >>> testtable,row-20,1309613053987.23a35ac696bdf4a8023dcc4c5b8419e0. and
> > > > > >> >>> testtable,row-30,1309613053987.3664920956c30ac5ff2a7726e4e6 in
> > > > > >> >>> table testtable
> > > > > >> >>> 11/07/02 07:20:49 INFO wal.HLog: HLog configuration: blocksize=32 MB,
> > > > > >> >>> rollsize=30.4 MB, enabled=true, optionallogflushinternal=1000ms
> > > > > >> >>> 11/07/02 07:20:49 INFO wal.HLog: New hlog
> > > > > >> >>> /Volumes/Macintosh-HD/Users/larsgeorge/.logs_1309616449171/hlog.1309616449181
> > > > > >> >>> 11/07/02 07:20:49 INFO wal.HLog: getNumCurrentReplicas--HDFS-826 not
> > > > > >> >>> available; hdfs_out=org.apache.hadoop.fs.FSDataOutputStream@25961581,
> > > > > >> >>> exception=org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSOutputSummer.getNumCurrentReplicas()
> > > > > >> >>> 11/07/02 07:20:49 INFO regionserver.HRegion: Setting up tabledescriptor
> > > > > >> >>> config now ...
> > > > > >> >>> 11/07/02 07:20:49 INFO regionserver.HRegion: Onlined -ROOT-,,0.70236052;
> > > > > >> >>> next sequenceid=1
> > > > > >> >>> info: null
> > > > > >> >>> region1: [B@48fd918a
> > > > > >> >>> region2: [B@7f5e2075
> > > > > >> >>> 11/07/02 07:20:49 FATAL util.Merge: Merge failed
> > > > > >> >>> java.io.IOException: Could not find meta region for
> > > > > >> >>> testtable,row-20,1309613053987.23a35ac696bdf4a8023dcc4c5b8419e0.
> > > > > >> >>>       at org.apache.hadoop.hbase.util.Merge.mergeTwoRegions(Merge.java:211)
> > > > > >> >>>       at org.apache.hadoop.hbase.util.Merge.run(Merge.java:111)
> > > > > >> >>>       at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
> > > > > >> >>>       at org.apache.hadoop.hbase.util.Merge.main(Merge.java:386)
> > > > > >> >>> 11/07/02 07:20:49 INFO regionserver.HRegion: Setting up tabledescriptor
> > > > > >> >>> config now ...
> > > > > >> >>> 11/07/02 07:20:49 INFO regionserver.HRegion: Onlined .META.,,1.1028785192;
> > > > > >> >>> next sequenceid=1
> > > > > >> >>> 11/07/02 07:20:49 INFO regionserver.HRegion: Closed -ROOT-,,0.70236052
> > > > > >> >>> 11/07/02 07:20:49 INFO wal.HLog: main.logSyncer exiting
> > > > > >> >>> 11/07/02 07:20:49 ERROR util.Merge: exiting due to error
> > > > > >> >>> java.lang.NullPointerException
> > > > > >> >>>       at org.apache.hadoop.hbase.util.Merge$1.processRow(Merge.java:119)
> > > > > >> >>>       at org.apache.hadoop.hbase.util.MetaUtils.scanMetaRegion(MetaUtils.java:229)
> > > > > >> >>>       at org.apache.hadoop.hbase.util.MetaUtils.scanMetaRegion(MetaUtils.java:258)
> > > > > >> >>>       at org.apache.hadoop.hbase.util.Merge.run(Merge.java:116)
> > > > > >> >>>       at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
> > > > > >> >>>       at org.apache.hadoop.hbase.util.Merge.main(Merge.java:386)
> > > > > >> >>>
> > > > > >> >>> After which, most of the time, I have shot .META., with an
> > > > > >> >>> error:
> > > > > >> >>>
> > > > > >> >>> 2011-07-02 06:42:10,763 WARN org.apache.hadoop.hbase.master.HMaster:
> > > > > >> >>> Failed getting all descriptors
> > > > > >> >>> java.io.FileNotFoundException: No status for
> > > > > >> >>> hdfs://localhost:8020/hbase/.corrupt
> > > > > >> >>>       at org.apache.hadoop.hbase.util.FSUtils.getTableInfoModtime(FSUtils.java:888)
> > > > > >> >>>       at org.apache.hadoop.hbase.util.FSTableDescriptors.get(FSTableDescriptors.java:122)
> > > > > >> >>>       at org.apache.hadoop.hbase.util.FSTableDescriptors.getAll(FSTableDescriptors.java:149)
> > > > > >> >>>       at org.apache.hadoop.hbase.master.HMaster.getHTableDescriptors(HMaster.java:1429)
> > > > > >> >>>       at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> > > > > >> >>>       at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> > > > > >> >>>       at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> > > > > >> >>>       at java.lang.reflect.Method.invoke(Method.java:597)
> > > > > >> >>>       at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:312)
> > > > > >> >>>       at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1065)
> > > > > >> >>>
> > > > > >> >>> Lars
> > > > > >> >
> > > > > >> >
> > > > > >>
> > > > >
> > > >
> > >
> > >
> > >
> > > --
> > >
> > > -- Appy
> > >
> >
>
