Re: Region in RIT (CLOSING) , How to fix it ?

Wellington Chevreuil Thu, 05 Sep 2019 04:06:09 -0700

An RS restart on its own wouldn't solve the RIT issue. What Ankit mentioned
is that there might still be data for those regions in the memory cache of
the RS that was hosting them. That would require an RS restart to get
flushed. You may still want to track down the blocked procedures and the one
holding the lock, as they may give you further problems later if you need
those regions open again (for instance, if this is a disabled table that
might be enabled again).
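
As a rough sketch of tracking those blocked procedures (not an official HBase
tool), the "Waiting on xlock ... held by pid" lines in the master log can be
grouped by the pid holding the lock. The function name lock_holders and the
sample list are made up for illustration; the log format is assumed to match
the excerpt quoted further down this thread, with each entry on a single line.

```python
import re
from collections import defaultdict

# Assumed log shape (taken from the master log quoted in this thread):
#   ... Waiting on xlock for pid=<waiter>, ... held by pid=<holder>
XLOCK_RE = re.compile(r"Waiting on xlock for pid=(\d+).*?held by pid=(\d+)")


def lock_holders(log_lines):
    """Map each lock-holding pid to the sorted list of pids waiting on it."""
    waiting = defaultdict(set)
    for line in log_lines:
        match = XLOCK_RE.search(line)
        if match:
            waiter, holder = match.groups()
            waiting[int(holder)].add(int(waiter))
    return {holder: sorted(pids) for holder, pids in waiting.items()}


# Sample entries rejoined from the wrapped log lines quoted in this thread.
sample = [
    "2019-09-02 19:50:54,453 WARN  [ProcExecTimeout] "
    "assignment.AssignmentManager: STUCK Region-In-Transition rit=CLOSING, "
    "table=alpha_daas:device_data_details, "
    "region=444405785869685e6ec948c03c2076b8",
    "2019-09-02 19:51:29,365 INFO  [PEWorker-1] "
    "procedure.MasterProcedureScheduler: Waiting on xlock for pid=21097026, "
    "state=RUNNABLE:REGION_TRANSITION_DISPATCH; UnassignProcedure "
    "table=alpha_daas:device_data_details, "
    "region=444405785869685e6ec948c03c2076b8 held by pid=21075720",
    "2019-09-02 19:51:39,581 INFO  [PEWorker-11] "
    "procedure.MasterProcedureScheduler: Waiting on xlock for pid=21097027, "
    "state=RUNNABLE:REGION_TRANSITION_DISPATCH; UnassignProcedure "
    "table=alpha_daas:device_data_details, "
    "region=444405785869685e6ec948c03c2076b8 held by pid=21075720",
]

# Print one line per lock holder with the procedures it is blocking.
for holder, waiters in lock_holders(sample).items():
    print(f"pid={holder} blocks {waiters}")
```

The holder pid reported this way is the one to look up with list_procedures in
the hbase shell or in the Master Web UI.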


On Thu, Sep 5, 2019 at 03:32, Syni Guo <[email protected]> wrote:

>
> Hi Ankit,
>
> But I tried restarting the RS several times, and it does NOT automatically
> fix the RIT region.
>
>
> > On 5 Sep 2019, at 10:00, Ankit Singhal <[email protected]> wrote:
> >
> > The only problem with this approach is that if the RS responsible for
> > closing the region did not flush the data properly, then explicitly
> > setting it to CLOSED in hbase:meta can result in data loss until you
> > kill the RS so that the data can be replayed through the WAL.
> >
> > Thanks,
> > Ankit Singhal
> >
> > On Wed, Sep 4, 2019 at 6:15 PM Syni Guo <[email protected]> wrote:
> >
> >>
> >> Hi,
> >>
> >> Thanks for your suggestion. I used an extreme method to solve the issue:
> >>
> >> 1. Change the 'info:state' value of the RIT region to 'CLOSED' in the
> >>    meta table 'hbase:meta'.
> >> 2. Switch over the master node.
> >>
> >>
> >>> On 4 Sep 2019, at 00:05, Wellington Chevreuil <
> >> [email protected]> wrote:
> >>>
> >>> The "unassign" procs for each of those regions are waiting on a lock
> >>> that's currently held by another proc whose id is 21075720:
> >>>>
> >>>> 2019-09-02 19:51:29,365 INFO  [PEWorker-1]
> >>>> procedure.MasterProcedureScheduler: Waiting on xlock for
> >>>> pid=21097026...held by pid=21075720
> >>>>
> >>>
> >>> You need to figure out why this pid=21075720 is stuck while it keeps
> >>> the lock(s) needed by the two "unassign" procs. You can figure out
> >>> which proc this is via the Web UI or the hbase shell list_procedures
> >>> command. From past experience, I would bet this is a rogue
> >>> "disable_table" command that timed out long ago. A meta scan for the
> >>> regions of this 'alpha_daas:device_data_details' table would also help
> >>> determine if this is indeed the case:
> >>> $ echo "scan 'hbase:meta'" | hbase shell | grep "alpha_daas:device_data_details"
> >>>
> >>> On Tue, Sep 3, 2019 at 00:12, Syni Guo <[email protected]> wrote:
> >>>
> >>>> Hi Ankit,
> >>>>
> >>>> How can I find the blocking session or process and kill it? Even
> >>>> rebooting the RS does not help.
> >>>>
> >>>>
> >>>>
> >>>> ________________________________
> >>>> From: Ankit Singhal <[email protected]>
> >>>> Sent: Monday, September 2, 2019 11:47:06 PM
> >>>> To: [email protected] <[email protected]>
> >>>> Subject: Re: Region in RIT (CLOSING) , How to fix it ?
> >>>>
> >>>> You may check the RS logs of tx-220-70-27.h.chinabank.com.cn for
> >>>> exceptions (like a sync failure while writing the marker in the WAL
> >>>> during closure, etc.) that are keeping these regions stuck in the
> >>>> CLOSING state. You may also try killing this server so that
> >>>> ServerCrashProcedure takes care of this.
> >>>>
> >>>> Regards,
> >>>> Ankit Singhal
> >>>>
> >>>> On Mon, Sep 2, 2019 at 6:52 AM Jean-Marc Spaggiari <
> >>>> [email protected]>
> >>>> wrote:
> >>>>
> >>>>> Hi Syni,
> >>>>>
> >>>>> Have you tried using HBCK2?
> >>>>>
> >>>>> JMS
> >>>>>
> >>>>> On Mon, Sep 2, 2019 at 07:57, Syni Guo <[email protected]> wrote:
> >>>>>
> >>>>>>
> >>>>>>
> >>>>>> HBase version: 2.1.3
> >>>>>>
> >>>>>>
> >>>>>> There are 2 regions in RIT (CLOSING). How can I fix this? I tried to
> >>>>>> unassign them, but it failed with a timeout.
> >>>>>>
> >>>>>>
> >>>>>> hbase(main):032:0> unassign '444405785869685e6ec948c03c2076b8',true
> >>>>>>
> >>>>>> ERROR: Call id=13259, waitTime=10009, rpcTimeout=10000
> >>>>>>
> >>>>>> For usage try 'help "unassign"'
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>> Logs:
> >>>>>>
> >>>>>> 2019-09-02 19:50:54,453 WARN  [ProcExecTimeout]
> >>>>>> assignment.AssignmentManager: STUCK Region-In-Transition rit=CLOSING,
> >>>>>> location=tx-220-70-27.h.chinabank.com.cn,60020,1567410218074,
> >>>>>> table=alpha_daas:device_data_details,
> >>>>>> region=444405785869685e6ec948c03c2076b8
> >>>>>> 2019-09-02 19:50:54,453 WARN  [ProcExecTimeout]
> >>>>>> assignment.AssignmentManager: STUCK Region-In-Transition rit=CLOSING,
> >>>>>> location=tx-220-70-27.h.chinabank.com.cn,60020,1567410218074,
> >>>>>> table=alpha_daas:poi_unicom_stat,
> >>>>>> region=42de1052551760e45cb7ba2684d586f8
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>> 2019-09-02 19:51:29,365 INFO  [PEWorker-1]
> >>>>>> procedure.MasterProcedureScheduler: Waiting on xlock for pid=21097026,
> >>>>>> state=RUNNABLE:REGION_TRANSITION_DISPATCH; UnassignProcedure
> >>>>>> table=alpha_daas:device_data_details,
> >>>>>> region=444405785869685e6ec948c03c2076b8,
> >>>>>> server=tx-220-70-27.h.chinabank.com.cn,60020,1567410218074 held by pid=21075720
> >>>>>> 2019-09-02 19:51:39,581 INFO  [PEWorker-11]
> >>>>>> procedure.MasterProcedureScheduler: Waiting on xlock for pid=21097027,
> >>>>>> state=RUNNABLE:REGION_TRANSITION_DISPATCH; UnassignProcedure
> >>>>>> table=alpha_daas:device_data_details,
> >>>>>> region=444405785869685e6ec948c03c2076b8,
> >>>>>> server=tx-220-70-27.h.chinabank.com.cn,60020,1567410218074 held by pid=21075720
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>
> >>>>
> >>
> >>
>
>
