[ 
https://issues.apache.org/jira/browse/HBASE-20679?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16501341#comment-16501341
 ] 

Allan Yang edited comment on HBASE-20679 at 6/5/18 6:27 AM:
------------------------------------------------------------

{quote}Tell us more about the failure you saw? What did the corruption look 
like? It couldn't read the end of the files?
{quote}
[~stack], I still haven't figured out what happened. It definitely had
something to do with the corrupted HDFS block. First, we saw that the trailers
of some procedure WALs could not be read:
{code:java}
WARN  [master/hbase-002:16000] wal.WALProcedureStore(1315): Unable to read 
tracker for 
hdfs://emr-cluster/hbase/MasterProcWALs/pv2-00000000000000000136.log - Invalid 
Trailer version. got 4 expected 1
{code}
After reviewing the code, I thought this was normal, since the procedure WAL
may have been rolled and the procedures may have been synced to another WAL
correctly. But then I saw these lines in the log:
{code:java}
ERROR [master/hbase-002:16000] procedure2.ProcedureExecutor(327): Corrupt 
pid=3222, ppid=3189, state=RUNNABLE:REGION_TRANSITION_QUEUE; AssignProcedure 
table=table:kv1, region=3cd7db00f89bab7e477b7773d2df0c99
{code}
Most of the AssignProcedures were reported as corrupt and could not be
replayed, so we ended up in the situation above. We are not familiar with
ProcedureV2 yet, so we had to work around it. If there is a fix for the
procedure WAL, that would surely be better.
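
To gauge how many procedures were affected, ERROR lines like the one above can
be collected from the master log. The following is a minimal sketch, not part
of HBase: the class and field names are hypothetical, the regular expression is
derived only from the single log line quoted above, and it assumes that wrapped
log lines have been joined back onto one line before scanning.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not an HBase class: extracts the fields of
// "Corrupt pid=..." lines reported by ProcedureExecutor.
class CorruptProcedureScan {

    // Pattern inferred from the one log line quoted in this comment (assumption).
    private static final Pattern CORRUPT = Pattern.compile(
        "Corrupt pid=(\\d+), ppid=(\\d+), state=(\\S+); (\\w+) "
            + "table=(\\S+), region=(\\p{XDigit}+)");

    /** One corrupt procedure as reported in the master log. */
    static final class Entry {
        final long pid;
        final long ppid;
        final String state;
        final String procedure;
        final String table;
        final String region;

        Entry(long pid, long ppid, String state, String procedure,
              String table, String region) {
            this.pid = pid;
            this.ppid = ppid;
            this.state = state;
            this.procedure = procedure;
            this.table = table;
            this.region = region;
        }
    }

    /** Returns one Entry per line that contains a corrupt-procedure report. */
    static List<Entry> scan(List<String> lines) {
        List<Entry> result = new ArrayList<>();
        for (String line : lines) {
            Matcher m = CORRUPT.matcher(line);
            if (m.find()) {
                result.add(new Entry(
                    Long.parseLong(m.group(1)), Long.parseLong(m.group(2)),
                    m.group(3), m.group(4), m.group(5), m.group(6)));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String line = "ERROR [master/hbase-002:16000] "
            + "procedure2.ProcedureExecutor(327): Corrupt pid=3222, ppid=3189, "
            + "state=RUNNABLE:REGION_TRANSITION_QUEUE; AssignProcedure "
            + "table=table:kv1, region=3cd7db00f89bab7e477b7773d2df0c99";
        for (Entry e : scan(Arrays.asList(line))) {
            System.out.println(e.procedure + " pid=" + e.pid + " region=" + e.region);
        }
    }
}
```

Grouping the resulting entries by procedure type or by table would show whether
the corruption is limited to AssignProcedure or is more widespread.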


was (Author: allan163):
{quote}
Tell us more about the failure you saw? What did the corruption look like? It 
couldn't read the end of the files?
{quote}

[~stack], I still haven't figured out what happened. It definitely had
something to do with the corrupted HDFS block. First, we saw that the trailers
of some procedure WALs could not be read:

{code}
2018-05-30 17:39:54,209 WARN  [master/hbase-002:16000] 
wal.WALProcedureStore(1315): Unable to read tracker for 
hdfs://emr-cluster/hbase/MasterProcWALs/pv2-00000000000000000136.log - Invalid 
Trailer version. got 4 expected 1
{code}

After reviewing the code, I thought this was normal, since the procedure WAL
may have been rolled and the procedures may have been synced to another WAL
correctly. But then I saw these lines in the log:

{code}
hbase-hbase-master.log.bak:2018-05-24 12:30:20,695 ERROR 
[master/hbase-002:16000] procedure2.ProcedureExecutor(327): Corrupt pid=3222, 
ppid=3189, state=RUNNABLE:REGION_TRANSITION_QUEUE; AssignProcedure 
table=table:kv1, region=3cd7db00f89bab7e477b7773d2df0c99
{code}

Most of the AssignProcedures were reported as corrupt and could not be
replayed, so we ended up in the situation above. We are not familiar with
ProcedureV2 yet, so we had to work around it. If there is a fix for the
procedure WAL, that would surely be better.

> Add the ability to compile JSP dynamically in Jetty
> ---------------------------------------------------
>
>                 Key: HBASE-20679
>                 URL: https://issues.apache.org/jira/browse/HBASE-20679
>             Project: HBase
>          Issue Type: New Feature
>    Affects Versions: 2.0.0
>            Reporter: Allan Yang
>            Assignee: Allan Yang
>            Priority: Major
>             Fix For: 3.0.0
>
>         Attachments: HBASE-20679.patch
>
>
> As discussed in HBASE-20617, adding the ability to dynamically compile JSP
> enables us to apply hot fixes.
> For example, several days ago, in our testing HBase-2.0 cluster, the
> procedure WALs were corrupted for unknown reasons. After restarting the
> cluster, some procedures (AssignProcedure, for example) were corrupted and
> couldn't be replayed, so some regions were stuck in RIT forever. We couldn't
> use HBCK since it doesn't support AssignmentV2 yet. In fact, the namespace
> region was not online, so the master was not initialized, and we couldn't
> even use shell commands like assign/move. But we wrote a JSP and fixed the
> issue easily. The JSP file looks like this:
> {code:java}
> <%@ page import="java.util.ArrayList" %>
> <%@ page import="java.util.List" %>
> <%@ page import="org.apache.hadoop.hbase.client.RegionInfo" %>
> <%@ page import="org.apache.hadoop.hbase.master.HMaster" %>
> <%@ page import="org.apache.hadoop.hbase.master.assignment.RegionStates" %>
> <%
>   String action = request.getParameter("action");
>   HMaster master = (HMaster) getServletContext().getAttribute(HMaster.MASTER);
>   List<RegionInfo> offlineRegionsToAssign = new ArrayList<>();
>   List<RegionStates.RegionStateNode> regionRITs = master.getAssignmentManager()
>       .getRegionStates().getRegionsInTransition();
>   for (RegionStates.RegionStateNode regionStateNode : regionRITs) {
>     // If regionStateNode doesn't have a procedure attached, but the meta
>     // state shows this region is in RIT, the previous procedure may be
>     // corrupted, so we need to create a new AssignProcedure to assign it.
>     if (!regionStateNode.isInTransition()) {
>       offlineRegionsToAssign.add(regionStateNode.getRegionInfo());
>       out.println("RIT region:" + regionStateNode);
>     }
>   }
>   // Assign offline regions. Uses round-robin.
>   if ("fix".equals(action) && offlineRegionsToAssign.size() > 0) {
>     master.getMasterProcedureExecutor().submitProcedures(
>         master.getAssignmentManager()
>             .createRoundRobinAssignProcedures(offlineRegionsToAssign));
>   } else {
>     out.println("use ?action=fix to fix RIT regions");
>   }
> %>
> {code}
> The above is only one example of what we can do if we have the ability to
> compile JSP dynamically. We think it is very useful.
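
The selection step in the JSP above (pick regions that are in RIT but have no
procedure attached, then spread them over servers in round-robin order) can be
modelled without a running cluster. The sketch below is only an illustration
of that idea under stated assumptions: RegionNode, selectStuck and roundRobin
are hypothetical names, plain strings stand in for regions and servers, and
this is not how createRoundRobinAssignProcedures is implemented in HBase.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of the JSP's logic; none of these types are HBase classes.
class RitFixSketch {

    /** Stand-in for a region-in-transition entry. */
    static final class RegionNode {
        final String regionName;
        final boolean hasProcedure; // true if a procedure is still attached

        RegionNode(String regionName, boolean hasProcedure) {
            this.regionName = regionName;
            this.hasProcedure = hasProcedure;
        }
    }

    /** Regions listed as in transition with no procedure driving them. */
    static List<String> selectStuck(List<RegionNode> regionsInTransition) {
        List<String> stuck = new ArrayList<>();
        for (RegionNode node : regionsInTransition) {
            if (!node.hasProcedure) {
                stuck.add(node.regionName);
            }
        }
        return stuck;
    }

    /** Assigns region i to server (i mod number of servers). */
    static Map<String, List<String>> roundRobin(List<String> regions,
                                                List<String> servers) {
        if (servers.isEmpty()) {
            throw new IllegalArgumentException("no servers available");
        }
        Map<String, List<String>> plan = new LinkedHashMap<>();
        for (String server : servers) {
            plan.put(server, new ArrayList<>());
        }
        for (int i = 0; i < regions.size(); i++) {
            plan.get(servers.get(i % servers.size())).add(regions.get(i));
        }
        return plan;
    }

    public static void main(String[] args) {
        List<RegionNode> rit = Arrays.asList(
            new RegionNode("region-a", false),
            new RegionNode("region-b", true),
            new RegionNode("region-c", false));
        List<String> stuck = selectStuck(rit);
        System.out.println(roundRobin(stuck, Arrays.asList("rs1", "rs2")));
    }
}
```

As in the JSP, listing the stuck regions is kept separate from acting on them,
so the selection can be reviewed before any assignment is submitted.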



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
