[jira] [Commented] (HBASE-8049) If a RS cannot use a compression codec, it should have a retry limit on checking results of CompressionTest

ramkrishna.s.vasudevan (JIRA) Sat, 09 Mar 2013 01:19:16 -0800

    [ 
https://issues.apache.org/jira/browse/HBASE-8049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13597888#comment-13597888
 ]


ramkrishna.s.vasudevan commented on HBASE-8049:
-----------------------------------------------

Whenever the RS fails to open a region from OpenRegionHandler because of the 
codec, set the ZK state to FATAL or UNRECOVERABLE.

On the master side, add the regions under this znode to a special data structure, 
along with the RS on which each one failed.
Have a timer thread act on these regions with a different region plan so 
that each can be tried on another RS.

-> If the master finds an RS with the compression codec available, the 
region gets opened there.
This may move all such regions to that RS, since it is the RS with the expected 
compression.  Once the RSs are rebooted with compression, the regions will 
automatically be assigned and balanced.

-> What if none of the RSs has the compression codec?
Then we should continuously retry the process and keep logging that the RSs 
are not enabled with the expected compression.
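
The master-side bookkeeping described above could look roughly like the 
sketch below. It is not HBase code: FailedOpenTracker, retryPass and the 
tryOpen callback are hypothetical names, and regions and servers are plain 
strings. A timer thread would call retryPass periodically and log the 
regions it returns as still waiting for an RS with the codec.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.Set;
import java.util.function.BiPredicate;

/** Hypothetical master-side record of regions whose open failed, and where. */
class FailedOpenTracker {
    // region name -> region servers on which the open has already failed
    private final Map<String, Set<String>> failedOn = new HashMap<>();

    /** Record that opening the region failed on the given server. */
    synchronized void recordFailure(String region, String server) {
        failedOn.computeIfAbsent(region, r -> new HashSet<>()).add(server);
    }

    /** First live server on which this region has not failed yet, if any. */
    synchronized Optional<String> nextCandidate(String region, List<String> liveServers) {
        Set<String> excluded = failedOn.getOrDefault(region, Collections.emptySet());
        for (String server : liveServers) {
            if (!excluded.contains(server)) {
                return Optional.of(server);
            }
        }
        return Optional.empty();
    }

    /** Forget failures on a server, e.g. after it is rebooted with the codec. */
    synchronized void serverRestarted(String server) {
        for (Set<String> servers : failedOn.values()) {
            servers.remove(server);
        }
    }

    /**
     * One pass of the timer thread: try each tracked region on a server that
     * has not failed it yet. Returns the regions that are still unassigned.
     */
    synchronized List<String> retryPass(List<String> liveServers,
                                        BiPredicate<String, String> tryOpen) {
        List<String> stillPending = new ArrayList<>();
        for (String region : new ArrayList<>(failedOn.keySet())) {
            Optional<String> target = nextCandidate(region, liveServers);
            if (target.isPresent() && tryOpen.test(region, target.get())) {
                failedOn.remove(region);           // opened, stop tracking
            } else {
                target.ifPresent(s -> recordFailure(region, s));
                stillPending.add(region);          // keep retrying and logging
            }
        }
        return stillPending;
    }

    /** Regions currently waiting for a server that can open them. */
    synchronized List<String> pendingRegions() {
        return new ArrayList<>(failedOn.keySet());
    }
}
```

With this shape, a region stays tracked while no RS can open it, and 
calling serverRestarted makes a rebooted RS eligible again.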

Create Table:
If create table does not succeed within the configured time, the client 
will get an error.  Once the RSs are rebooted (after fixing the compression), 
we would be able to carry on with opening the regions.

Enable Table:
When the problem happens while we try to ENABLE a table, we should ensure that 
the table is forcefully ENABLED once all of its regions are assigned.

During this time the table is not usable.  
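
As a rough illustration of the Enable Table case, the sketch below flips a 
table to ENABLED only once every region has been assigned. 
EnableTableTracker and its State enum are hypothetical stand-ins, not the 
actual HBase table-state handling.

```java
import java.util.HashSet;
import java.util.Set;

/** Hypothetical tracker: a table becomes ENABLED only after all regions are assigned. */
class EnableTableTracker {
    enum State { ENABLING, ENABLED }

    private final Set<String> unassigned;
    private State state;

    EnableTableTracker(Set<String> regions) {
        this.unassigned = new HashSet<>(regions);
        // A table with no regions left to assign is enabled immediately.
        this.state = unassigned.isEmpty() ? State.ENABLED : State.ENABLING;
    }

    /** Mark one region as assigned; flip to ENABLED when none remain. */
    synchronized State regionAssigned(String region) {
        unassigned.remove(region);
        if (unassigned.isEmpty()) {
            state = State.ENABLED;
        }
        return state;
    }

    /** The table is usable only once it is ENABLED. */
    synchronized boolean isUsable() {
        return state == State.ENABLED;
    }
}
```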
                
> If a RS cannot use a compression codec, it should have a retry limit on 
> checking results of CompressionTest
> -----------------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-8049
>                 URL: https://issues.apache.org/jira/browse/HBASE-8049
>             Project: HBase
>          Issue Type: Bug
>          Components: regionserver
>    Affects Versions: 0.90.6, 0.92.3, 0.95.0, 0.94.7
>         Environment: Including, but not limited to, Centos6_64
>            Reporter: Aleksandr Shulman
>            Assignee: ramkrishna.s.vasudevan
>             Fix For: 0.95.0, 0.94.7
>
>
> Observed Behavior:
> When a user attempts to create a table but there is an issue with the codec, 
> the attempt continues repeatedly. The shell command times out but the RS and 
> Master are both occupied, leading to HBase being down. Further, HBase creates 
> the folders for the table in HDFS.
> The only way to restore the service is by disabling and dropping the table.
> Here are the log lines when a table, t8, is created with this definition:
> create 't8', {NAME=>'f1',COMPRESSION=>'lzo'}
> Error from shell:
> hbase(main):003:0> create 't8', {NAME=>'f1',BLOOMFILTER=>'row', 
> COMPRESSION=>'lzo'}
> ERROR: org.apache.hadoop.hbase.client.RegionOfflineException: Only 0 of 1 
> regions are online; retries exhausted.
> Log lines on Master (repeats a few times/second):
> 2013-03-07 22:55:31,389 DEBUG 
> org.apache.hadoop.hbase.master.AssignmentManager: Using pre-existing plan for 
> region t8,,1362725678436.311edabcc1fe52001cb00e7c3e7f75d4.; 
> plan=hri=t8,,1362725678436.311edabcc1fe52001cb00e7c3e7f75d4., src=, 
> dest=upgrade-vm-1.ent.cloudera.com,60020,1362709586485
> 2013-03-07 22:55:31,389 DEBUG 
> org.apache.hadoop.hbase.master.AssignmentManager: Assigning region 
> t8,,1362725678436.311edabcc1fe52001cb00e7c3e7f75d4. to 
> upgrade-vm-1.ent.cloudera.com,60020,1362709586485
> 2013-03-07 22:55:31,398 DEBUG 
> org.apache.hadoop.hbase.master.AssignmentManager: Handling 
> transition=RS_ZK_REGION_OPENING, 
> server=upgrade-vm-1.ent.cloudera.com,60020,1362709586485, 
> region=311edabcc1fe52001cb00e7c3e7f75d4
> 2013-03-07 22:55:31,406 DEBUG 
> org.apache.hadoop.hbase.master.AssignmentManager: Handling 
> transition=RS_ZK_REGION_FAILED_OPEN, 
> server=upgrade-vm-1.ent.cloudera.com,60020,1362709586485, 
> region=311edabcc1fe52001cb00e7c3e7f75d4
> 2013-03-07 22:55:31,406 DEBUG 
> org.apache.hadoop.hbase.master.handler.ClosedRegionHandler: Handling CLOSED 
> event for 311edabcc1fe52001cb00e7c3e7f75d4
> 2013-03-07 22:55:31,406 DEBUG 
> org.apache.hadoop.hbase.master.AssignmentManager: Forcing OFFLINE; 
> was=t8,,1362725678436.311edabcc1fe52001cb00e7c3e7f75d4. state=CLOSED, 
> ts=1362725731398, server=upgrade-vm-1.ent.cloudera.com,60020,1362709586485
> 2013-03-07 22:55:31,406 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: 
> master:60000-0x13d47d214830000 Creating (or updating) unassigned node for 
> 311edabcc1fe52001cb00e7c3e7f75d4 with OFFLINE state
> 2013-03-07 22:55:31,414 DEBUG 
> org.apache.hadoop.hbase.master.AssignmentManager: Handling 
> transition=M_ZK_REGION_OFFLINE, server=upgrade-vm-1.ent.cloudera.com:60000, 
> region=311edabcc1fe52001cb00e7c3e7f75d4
> Log lines on RS (repeats a few times/second):
> 2013-03-07 22:58:23,323 ERROR 
> org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler: Failed open 
> of region=t8,,1362725678436.311edabcc1fe52001cb00e7c3e7f75d4.
> java.io.IOException: Compression algorithm 'lzo' previously failed test.
> at 
> org.apache.hadoop.hbase.util.CompressionTest.testCompression(CompressionTest.java:78)
> at 
> org.apache.hadoop.hbase.regionserver.HRegion.checkCompressionCodecs(HRegion.java:2797)
> at org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:2786)
> at org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:2774)
> at 
> org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler.openRegion(OpenRegionHandler.java:319)
> at 
> org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler.process(OpenRegionHandler.java:105)
> at org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:163)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
> at java.lang.Thread.run(Thread.java:662)
> Expected behavior:
> We expect to fail fast (after a few retries). This should take <1 sec.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
