[ https://issues.apache.org/jira/browse/HBASE-6438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13451876#comment-13451876 ]
rajeshbabu commented on HBASE-6438:
-----------------------------------
@Ted
When I run the test suite locally, the test cases below always fail (even
without this patch) because of environment problems. I ran the failed tests
individually on our Jenkins multiple times, and they always pass.
{code}
Failed tests: testPermMask(org.apache.hadoop.hbase.util.TestFSUtils):
expected:<rwx------> but was:<rwxrwxrwx>
Tests in error:
testCacheOnWriteInSchema[1](org.apache.hadoop.hbase.regionserver.TestCacheOnWriteInSchema):
Target HLog directory already exists:
/mnt/F/hbase94Com/target/test-data/8a5bb561-edfc-4fab-9358-7ab726cb44fc/TestCacheOnWriteInSchema/logs
testCacheOnWriteInSchema[2](org.apache.hadoop.hbase.regionserver.TestCacheOnWriteInSchema):
Target HLog directory already exists:
/mnt/F/hbase94Com/target/test-data/8a5bb561-edfc-4fab-9358-7ab726cb44fc/TestCacheOnWriteInSchema/logs
testWholesomeSplit(org.apache.hadoop.hbase.regionserver.TestSplitTransaction):
Failed delete of
/mnt/F/hbase94Com/target/test-data/9d7234b4-1f6a-42a7-bbb1-641eb464b7e6/org.apache.hadoop.hbase.regionserver.TestSplitTransaction/table/4bbe087ebab2243b8b9633bb3d870f4c
testRollback(org.apache.hadoop.hbase.regionserver.TestSplitTransaction):
Failed delete of
/mnt/F/hbase94Com/target/test-data/4afca7c8-ee29-47fb-b660-f2ee661bced7/org.apache.hadoop.hbase.regionserver.TestSplitTransaction/table/ad08ee3070175df954844582816d5927
testOffPeakCompactionRatio(org.apache.hadoop.hbase.regionserver.TestCompactSelection):
Target HLog directory already exists:
/mnt/F/hbase94Com/target/test-data/dd6ca8f4-4321-42d8-825b-fc6a42ab84c0/TestCompactSelection/logs
Tests run: 1590, Failures: 1, Errors: 5, Skipped: 12
Running org.apache.hadoop.hbase.regionserver.TestSplitTransaction
Tests run: 7, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 8.383 sec
Results :
Tests run: 7, Failures: 0, Errors: 0, Skipped: 0
Running org.apache.hadoop.hbase.regionserver.TestCompactSelection
Tests run: 2, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 1.264 sec
Results :
Tests run: 2, Failures: 0, Errors: 0, Skipped: 0
Running org.apache.hadoop.hbase.regionserver.TestCacheOnWriteInSchema
Tests run: 3, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 2.564 sec
Results :
Tests run: 3, Failures: 0, Errors: 0, Skipped: 0
Running org.apache.hadoop.hbase.util.TestFSUtils
Tests run: 4, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 11.43 sec
Results :
Tests run: 4, Failures: 0, Errors: 0, Skipped: 0
{code}
> RegionAlreadyInTransitionException needs to give more info to avoid
> assignment inconsistencies
> ----------------------------------------------------------------------------------------------
>
> Key: HBASE-6438
> URL: https://issues.apache.org/jira/browse/HBASE-6438
> Project: HBase
> Issue Type: Bug
> Reporter: ramkrishna.s.vasudevan
> Assignee: rajeshbabu
> Fix For: 0.96.0, 0.92.3, 0.94.2
>
> Attachments: HBASE-6438_2.patch, HBASE-6438_94.patch,
> HBASE-6438_trunk.patch
>
>
> Looking at some of the recent issues in region assignment,
> RegionAlreadyInTransitionException (RAITE) is one cause after which the
> region assignment may or may not happen (in the sense that we need to wait
> for the TM to assign).
> In HBASE-6317 we hit one problem caused by RegionAlreadyInTransitionException
> on master restart.
> Consider the following case: for some reason, such as a master restart or an
> external assign call, we try to assign a region that is already being opened
> on an RS.
> The second call to assign has already changed the state of the znode, so the
> assignment in progress on the RS is affected and fails. The second
> assignment also fails, with a RAITE. In the end neither assignment
> proceeds. The idea is to determine whether a given RAITE can be retried or
> not.
> Here again there are several cases:
> -> The znode is yet to be transitioned from OFFLINE to OPENING on the RS.
> -> The RS may be in the openRegion step.
> -> The RS may be trying to transition OPENING to OPENED.
> -> The RS is yet to add the region to its online regions.
> On any failure in openRegion() or updateMeta() we move the znode to
> FAILED_OPEN, so in these cases getting a RAITE should be ok. In the other
> cases, however, the assignment is stopped.
> The idea is simply to record the current state of the region assignment in
> the RIT map on the RS side; using that info, we can determine whether the
> assignment can be retried on getting a RAITE.
> Considering the current work going on in AM, please do share whether this is
> needed at least in the 0.92/0.94 versions.
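
The idea in the description above, recording the current open stage in the RS-side RIT map and using it to decide whether a RAITE warrants a retry, could be sketched roughly as follows. This is only an illustration and not the attached patch: the class `RitStateSketch`, the `OpenStage` enum, and all method names are hypothetical, and the retry policy is one reading of the description (failures inside openRegion() or updateMeta() already move the znode to FAILED_OPEN, so no extra retry is assumed there; every other stage is treated as needing one).

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch; none of these names come from HBase or the attached patches.
final class RitStateSketch {

    /** Illustrative stages of a region open on the RS, following the cases listed in the description. */
    enum OpenStage {
        OFFLINE_TO_OPENING,  // znode not yet moved from OFFLINE to OPENING
        OPEN_REGION,         // inside openRegion()
        UPDATE_META,         // inside updateMeta()
        OPENING_TO_OPENED,   // moving the znode from OPENING to OPENED
        ADD_TO_ONLINE        // region not yet added to the RS online regions
    }

    // RS-side regions-in-transition map: encoded region name -> current stage.
    private final Map<String, OpenStage> regionsInTransition =
        new ConcurrentHashMap<String, OpenStage>();

    /** Records (or updates) the stage of an open that is in progress. */
    void recordStage(String encodedName, OpenStage stage) {
        regionsInTransition.put(encodedName, stage);
    }

    /** Removes the region once the open has completed or failed. */
    void finished(String encodedName) {
        regionsInTransition.remove(encodedName);
    }

    /** Returns the stage a RAITE could carry back, or null if the region is not in transition. */
    OpenStage stageOf(String encodedName) {
        return regionsInTransition.get(encodedName);
    }

    /**
     * Assumed policy: in OPEN_REGION and UPDATE_META a failure ends in FAILED_OPEN,
     * which is handled elsewhere, so the caller need not retry. In every other
     * stage the assignment may stall, so a retry is considered necessary.
     * The stage must be non-null.
     */
    static boolean retryNeeded(OpenStage stage) {
        switch (stage) {
            case OPEN_REGION:
            case UPDATE_META:
                return false;
            default:
                return true;
        }
    }

    public static void main(String[] args) {
        RitStateSketch rs = new RitStateSketch();
        rs.recordStage("region-a", OpenStage.OPEN_REGION);
        rs.recordStage("region-b", OpenStage.OPENING_TO_OPENED);
        System.out.println("region-a retry needed: " + retryNeeded(rs.stageOf("region-a")));
        System.out.println("region-b retry needed: " + retryNeeded(rs.stageOf("region-b")));
    }
}
```

In a real change the stage would presumably travel back to the master inside the exception itself; the sketch keeps the lookup and the decision separate only to show which information is needed.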