[ https://issues.apache.org/jira/browse/HBASE-4288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13147481#comment-13147481 ]
stack commented on HBASE-4288:
------------------------------
Oops... let me reattach the change description:
{code}
M src/main/java/org/apache/hadoop/hbase/catalog/CatalogTracker.java
Fix javadoc warnings. Change some logging that could be profuse
from DEBUG to TRACE level (now it works).
(setMetaLocation) Changed from private to package-private so tests
can use it.
M src/test/java/org/apache/hadoop/hbase/catalog/TestCatalogTracker.java
(constructAndStartCatalogTracker): Removed. Has null Connection so NPEs.
(testInterruptWaitOnMetaAndRoot): Now that we use HTables instead
of server interfaces, we need to do a bit more work to get the right
HConnection into place. Use new mockConnection method. Make sure we
clean up the connection when we are done.
(testServerNotRunningIOException): Added test to prove hbase-4288
is fixed. This is the important addition here.
(testGetMetaServerConnectionFails): Refactored to use new mockConnection
utility methods.
(mockConnection): utility mocking up a connection.
(getMetaTableRowResult): New utility method that fakes a Result.
{code}
Here is the proof. Here are the logs from the test:
{code}
2011-11-09 19:50:29,433 INFO [main] catalog.CatalogTracker(597): Failed
verification of .META.,,1 at address=example.org,1234,1320897021185;
java.io.IOException: Server not running, aborting
2011-11-09 19:50:29,433 DEBUG [main] catalog.CatalogTracker(492): Current
cached META location, example.org,1234,1320897021185, is not valid, resetting
2011-11-09 19:50:29,438 DEBUG [main] catalog.CatalogTracker(504): Set new
cached META location: example.org,1234,1320897021185
{code}
See how an IOE 'Server not running' no longer kills us. Instead, we treat the
cached .META. location as bad and clear it (first and second log lines). In the
third log line, we have found a good value and set the new meta location.
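For illustration only, here is a minimal sketch of the pattern the logs show: an IOException during verification is treated as a failed verification and the cached location is reset, rather than being allowed to propagate and abort the process. This is not the actual CatalogTracker code. The class MetaLocationSketch, the Verifier interface, and verifyMetaLocation are hypothetical names made up for the example; only setMetaLocation echoes a name from the change description, and its body here is an assumption.

```java
import java.io.IOException;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical sketch; not the real CatalogTracker.
class MetaLocationSketch {

    /** Hypothetical stand-in for the remote check that a server still hosts .META. */
    interface Verifier {
        boolean verify(String serverName) throws IOException;
    }

    // Cached location of .META.; null means "unknown, needs to be looked up again".
    private final AtomicReference<String> cachedMetaLocation = new AtomicReference<>();

    void setMetaLocation(String serverName) {
        cachedMetaLocation.set(serverName);
    }

    String getMetaLocation() {
        return cachedMetaLocation.get();
    }

    /**
     * Returns true if the cached location verified. On a failed check or an
     * IOException (for example "Server not running"), clears the cached value
     * and returns false instead of rethrowing.
     */
    boolean verifyMetaLocation(Verifier verifier) {
        String current = cachedMetaLocation.get();
        if (current == null) {
            return false; // nothing cached, so nothing to verify
        }
        try {
            if (verifier.verify(current)) {
                return true;
            }
        } catch (IOException e) {
            // Assumption for this sketch: any IOException means the location is bad.
            System.out.println("Failed verification of " + current + "; " + e);
        }
        // Reset only if nobody has already set a newer location.
        cachedMetaLocation.compareAndSet(current, null);
        return false;
    }

    public static void main(String[] args) {
        MetaLocationSketch tracker = new MetaLocationSketch();
        tracker.setMetaLocation("example.org,1234,1320897021185");
        boolean valid = tracker.verifyMetaLocation(server -> {
            throw new IOException("Server not running, aborting");
        });
        System.out.println("valid=" + valid + ", cached=" + tracker.getMetaLocation());
    }
}
```

After a failed verification the caller would look up the location again and call setMetaLocation with the fresh value, which corresponds to the third log line above.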
> "Server not running" exception during meta verification causes RS abort
> -----------------------------------------------------------------------
>
> Key: HBASE-4288
> URL: https://issues.apache.org/jira/browse/HBASE-4288
> Project: HBase
> Issue Type: Bug
> Components: master
> Affects Versions: 0.90.4
> Reporter: Todd Lipcon
> Priority: Critical
> Fix For: 0.92.0, 0.90.5
>
> Attachments: 4288-v2.txt, 4288.txt
>
>
> The master tried to verify the META location just as that server was shutting
> down due to an abort. This caused the "Server not running" exception to get
> thrown, which wasn't handled properly in the master, causing it to abort.