[ https://issues.apache.org/jira/browse/ACCUMULO-1243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13627333#comment-13627333 ]
Hudson commented on ACCUMULO-1243:
----------------------------------
Integrated in Accumulo-Trunk #824 (See [https://builds.apache.org/job/Accumulo-Trunk/824/])
ACCUMULO-1243 Added apache header (Revision 1466244)
ACCUMULO-1243 Made the tablet loading code load only one tablet when
recovering a split. Made the code stricter: it will only load an exact tablet.
Made the load code roll back splits that have started but did not create a new
tablet. Added some more tests. (Revision 1466217)
Result = SUCCESS
kturner :
Files :
* /accumulo/trunk
* /accumulo/trunk/assemble
* /accumulo/trunk/core
* /accumulo/trunk/examples
* /accumulo/trunk/fate/src/main/java/org/apache/accumulo/fate/ZooStore.java
* /accumulo/trunk/fate/src/main/java/org/apache/accumulo/fate/zookeeper/ZooSession.java
* /accumulo/trunk/server
* /accumulo/trunk/server/src/test/java/org/apache/accumulo/server/tabletserver/CheckTabletMetadataTest.java
* /accumulo/trunk/src
kturner :
Files :
* /accumulo/trunk
* /accumulo/trunk/assemble
* /accumulo/trunk/core
* /accumulo/trunk/core/src/main/java/org/apache/accumulo/core/util/MetadataTable.java
* /accumulo/trunk/examples
* /accumulo/trunk/fate/src/main/java/org/apache/accumulo/fate/ZooStore.java
* /accumulo/trunk/fate/src/main/java/org/apache/accumulo/fate/zookeeper/ZooSession.java
* /accumulo/trunk/server
* /accumulo/trunk/server/src/main/java/org/apache/accumulo/server/tabletserver/TabletServer.java
* /accumulo/trunk/server/src/main/java/org/apache/accumulo/server/util/MetadataTable.java
* /accumulo/trunk/server/src/test/java/org/apache/accumulo/server/tabletserver/CheckTabletMetadataTest.java
* /accumulo/trunk/src
* /accumulo/trunk/test/src/main/java/org/apache/accumulo/test/functional/SplitRecoveryTest.java
* /accumulo/trunk/test/src/main/java/org/apache/accumulo/test/performance/scan/CollectTabletStats.java
* /accumulo/trunk/test/src/test/java/org/apache/accumulo/test/TestAccumulo1235.java
* /accumulo/trunk/test/src/test/java/org/apache/accumulo/test/TestAccumuloSplitRecovery.java
> Multiple assignment may occur if tablet server dies during split
> ----------------------------------------------------------------
>
> Key: ACCUMULO-1243
> URL: https://issues.apache.org/jira/browse/ACCUMULO-1243
> Project: Accumulo
> Issue Type: Bug
> Affects Versions: 1.4.0
> Reporter: Keith Turner
> Assignee: Keith Turner
> Priority: Critical
> Fix For: 1.5.0, 1.4.4
>
>
> Make the following change to the tablet server code. The tablet server has
> to die at this exact point for the bug to occur.
> {noformat}
> Index: src/main/java/org/apache/accumulo/server/tabletserver/Tablet.java
> ===================================================================
> --- src/main/java/org/apache/accumulo/server/tabletserver/Tablet.java (revision 1464780)
> +++ src/main/java/org/apache/accumulo/server/tabletserver/Tablet.java (working copy)
> @@ -3592,6 +3592,9 @@
> MetadataTable.splitTablet(high, extent.getPrevEndRow(), splitRatio, SecurityConstants.getSystemCredentials(), tabletServer.getLock());
> MetadataTable.addNewTablet(low, lowDirectory, tabletServer.getTabletSession(), lowDatafileSizes, bulkLoadedFiles, SecurityConstants.getSystemCredentials(), time, lastFlushID, lastCompactID, tabletServer.getLock());
> +
> + Runtime.getRuntime().halt(2);
> +
> MetadataTable.finishSplit(high, highDatafileSizes, highDatafilesToRemove, SecurityConstants.getSystemCredentials(), tabletServer.getLock());
>
> log.log(TLevel.TABLET_HIST, extent + " split " + low + " " + high);
> {noformat}
> Then create a table and add a split.
> {noformat}
> root@test15> createtable foo
> root@test15 foo> addsplits -t foo m
> {noformat}
> If there are multiple tablet servers, then it is possible that multiple
> assignment may occur. Below is an example of this occurring after tablets
> were loaded.
> {noformat}
> root@test15 !METADATA> scan -b 1 -c loc
> 1;m loc:13d5a86463f4f98 [] 127.0.0.1:9998
> 1;m loc:13d5a86463f4f9f [] 127.0.0.1:10000
> 1< loc:13d5a86463f4f98 [] 127.0.0.1:9998
> {noformat}
> The problem is that the assignment code in the tserver detects an incomplete
> split and loads both children. However, the master may also assign one of
> the children.
> I think the assignment code should be modified to fix up the metadata table
> and only load one tablet. If the new tablet was not created, it should roll
> back the changes and load the pre-split tablet. If the new tablet was
> created, then assume the master will assign it and only load the high tablet.
> I think these changes would also greatly simplify the code.
> I do not think the proposed changes would cause issues with merge, since the
> chop flag is deleted in the case where this occurs.
> Need to ensure that the solution is itself fault tolerant.
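
The recovery rule proposed in the issue description above can be sketched as a small decision function. This is only an illustration under stated assumptions: the class, enum, and method names below are hypothetical and do not come from the Accumulo code base, and the two boolean inputs stand in for whatever the tablet server would actually read from the metadata table.

```java
/** Hypothetical sketch; none of these names exist in Accumulo. */
final class SplitRecoverySketch {

    /** What the tablet server does with a tablet it has been asked to load. */
    enum Action {
        /** No split was in progress, so load the tablet unchanged. */
        LOAD_AS_IS,
        /** Split started but the new (low) tablet was never created: undo and load the pre-split tablet. */
        ROLL_BACK_AND_LOAD_PARENT,
        /** New tablet exists: fix up the metadata, load only the high tablet, leave the new one to the master. */
        FINISH_SPLIT_AND_LOAD_HIGH
    }

    /**
     * Picks exactly one action, so at most one tablet is loaded per request.
     *
     * @param splitStarted     assumed flag: the metadata shows an incomplete split
     * @param newTabletCreated assumed flag: the metadata entry for the new tablet exists
     */
    static Action decide(boolean splitStarted, boolean newTabletCreated) {
        if (!splitStarted) {
            return Action.LOAD_AS_IS;
        }
        if (!newTabletCreated) {
            return Action.ROLL_BACK_AND_LOAD_PARENT;
        }
        return Action.FINISH_SPLIT_AND_LOAD_HIGH;
    }

    public static void main(String[] args) {
        // Print the chosen action for each state the description distinguishes.
        System.out.println(decide(false, false));
        System.out.println(decide(true, false));
        System.out.println(decide(true, true));
    }
}
```

Because every branch depends only on the current metadata state, the same decision can be recomputed after another crash, which is one way to approach the fault-tolerance requirement noted above.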