[ https://issues.apache.org/jira/browse/ACCUMULO-1243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13638638#comment-13638638 ]
Hudson commented on ACCUMULO-1243:
----------------------------------
Integrated in Accumulo-1.5-Hadoop-2.0 #86 (See
[https://builds.apache.org/job/Accumulo-1.5-Hadoop-2.0/86/])
ACCUMULO-1243 made accumulo more responsive to failed splits (Revision
1470734)
Result = FAILURE
kturner :
Files :
* /accumulo/branches/1.5/server/src/main/java/org/apache/accumulo/server/tabletserver/TabletServer.java
* /accumulo/branches/1.5/test/src/test/resources/log4j.properties
> Multiple assignment may occur if tablet server dies during split
> ----------------------------------------------------------------
>
> Key: ACCUMULO-1243
> URL: https://issues.apache.org/jira/browse/ACCUMULO-1243
> Project: Accumulo
> Issue Type: Bug
> Affects Versions: 1.4.0
> Reporter: Keith Turner
> Assignee: Keith Turner
> Priority: Critical
> Fix For: 1.5.0
>
>
> Make the following change to the tablet server code. The tablet server has
> to die at this exact point for the bug to occur.
> {noformat}
> Index: src/main/java/org/apache/accumulo/server/tabletserver/Tablet.java
> ===================================================================
> --- src/main/java/org/apache/accumulo/server/tabletserver/Tablet.java (revision 1464780)
> +++ src/main/java/org/apache/accumulo/server/tabletserver/Tablet.java (working copy)
> @@ -3592,6 +3592,9 @@
> MetadataTable.splitTablet(high, extent.getPrevEndRow(), splitRatio, SecurityConstants.getSystemCredentials(), tabletServer.getLock());
> MetadataTable.addNewTablet(low, lowDirectory, tabletServer.getTabletSession(), lowDatafileSizes, bulkLoadedFiles, SecurityConstants.getSystemCredentials(), time, lastFlushID, lastCompactID, tabletServer.getLock());
> +
> + Runtime.getRuntime().halt(2);
> +
> MetadataTable.finishSplit(high, highDatafileSizes, highDatafilesToRemove, SecurityConstants.getSystemCredentials(), tabletServer.getLock());
>
> log.log(TLevel.TABLET_HIST, extent + " split " + low + " " + high);
> {noformat}
> Then create a table and add a split.
> {noformat}
> root@test15> createtable foo
> root@test15 foo> addsplits -t foo m
> {noformat}
> If there are multiple tablet servers, then it's possible that multiple
> assignment may occur. Below is an example of this occurring after tablets
> were loaded.
> {noformat}
> root@test15 !METADATA> scan -b 1 -c loc
> 1;m loc:13d5a86463f4f98 [] 127.0.0.1:9998
> 1;m loc:13d5a86463f4f9f [] 127.0.0.1:10000
> 1< loc:13d5a86463f4f98 [] 127.0.0.1:9998
> {noformat}
> The problem is that the assignment code in the tserver detects an incomplete
> split and loads both children. However, the master may also assign one of
> the children.
> I think the assignment code should be modified to fix up the metadata table
> and only load one tablet. If the new tablet was not created, it should roll
> back the changes and load the pre-split tablet. If the new tablet was
> created, it should assume the master will assign it and only load the high tablet.
> I think these changes would also greatly simplify the code.
> I do not think the proposed changes would cause issues with merge, since the
> chop flag is deleted in the case where this occurs.
> Need to ensure that the solution is itself fault tolerant.
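
The recovery rule proposed in the description can be sketched as a small decision function. This is only an illustration of the rule as stated above, not the committed fix: the class, enum, and parameter names are hypothetical and do not exist in Accumulo, and how the two flags would actually be read from the metadata table is left out.

```java
// Hypothetical sketch of the proposed rule; none of these names exist in Accumulo.
final class SplitRecoverySketch {

    /** What the tablet server's assignment code does with a tablet it is asked to load. */
    enum Action {
        LOAD_NORMALLY,              // no split was in progress
        ROLL_BACK_AND_LOAD_PARENT,  // split started, but the new tablet was never written
        FINISH_SPLIT_AND_LOAD_HIGH  // new tablet was written; leave it for the master to assign
    }

    /**
     * @param splitInProgress  assumption: the tablet's metadata still shows an unfinished split
     * @param newTabletCreated assumption: the metadata entry for the new (low) tablet exists
     */
    static Action decide(boolean splitInProgress, boolean newTabletCreated) {
        if (!splitInProgress) {
            return Action.LOAD_NORMALLY;
        }
        if (!newTabletCreated) {
            return Action.ROLL_BACK_AND_LOAD_PARENT;
        }
        return Action.FINISH_SPLIT_AND_LOAD_HIGH;
    }

    public static void main(String[] args) {
        // Server died between addNewTablet and finishSplit, as in the halt(2) reproduction.
        System.out.println(decide(true, true));
        // Server died after splitTablet but before addNewTablet.
        System.out.println(decide(true, false));
        // Ordinary assignment with no interrupted split.
        System.out.println(decide(false, false));
    }
}
```

In every branch the tablet server loads exactly one tablet, which is the property that avoids the duplicate loc entries shown in the scan. For the solution to be fault tolerant, the metadata fix-up behind each action would presumably need to be safe to repeat if the server dies again partway through.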
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira