[
https://issues.apache.org/jira/browse/ACCUMULO-2261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13882897#comment-13882897
]
Eric Newton commented on ACCUMULO-2261:
---------------------------------------
Oh, you're right. Do you have the end of the log for Server A? Also, are the
clocks across your system synchronized? I'm trying to establish how this
happened, to see if there's another check we can add to mitigate it.
> duplicate locations
> -------------------
>
> Key: ACCUMULO-2261
> URL: https://issues.apache.org/jira/browse/ACCUMULO-2261
> Project: Accumulo
> Issue Type: Bug
> Components: master, tserver
> Affects Versions: 1.5.0
> Environment: hadoop 2.2.0 and zookeeper 3.4.5
> Reporter: Eric Newton
> Assignee: Eric Newton
> Priority: Blocker
> Fix For: 1.5.1
>
>
> Anthony F reports the following:
> bq. I have observed a loss of data when tservers fail during bulk ingest.
> The keys that are missing are right around the table's splits indicating that
> data was lost when a tserver died during a split. I am using Accumulo 1.5.0.
> At around the same time, I observe the master logging a message about "Found
> two locations for the same extent".
> And:
> bq. I'm currently digging through the logs and will report back. Keep in
> mind, I'm using Accumulo 1.5.0 on a Hadoop 2.2.0 stack. To determine data
> loss, I have a 'ConsistencyCheckingIterator' that verifies each row has the
> expected data (it takes a long time to scan the whole table). Below is a
> quick summary of what happened. The tablet in question is "d;72~gcm~201304".
> Notice that it is assigned to 192.168.2.233:9997[343bc1fa155242c] at
> 2014-01-25 09:49:36,233. At 2014-01-25 09:49:54,141, the tserver goes away.
> Then, the tablet gets assigned to 192.168.2.223:9997[143bc1f14412432] and
> shortly after that, I see the BadLocationStateException. The master never
> recovers from the BLSE - I have to manually delete one of the offending
> locations.
> {noformat}
> 2014-01-25 09:49:36,233 [master.Master] DEBUG: Normal Tablets assigning
> tablet d;72~gcm~201304;72=192.168.2.233:9997[343bc1fa155242c]
> 2014-01-25 09:49:36,233 [master.Master] DEBUG: Normal Tablets assigning
> tablet p;18~thm~2012101;18=192.168.2.233:9997[343bc1fa155242c]
> 2014-01-25 09:49:54,141 [master.Master] WARN : Lost servers
> [192.168.2.233:9997[343bc1fa155242c]]
> 2014-01-25 09:49:56,866 [master.Master] DEBUG: 42 assigned to dead servers:
> [d;03~u36~201302;03~thm~2012091@(null,192.168.2.233:9997[343bc1fa155242c],null),
>
> d;06~u36~2013;06~thm~2012083@(null,192.168.2.233:9997[343bc1fa155242c],null),
> d;25;24~u36~2013@(null,192.168.2.233:9997[343bc1fa155242c],null),
> d;25~u36~201303;25~thm~201209@(null,192.168.2.233:9997[343bc1fa155242c],null),
> d;27~gcm~2013041;27@(null,192.168.2.233:9997[343bc1fa155242c],null),
> d;30~u36~2013031;30~thm~2012082@(null,192.168.2.233:9997[343bc1fa155242c],null),
> d;34~thm;34~gcm~2013022@(null,192.168.2.233:9997[343bc1fa155242c],null),
> d;39~thm~20121;39~gcm~20130418@(null,192.168.2.233:9997[343bc1fa155242c],null),
> d;41~thm;41~gcm~2013041@(null,192.168.2.233:9997[343bc1fa155242c],null),
> d;42~u36~201304;42~thm~20121@(null,192.168.2.233:9997[343bc1fa155242c],null),
> d;45~thm~201208;45~gcm~201303@(null,192.168.2.233:9997[343bc1fa155242c],null),
> d;48~gcm~2013052;48@(null,192.168.2.233:9997[343bc1fa155242c],null),
> d;60~u36~2013021;60~thm~20121@(null,192.168.2.233:9997[343bc1fa155242c],null),
> d;68~gcm~2013041;68@(null,192.168.2.233:9997[343bc1fa155242c],null),
> d;72;71~u36~2013@(null,192.168.2.233:9997[343bc1fa155242c],null),
> d;72~gcm~201304;72@(192.168.2.233:9997[343bc1fa155242c],null,null),
> d;75~thm~2012101;75~gcm~2013032@(null,192.168.2.233:9997[343bc1fa155242c],null),
> d;78;77~u36~201305@(null,192.168.2.233:9997[343bc1fa155242c],null),
> d;90~u36~2013032;90~thm~2012092@(null,192.168.2.233:9997[343bc1fa155242c],null),
> d;91~thm;91~gcm~201304@(null,192.168.2.233:9997[343bc1fa155242c],null),
> d;93~u36~2013012;93~thm~20121@(null,192.168.2.233:9997[343bc1fa155242c],null),
> m;20;19@(null,192.168.2.233:9997[343bc1fa155242c],null),
> m;38;37@(null,192.168.2.233:9997[343bc1fa155242c],null),
> m;51;50@(null,192.168.2.233:9997[343bc1fa155242c],null),
> m;60;59@(null,192.168.2.233:9997[343bc1fa155242c],null),
> m;92;91@(null,192.168.2.233:9997[343bc1fa155242c],null),
> o;01<@(null,192.168.2.233:9997[343bc1fa155242c],null),
> o;04;03@(null,192.168.2.233:9997[343bc1fa155242c],null),
> o;50;49@(null,192.168.2.233:9997[343bc1fa155242c],null),
> o;63;62@(null,192.168.2.233:9997[343bc1fa155242c],null),
> o;74;73@(null,192.168.2.233:9997[343bc1fa155242c],null),
> o;97;96@(null,192.168.2.233:9997[343bc1fa155242c],null),
> p;08~thm~20121;08@(null,192.168.2.233:9997[343bc1fa155242c],null),
> p;09~thm~20121;09@(null,192.168.2.233:9997[343bc1fa155242c],null),
> p;10;09~thm~20121@(null,192.168.2.233:9997[343bc1fa155242c],null),
> p;18~thm~2012101;18@(192.168.2.233:9997[343bc1fa155242c],null,null),
> p;21;20~thm~201209@(null,192.168.2.233:9997[343bc1fa155242c],null),
> p;22~thm~2012091;22@(null,192.168.2.233:9997[343bc1fa155242c],null),
> p;23;22~thm~2012091@(null,192.168.2.233:9997[343bc1fa155242c],null),
> p;41~thm~2012111;41@(null,192.168.2.233:9997[343bc1fa155242c],null),
> p;42;41~thm~2012111@(null,192.168.2.233:9997[343bc1fa155242c],null),
> p;58~thm~201208;58@(null,192.168.2.233:9997[343bc1fa155242c],null)]...
> 2014-01-25 09:49:59,706 [master.Master] DEBUG: Normal Tablets assigning
> tablet d;72~gcm~201304;72=192.168.2.223:9997[143bc1f14412432]
> 2014-01-25 09:50:13,515 [master.EventCoordinator] INFO : tablet
> d;72~gcm~201304;72 was loaded on 192.168.2.223:9997
> 2014-01-25 09:51:20,058 [state.MetaDataTableScanner] ERROR:
> java.lang.RuntimeException:
> org.apache.accumulo.server.master.state.TabletLocationState$BadLocationStateException:
> found two locations for the same extent d;72~gcm~201304:
> 192.168.2.223:9997[143bc1f14412432] and 192.168.2.233:9997[343bc1fa155242c]
> java.lang.RuntimeException:
> org.apache.accumulo.server.master.state.TabletLocationState$BadLocationStateException:
> found two locations for the same extent d;72~gcm~201304:
> 192.168.2.223:9997[143bc1f14412432] and 192.168.2.233:9997[343bc1fa155242c]
> {noformat}
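For anyone tracing a similar sequence in their own master log, here is a
minimal sketch that lists tablets the master assigned to more than one
server. It assumes the "assigning tablet EXTENT=HOST:PORT[SESSION]" line
format seen in the excerpt above; the function name and the sample text are
illustrative only and are not part of Accumulo. Reassignment after a server
dies is normal, so a hit is only a candidate to compare against the
"found two locations" error, not proof of a problem.

```python
import re
from collections import defaultdict

# Assumed format, taken from the quoted master log:
#   "... assigning tablet <extent>=<host:port>[<hex session id>]"
ASSIGN_RE = re.compile(r"assigning\s+tablet\s+(\S+)=(\S+\[[0-9a-f]+\])")


def find_multiple_assignments(log_text):
    """Return {extent: [servers, in order seen]} for extents that were
    assigned to more than one distinct server (hypothetical helper)."""
    # Drop mail quote markers ("> ") and re-join lines wrapped by the mailer.
    unquoted = " ".join(
        line.lstrip("> ").strip() for line in log_text.splitlines()
    )
    servers = defaultdict(list)
    for extent, server in ASSIGN_RE.findall(unquoted):
        if server not in servers[extent]:
            servers[extent].append(server)
    return {ext: srv for ext, srv in servers.items() if len(srv) > 1}


# Sample lines copied from the log excerpt in this issue.
SAMPLE = """\
> 2014-01-25 09:49:36,233 [master.Master] DEBUG: Normal Tablets assigning
> tablet d;72~gcm~201304;72=192.168.2.233:9997[343bc1fa155242c]
> 2014-01-25 09:49:36,233 [master.Master] DEBUG: Normal Tablets assigning
> tablet p;18~thm~2012101;18=192.168.2.233:9997[343bc1fa155242c]
> 2014-01-25 09:49:59,706 [master.Master] DEBUG: Normal Tablets assigning
> tablet d;72~gcm~201304;72=192.168.2.223:9997[143bc1f14412432]
"""

for extent, assigned in find_multiple_assignments(SAMPLE).items():
    print(extent, "->", ", ".join(assigned))
```

On the sample lines, only d;72~gcm~201304;72 is reported, with the .233
server first and the .223 server second, which matches the order of events
described in the report.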