[
https://issues.apache.org/jira/browse/ACCUMULO-575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13544338#comment-13544338
]
John Vines commented on ACCUMULO-575:
-------------------------------------
Test bench:
1 node running the hadoop namenode and 1 datanode
1 slave node running 1 datanode and the accumulo stack, with 8GB in memory map
Running a patched version of accumulo with the following patch to provide helper
debug output:
{code}
Index: server/src/main/java/org/apache/accumulo/server/tabletserver/Compactor.java
===================================================================
--- server/src/main/java/org/apache/accumulo/server/tabletserver/Compactor.java (revision 1429057)
+++ server/src/main/java/org/apache/accumulo/server/tabletserver/Compactor.java (working copy)
@@ -81,6 +81,7 @@
 private FileSystem fs;
 protected KeyExtent extent;
 private List<IteratorSetting> iterators;
+ protected boolean minor= false;
 Compactor(Configuration conf, FileSystem fs, Map<String,DataFileValue> files, InMemoryMap imm, String outputFile, boolean propogateDeletes, TableConfiguration acuTableConf, KeyExtent extent, CompactionEnv env, List<IteratorSetting> iterators) {
@@ -158,7 +159,7 @@
 log.error("Verification of successful compaction fails!!! " + extent + " " + outputFile, ex);
 throw ex;
 }
-
+ log.info("Just completed minor? " + minor + " for table " + extent.getTableId());
 log.debug(String.format("Compaction %s %,d read | %,d written | %,6d entries/sec | %6.3f secs", extent, majCStats.getEntriesRead(), majCStats.getEntriesWritten(), (int) (majCStats.getEntriesRead() / ((t2 - t1) / 1000.0)), (t2 - t1) / 1000.0));
Index: server/src/main/java/org/apache/accumulo/server/tabletserver/MinorCompactor.java
===================================================================
--- server/src/main/java/org/apache/accumulo/server/tabletserver/MinorCompactor.java (revision 1429057)
+++ server/src/main/java/org/apache/accumulo/server/tabletserver/MinorCompactor.java (working copy)
@@ -88,6 +88,7 @@
 do {
 try {
+ this.minor = true;
 CompactionStats ret = super.call();
 // log.debug(String.format("MinC %,d recs in | %,d recs out | %,d recs/sec | %6.3f secs | %,d bytes ",map.size(), entriesCompacted,
{code}
I stood up a new instance, created a table named test, and ran the following:
{code}tail -f accumulo-1.5.0-SNAPSHOT/logs/tserver_slave.debug.log | ./ifttt.sh
{code}
where ifttt.sh is
{code}
#!/bin/sh
# Look up the DataNode pid once, up front.
dnpid=`jps -m | grep DataNode | awk '{print $1}'`
# Read the tailed tserver log from stdin, one line at a time.
while read -r str; do
  case "$str" in
    # The patched Compactor logs this after a minor compaction of table 2.
    *"Just completed minor? true for table 2"*)
      echo "I'm gonna kill datanode, pid $dnpid"
      kill -9 $dnpid
      ;;
  esac
done
{code}
Then I ran the following:
{code}accumulo org.apache.accumulo.server.test.TestIngest --table test --rows 65536 --cols 100 --size 8192 -z 172.16.101.220:2181 --batchMemory 100000000 --batchThreads 10
{code}
Eventually the memory map filled, a minor compaction happened, the local datanode
was killed, and things died. Unfortunately, I didn't hit the bug I was shooting for.
I'm documenting my testing here so that once the WAL is fixed I can look into this
more.
> Potential data loss when datanode fails immediately after minor compaction
> --------------------------------------------------------------------------
>
> Key: ACCUMULO-575
> URL: https://issues.apache.org/jira/browse/ACCUMULO-575
> Project: Accumulo
> Issue Type: Bug
> Components: tserver
> Affects Versions: 1.4.1, 1.4.0
> Reporter: John Vines
> Assignee: John Vines
> Fix For: 1.5.0
>
>
> So this one popped into my head a few days ago, and I've done some research.
> Context-
> 1. In memory map is written to an RFile.
> 2. yadda yadda yadda, FSOutputStream.close() is called.
> 3. close() calls complete() which will not return until the
> dfs.replication.min is reached. dfs.replication.min is by default set to 1 on
> systems and I don't think it's frequently configured
> 4. We read the file to make sure that it was written correctly (this has
> probably been a mitigating factor as to why we haven't run into this
> potential issue)
> 5. We write the file to the !METADATA table
> 6. We write minor compaction to the walog
> If the datanode goes down after 6 but before the file is replicated more,
> then we'll have data loss. The file will be known to the namenode as
> corrupted, but we can't restore it automatically, because the walog has the
> file complete. Step 4 has probably provided enough of a time buffer to
> significantly decrease the possibility of this happening.
> I have not explicitly tested this, but I want to test to validate the
> potential scenario of losing data by dropping a datanode in a multi-node
> system immediately after closing the FSOutputStream. If this is the case,
> then we may want to consider adding a wait between steps 4 and 5 that polls
> the namenode for replication reaching at least the max(2, # nodes).
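
The wait proposed at the end of the description (poll until the new file has
enough replicas before recording it in !METADATA) could look roughly like the
sketch below. It is only an illustration of the polling idea, not Accumulo
code: wait_for_replicas, its arguments, and the probe command are all
hypothetical. The probe stands for any command that prints the file's current
replica count as an integer; how to get that number from the namenode is
deliberately left open.

```shell
#!/bin/sh
# Hypothetical sketch: block until a probe reports enough replicas, or give up.
# usage: wait_for_replicas TARGET TIMEOUT_SECS PROBE_CMD [ARGS...]
# PROBE_CMD is an assumption: any command that prints the current replica
# count of the newly written file as an integer on stdout.
wait_for_replicas() {
  target=$1
  timeout=$2
  shift 2
  waited=0
  while :; do
    count=$("$@") || return 2      # the probe itself failed
    if [ "$count" -ge "$target" ]; then
      return 0                     # enough replicas; safe to continue
    fi
    if [ "$waited" -ge "$timeout" ]; then
      return 1                     # timed out while still below the target
    fi
    sleep 1
    waited=$((waited + 1))
  done
}
```

A caller would run something like "wait_for_replicas 2 30 my_probe FILE" between
steps 4 and 5, where my_probe is a placeholder for the replica-count lookup, and
would only go on to write the !METADATA entry if the function returns 0.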