[jira] [Commented] (ACCUMULO-575) Potential data loss when datanode fails immediately after minor compaction

John Vines (JIRA) Fri, 04 Jan 2013 15:14:15 -0800

    [ https://issues.apache.org/jira/browse/ACCUMULO-575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13544338#comment-13544338 ]


John Vines commented on ACCUMULO-575:
-------------------------------------

Test bench-
1 node running the Hadoop namenode and 1 datanode
1 slave node running 1 datanode and the Accumulo stack, with an 8GB in-memory map
Running a patched version of Accumulo with the following patch to provide helper 
debug output
{code}Index: server/src/main/java/org/apache/accumulo/server/tabletserver/Compactor.java
===================================================================
--- server/src/main/java/org/apache/accumulo/server/tabletserver/Compactor.java (revision 1429057)
+++ server/src/main/java/org/apache/accumulo/server/tabletserver/Compactor.java (working copy)
@@ -81,6 +81,7 @@
   private FileSystem fs;
   protected KeyExtent extent;
   private List<IteratorSetting> iterators;
+  protected boolean minor= false;
   
   Compactor(Configuration conf, FileSystem fs, Map<String,DataFileValue> files, InMemoryMap imm, String outputFile, boolean propogateDeletes,
       TableConfiguration acuTableConf, KeyExtent extent, CompactionEnv env, List<IteratorSetting> iterators) {
@@ -158,7 +159,7 @@
         log.error("Verification of successful compaction fails!!! " + extent + " " + outputFile, ex);
         throw ex;
       }
-      
+      log.info("Just completed minor? " + minor + " for table " + extent.getTableId());
       log.debug(String.format("Compaction %s %,d read | %,d written | %,6d entries/sec | %6.3f secs", extent, majCStats.getEntriesRead(),
           majCStats.getEntriesWritten(), (int) (majCStats.getEntriesRead() / ((t2 - t1) / 1000.0)), (t2 - t1) / 1000.0));
       
Index: server/src/main/java/org/apache/accumulo/server/tabletserver/MinorCompactor.java
===================================================================
--- server/src/main/java/org/apache/accumulo/server/tabletserver/MinorCompactor.java    (revision 1429057)
+++ server/src/main/java/org/apache/accumulo/server/tabletserver/MinorCompactor.java    (working copy)
@@ -88,6 +88,7 @@
     
     do {
       try {
+        this.minor = true;
         CompactionStats ret = super.call();
         
         // log.debug(String.format("MinC %,d recs in | %,d recs out | %,d recs/sec | %6.3f secs | %,d bytes ",map.size(), entriesCompacted,
{code}

I stood up a new instance and created a table named test. Then I ran the following -
{code}tail -f accumulo-1.5.0-SNAPSHOT/logs/tserver_slave.debug.log | ./ifttt.sh 
{code}
where ifttt.sh is
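{code}
#!/bin/sh
# Kills the local datanode as soon as the tserver logs a finished minor
# compaction for table id 2.

# Look up the DataNode pid once, before watching the log.
dnpid=$(jps -m | grep DataNode | awk '{print $1}')

while true; do
  # With no argument, read the next log line from stdin and stop at end of input.
  if [ -z "$1" ]; then read -r str || break; else str=$1; fi
  if echo "$str" | grep -q "Just completed minor? true for table 2"; then
    echo "I'm gonna kill datanode, pid $dnpid"
    kill -9 $dnpid
  fi
  # A line passed as an argument only needs to be checked once.
  if [ -n "$1" ]; then break; fi
done
{code}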

Then I ran the following
{code}accumulo org.apache.accumulo.server.test.TestIngest --table test --rows 65536 --cols 100 --size 8192 -z 172.16.101.220:2181 --batchMemory 100000000 --batchThreads 10{code}

Eventually the memory map filled, a minor compaction happened, the local datanode 
was killed, and things died. Unfortunately, I didn't hit the bug I was shooting 
for. I'm documenting my testing here so that once the WAL is fixed I can look 
into this more.

                
> Potential data loss when datanode fails immediately after minor compaction
> --------------------------------------------------------------------------
>
>                 Key: ACCUMULO-575
>                 URL: https://issues.apache.org/jira/browse/ACCUMULO-575
>             Project: Accumulo
>          Issue Type: Bug
>          Components: tserver
>    Affects Versions: 1.4.1, 1.4.0
>            Reporter: John Vines
>            Assignee: John Vines
>             Fix For: 1.5.0
>
>
> So this one popped into my head a few days ago, and I've done some research.
> Context-
> 1. In memory map is written to an RFile.
> 2. yadda yadda yadda, FSOutputStream.close() is called.
> 3. close() calls complete() which will not return until the 
> dfs.replication.min is reached. dfs.replication.min is by default set to 1 on 
> systems and I don't think it's frequently configured
> 4. We read the file to make sure that it was written correctly (this has 
> probably been a mitigating factor as to why we haven't run into this 
> potential issue)
> 5. We write the file to the !METADATA table
> 6. We write minor compaction to the walog
> If the datanode goes down after 6 but before the file is replicated more, 
> then we'll have data loss. The file will be known to the namenode as 
> corrupted, but we can't restore it automatically, because the walog has the 
> file complete. Step 4 has probably provided enough of a time buffer to 
> significantly decrease the possibility of this happening.
> I have not explicitly tested this, but I want to test to validate the 
> potential scenario of losing data by dropping a datanode in a multi-node 
> system immediately after closing the FSOutputStream. If this is the case, 
> then we may want to consider adding a wait between steps 4 and 5 that polls 
> the namenode for replication reaching at least the max(2, # nodes).
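
The wait proposed in the description (poll the namenode between steps 4 and 5 until the new file has enough replicas) could look roughly like the sketch below. It is only an illustration, not Accumulo code: the class ReplicationWait, the method awaitReplication and all of its parameters are made up here, and the IntSupplier stands in for whatever namenode query would report the live replica count of the new RFile. The target replica count is left as a parameter rather than fixed.

```java
import java.util.function.IntSupplier;

/** Sketch only: wait until a file reports enough replicas, or give up after a timeout. */
final class ReplicationWait {

  /**
   * Polls currentReplicas until it reports at least targetReplicas.
   *
   * currentReplicas is a hypothetical stand-in for a namenode lookup of the
   * file's live replica count; it is not an existing Accumulo or Hadoop call.
   *
   * @return true if the target was reached, false on timeout or interrupt
   */
  static boolean awaitReplication(IntSupplier currentReplicas, int targetReplicas,
      long timeoutMillis, long pollMillis) {
    long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
    while (true) {
      if (currentReplicas.getAsInt() >= targetReplicas) {
        return true;
      }
      if (System.nanoTime() - deadline >= 0) {
        return false;
      }
      try {
        Thread.sleep(pollMillis);
      } catch (InterruptedException e) {
        // Preserve the interrupt for the caller and stop waiting.
        Thread.currentThread().interrupt();
        return false;
      }
    }
  }

  public static void main(String[] args) {
    // Fake replica counter for demonstration: one more replica on every poll.
    int[] replicas = {0};
    boolean ok = awaitReplication(() -> ++replicas[0], 2, 1000, 10);
    System.out.println("replicated: " + ok);
  }
}
```

Returning false on timeout leaves the decision to the caller, for example holding off on the !METADATA write and the walog entry, or going ahead anyway and logging a warning. Which of those is right would need testing against the scenario above.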

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
