[ 
https://issues.apache.org/jira/browse/ACCUMULO-939?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13546425#comment-13546425
 ] 

John Vines commented on ACCUMULO-939:
-------------------------------------

Dropped memory map to 3G, tweaked ingest to {code}./accumulo 
org.apache.accumulo.server.test.TestIngest --table test --rows 65536 --cols 100 
--size 1024 -z 172.16.101.220:2181 --batchMemory 10000000 --batchThreads 5 
--createTable {code} (halved batch writers, dropped an order of magnitude from 
batch memory, and dropped size to 1K from 8k. Ran once, no issues. Ran again 
after minor compaction and I had 24 log entries of the similar entries. This is 
the only different one, I think-
{code}Unexpected error writing to log, retrying attempt 1
        org.apache.hadoop.ipc.RemoteException: java.io.IOException: File 
/accumulo/wal/172.16.101.219+9997/7a529101-6392-46b0-b59b-d3bcc4a72a9d could 
only be replicated to 0 nodes, instead of 1
                at 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1558)
                at 
org.apache.hadoop.hdfs.server.namenode.NameNode.addBlock(NameNode.java:696)
                at sun.reflect.GeneratedMethodAccessor8.invoke(Unknown Source)
                at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
                at java.lang.reflect.Method.invoke(Method.java:616)
                at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:563)
                at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1388)
                at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1384)
                at java.security.AccessController.doPrivileged(Native Method)
                at javax.security.auth.Subject.doAs(Subject.java:416)
                at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1121)
                at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1382)
        
                at org.apache.hadoop.ipc.Client.call(Client.java:1070)
                at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:225)
                at $Proxy1.addBlock(Unknown Source)
                at sun.reflect.GeneratedMethodAccessor14.invoke(Unknown Source)
                at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
                at java.lang.reflect.Method.invoke(Method.java:616)
                at 
org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
                at 
org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
                at $Proxy1.addBlock(Unknown Source)
                at 
org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.locateFollowingBlock(DFSClient.java:3510)
                at 
org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.nextBlockOutputStream(DFSClient.java:3373)
                at 
org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$2600(DFSClient.java:2589)
                at 
org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2829){code}

This had 23 occurrences while the other 23 error messages had 2 each. No 
repetitive {code}Unexpected error writing to log, retrying attempt 1
        java.io.IOException: DFSOutputStream is closed
                at 
org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.sync(DFSClient.java:3669)
                at 
org.apache.hadoop.fs.FSDataOutputStream.sync(FSDataOutputStream.java:97)
                at 
org.apache.accumulo.server.tabletserver.log.DfsLogger.defineTablet(DfsLogger.java:295)
                at 
org.apache.accumulo.server.tabletserver.log.TabletServerLogger$4.write(TabletServerLogger.java:333)
                at 
org.apache.accumulo.server.tabletserver.log.TabletServerLogger.write(TabletServerLogger.java:273)
                at 
org.apache.accumulo.server.tabletserver.log.TabletServerLogger.write(TabletServerLogger.java:229)
                at 
org.apache.accumulo.server.tabletserver.log.TabletServerLogger.defineTablet(TabletServerLogger.java:330)
                at 
org.apache.accumulo.server.tabletserver.log.TabletServerLogger.write(TabletServerLogger.java:254)
                at 
org.apache.accumulo.server.tabletserver.log.TabletServerLogger.write(TabletServerLogger.java:229)
                at 
org.apache.accumulo.server.tabletserver.log.TabletServerLogger.defineTablet(TabletServerLogger.java:330)
                at 
org.apache.accumulo.server.tabletserver.log.TabletServerLogger.write(TabletServerLogger.java:254)
                at 
org.apache.accumulo.server.tabletserver.log.TabletServerLogger.write(TabletServerLogger.java:229)
                at 
org.apache.accumulo.server.tabletserver.log.TabletServerLogger.defineTablet(TabletServerLogger.java:330)
                at 
org.apache.accumulo.server.tabletserver.log.TabletServerLogger.write(TabletServerLogger.java:254)
                at 
org.apache.accumulo.server.tabletserver.log.TabletServerLogger.write(TabletServerLogger.java:229)
                at 
org.apache.accumulo.server.tabletserver.log.TabletServerLogger.defineTablet(TabletServerLogger.java:330)
                at 
org.apache.accumulo.server.tabletserver.log.TabletServerLogger.write(TabletServerLogger.java:254)
                at 
org.apache.accumulo.server.tabletserver.log.TabletServerLogger.write(TabletServerLogger.java:229)
                at 
org.apache.accumulo.server.tabletserver.log.TabletServerLogger.defineTablet(TabletServerLogger.java:330)
                at 
org.apache.accumulo.server.tabletserver.log.TabletServerLogger.write(TabletServerLogger.java:254)
                at 
org.apache.accumulo.server.tabletserver.log.TabletServerLogger.write(TabletServerLogger.java:229)
                at 
org.apache.accumulo.server.tabletserver.log.TabletServerLogger.defineTablet(TabletServerLogger.java:330)
                at 
org.apache.accumulo.server.tabletserver.log.TabletServerLogger.write(TabletServerLogger.java:254)
                at 
org.apache.accumulo.server.tabletserver.log.TabletServerLogger.write(TabletServerLogger.java:229)
                at 
org.apache.accumulo.server.tabletserver.log.TabletServerLogger.defineTablet(TabletServerLogger.java:330)
                at 
org.apache.accumulo.server.tabletserver.log.TabletServerLogger.write(TabletServerLogger.java:254)
                at 
org.apache.accumulo.server.tabletserver.log.TabletServerLogger.write(TabletServerLogger.java:229)
                at 
org.apache.accumulo.server.tabletserver.log.TabletServerLogger.defineTablet(TabletServerLogger.java:330)
                at 
org.apache.accumulo.server.tabletserver.log.TabletServerLogger.write(TabletServerLogger.java:254)
                at 
org.apache.accumulo.server.tabletserver.log.TabletServerLogger.write(TabletServerLogger.java:229)
                at 
org.apache.accumulo.server.tabletserver.log.TabletServerLogger.defineTablet(TabletServerLogger.java:330)
                at 
org.apache.accumulo.server.tabletserver.log.TabletServerLogger.write(TabletServerLogger.java:254)
                at 
org.apache.accumulo.server.tabletserver.log.TabletServerLogger.write(TabletServerLogger.java:229)
                at 
org.apache.accumulo.server.tabletserver.log.TabletServerLogger.defineTablet(TabletServerLogger.java:330)
                at 
org.apache.accumulo.server.tabletserver.log.TabletServerLogger.write(TabletServerLogger.java:254)
                at 
org.apache.accumulo.server.tabletserver.log.TabletServerLogger.write(TabletServerLogger.java:229)
                at 
org.apache.accumulo.server.tabletserver.log.TabletServerLogger.defineTablet(TabletServerLogger.java:330)
                at 
org.apache.accumulo.server.tabletserver.log.TabletServerLogger.write(TabletServerLogger.java:254)
                at 
org.apache.accumulo.server.tabletserver.log.TabletServerLogger.write(TabletServerLogger.java:229)
                at 
org.apache.accumulo.server.tabletserver.log.TabletServerLogger.defineTablet(TabletServerLogger.java:330)
                at 
org.apache.accumulo.server.tabletserver.log.TabletServerLogger.write(TabletServerLogger.java:254)
                at 
org.apache.accumulo.server.tabletserver.log.TabletServerLogger.write(TabletServerLogger.java:229)
                at 
org.apache.accumulo.server.tabletserver.log.TabletServerLogger.defineTablet(TabletServerLogger.java:330)
                at 
org.apache.accumulo.server.tabletserver.log.TabletServerLogger.write(TabletServerLogger.java:254)
                at 
org.apache.accumulo.server.tabletserver.log.TabletServerLogger.write(TabletServerLogger.java:229)
                at 
org.apache.accumulo.server.tabletserver.log.TabletServerLogger.defineTablet(TabletServerLogger.java:330)
                at 
org.apache.accumulo.server.tabletserver.log.TabletServerLogger.write(TabletServerLogger.java:254)
                at 
org.apache.accumulo.server.tabletserver.log.TabletServerLogger.write(TabletServerLogger.java:229)
                at 
org.apache.accumulo.server.tabletserver.log.TabletServerLogger.defineTablet(TabletServerLogger.java:330)
                at 
org.apache.accumulo.server.tabletserver.log.TabletServerLogger.write(TabletServerLogger.java:254)
                at 
org.apache.accumulo.server.tabletserver.log.TabletServerLogger.write(TabletServerLogger.java:229)
                at 
org.apache.accumulo.server.tabletserver.log.TabletServerLogger.defineTablet(TabletServerLogger.java:330)
                at 
org.apache.accumulo.server.tabletserver.log.TabletServerLogger.write(TabletServerLogger.java:254)
                at 
org.apache.accumulo.server.tabletserver.log.TabletServerLogger.write(TabletServerLogger.java:229)
                at 
org.apache.accumulo.server.tabletserver.log.TabletServerLogger.defineTablet(TabletServerLogger.java:330)
                at 
org.apache.accumulo.server.tabletserver.log.TabletServerLogger.write(TabletServerLogger.java:254)
                at 
org.apache.accumulo.server.tabletserver.log.TabletServerLogger.write(TabletServerLogger.java:229)
                at 
org.apache.accumulo.server.tabletserver.log.TabletServerLogger.defineTablet(TabletServerLogger.java:330)
                at 
org.apache.accumulo.server.tabletserver.log.TabletServerLogger.write(TabletServerLogger.java:254)
                at 
org.apache.accumulo.server.tabletserver.log.TabletServerLogger.write(TabletServerLogger.java:229)
                at 
org.apache.accumulo.server.tabletserver.log.TabletServerLogger.defineTablet(TabletServerLogger.java:330)
                at 
org.apache.accumulo.server.tabletserver.log.TabletServerLogger.write(TabletServerLogger.java:254)
                at 
org.apache.accumulo.server.tabletserver.log.TabletServerLogger.write(TabletServerLogger.java:229)
                at 
org.apache.accumulo.server.tabletserver.log.TabletServerLogger.defineTablet(TabletServerLogger.java:330)
                at 
org.apache.accumulo.server.tabletserver.log.TabletServerLogger.write(TabletServerLogger.java:254)
                at 
org.apache.accumulo.server.tabletserver.log.TabletServerLogger.write(TabletServerLogger.java:229)
                at 
org.apache.accumulo.server.tabletserver.log.TabletServerLogger.defineTablet(TabletServerLogger.java:330)
                at 
org.apache.accumulo.server.tabletserver.log.TabletServerLogger.write(TabletServerLogger.java:254)
                at 
org.apache.accumulo.server.tabletserver.log.TabletServerLogger.write(TabletServerLogger.java:229)
                at 
org.apache.accumulo.server.tabletserver.log.TabletServerLogger.defineTablet(TabletServerLogger.java:330)
                at 
org.apache.accumulo.server.tabletserver.log.TabletServerLogger.write(TabletServerLogger.java:254)
                at 
org.apache.accumulo.server.tabletserver.log.TabletServerLogger.logManyTablets(TabletServerLogger.java:362)
                at 
org.apache.accumulo.server.tabletserver.TabletServer$ThriftClientHandler.flush(TabletServer.java:1563)
                at 
org.apache.accumulo.server.tabletserver.TabletServer$ThriftClientHandler.closeUpdate(TabletServer.java:1634)
                at sun.reflect.GeneratedMethodAccessor9.invoke(Unknown Source)
                at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
                at java.lang.reflect.Method.invoke(Method.java:616)
                at 
org.apache.accumulo.cloudtrace.instrument.thrift.TraceWrap$1.invoke(TraceWrap.java:63)
                at $Proxy2.closeUpdate(Unknown Source)
                at 
org.apache.accumulo.core.tabletserver.thrift.TabletClientService$Processor$closeUpdate.getResult(TabletClientService.java:2018)
                at 
org.apache.accumulo.core.tabletserver.thrift.TabletClientService$Processor$closeUpdate.getResult(TabletClientService.java:2002)
                at 
org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
                at 
org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
                at 
org.apache.accumulo.server.util.TServerUtils$TimedProcessor.process(TServerUtils.java:156)
                at 
org.apache.thrift.server.AbstractNonblockingServer$FrameBuffer.invoke(AbstractNonblockingServer.java:478)
                at org.apache.thrift.server.Invocation.run(Invocation.java:18)
                at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
                at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
                at 
org.apache.accumulo.cloudtrace.instrument.TraceRunnable.run(TraceRunnable.java:47)
                at org.apache.accumulo.core.util.LoggingRunnable.run(Log{code} 
though. Doing this same ingest with a size of 1536 is enough to break it still. 
Still running hadoop 1.0.4. Ingest DID complete though.
                
> WAL get stuck for unknown reasons
> ---------------------------------
>
>                 Key: ACCUMULO-939
>                 URL: https://issues.apache.org/jira/browse/ACCUMULO-939
>             Project: Accumulo
>          Issue Type: Bug
>          Components: tserver
>    Affects Versions: 1.5.0
>         Environment: hadoop-1.0.1
>            Reporter: John Vines
>            Assignee: Eric Newton
>         Attachments: tserver_jstack
>
>
> Attempting to test ACCUMULO-575 with the following test framework:
> Test bench-
> 1 node running hadoop namenode and 1 datanode
> slave noderunning 1 datanode and accumulo stack, with 8GB in memory map
> Running patched version of accumulo with the following aptch to provide 
> helper debug
> {code}Index: 
> server/src/main/java/org/apache/accumulo/server/tabletserver/Compactor.java
> ===================================================================
> --- 
> server/src/main/java/org/apache/accumulo/server/tabletserver/Compactor.java   
>     (revision 1429057)
> +++ 
> server/src/main/java/org/apache/accumulo/server/tabletserver/Compactor.java   
>     (working copy)
> @@ -81,6 +81,7 @@
>    private FileSystem fs;
>    protected KeyExtent extent;
>    private List<IteratorSetting> iterators;
> +  protected boolean minor= false;
>    
>    Compactor(Configuration conf, FileSystem fs, Map<String,DataFileValue> 
> files, InMemoryMap imm, String outputFile, boolean propogateDeletes,
>        TableConfiguration acuTableConf, KeyExtent extent, CompactionEnv env, 
> List<IteratorSetting> iterators) {
> @@ -158,7 +159,7 @@
>          log.error("Verification of successful compaction fails!!! " + extent 
> + " " + outputFile, ex);
>          throw ex;
>        }
> -      
> +      log.info("Just completed minor? " + minor + " for table " + 
> extent.getTableId());
>        log.debug(String.format("Compaction %s %,d read | %,d written | %,6d 
> entries/sec | %6.3f secs", extent, majCStats.getEntriesRead(),
>            majCStats.getEntriesWritten(), (int) (majCStats.getEntriesRead() / 
> ((t2 - t1) / 1000.0)), (t2 - t1) / 1000.0));
>        
> Index: 
> server/src/main/java/org/apache/accumulo/server/tabletserver/MinorCompactor.java
> ===================================================================
> --- 
> server/src/main/java/org/apache/accumulo/server/tabletserver/MinorCompactor.java
>   (revision 1429057)
> +++ 
> server/src/main/java/org/apache/accumulo/server/tabletserver/MinorCompactor.java
>   (working copy)
> @@ -88,6 +88,7 @@
>      
>      do {
>        try {
> +        this.minor = true;
>          CompactionStats ret = super.call();
>          
>          // log.debug(String.format("MinC %,d recs in | %,d recs out | %,d 
> recs/sec | %6.3f secs | %,d bytes ",map.size(), entriesCompacted,
> {code}
> I stood up a new instance, create a table named test. Ran the following -
> {code}tail -f accumulo-1.5.0-SNAPSHOT/logs/tserver_slave.debug.log | 
> ./ifttt.sh {code}
> where ifttt.sh is
> {code} #!/bin/sh
> dnpid=`jps -m | grep DataNode | awk '{print $1}'`
> while [ -z "" ]; do
>   if [ -e $1 ] ;then read str; else str=$1;fi
>   if [ -n "`echo $str | grep "Just completed minor? true for table 2"`" ]; 
> then
>     echo "I'm gonna kill datanode, pid $dnpid"
>     kill -9 $dnpid
>   fi
> done
> {code}
> Then I ran thefollowing
> {code}accumulo org.apache.accumulo.server.test.TestIngest --table test --rows 
> 65536 --cols 100 --size 8192 -z 172.16.101.220:2181 --batchMemory 100000000 
> --batchThreads 10 {code}
> Eventually the memory map filled, minor compaction happened, local datanode 
> was killed and things died. Logs filled with-
> {code} org.apache.hadoop.ipc.RemoteException: java.io.IOException: File 
> /accumulo/wal/172.16.101.219+9997/08b9f1b4-26d5-4b07-a260-3334c2013576 could 
> only be replicated to 0 nodes, instead of 1
>       at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1556)
>       at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.addBlock(NameNode.java:696)
>       at sun.reflect.GeneratedMethodAccessor13.invoke(Unknown Source)
>       at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>       at java.lang.reflect.Method.invoke(Method.java:616)
>       at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:563)
>       at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1388)
>       at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1384)
>       at java.security.AccessController.doPrivileged(Native Method)
>       at javax.security.auth.Subject.doAs(Subject.java:416)
>       at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1093)
>       at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1382)
> {code}
> and
> {code}
> Unexpected error writing to log, retrying attempt 1
>       java.io.IOException: DFSOutputStream is closed
>               at 
> org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.sync(DFSClient.java:3666)
>               at 
> org.apache.hadoop.fs.FSDataOutputStream.sync(FSDataOutputStream.java:97)
>               at 
> org.apache.accumulo.server.tabletserver.log.DfsLogger.defineTablet(DfsLogger.java:295)
>               at 
> org.apache.accumulo.server.tabletserver.log.TabletServerLogger$4.write(TabletServerLogger.java:333)
>               at 
> org.apache.accumulo.server.tabletserver.log.TabletServerLogger.write(TabletServerLogger.java:273)
>               at 
> org.apache.accumulo.server.tabletserver.log.TabletServerLogger.write(TabletServerLogger.java:229)
>               at 
> org.apache.accumulo.server.tabletserver.log.TabletServerLogger.defineTablet(TabletServerLogger.java:330)
>               at 
> org.apache.accumulo.server.tabletserver.log.TabletServerLogger.write(TabletServerLogger.java:254)
>               at 
> org.apache.accumulo.server.tabletserver.log.TabletServerLogger.write(TabletServerLogger.java:229)
>               at 
> org.apache.accumulo.server.tabletserver.log.TabletServerLogger.defineTablet(TabletServerLogger.java:330)
> ... repeats...
> {code}.
> Bringing the datanode back up did NOT fix it, either.
> UPDATE: reran and never killed datanode and it still died. So this isn't an 
> issue with my datanode killing, it's something with hadop 1.0.1 and the new 
> rite ahead logs.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

Reply via email to