[ 
https://issues.apache.org/jira/browse/SPARK-16914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15430725#comment-15430725
 ] 

Thomas Graves commented on SPARK-16914:
---------------------------------------

that is considered a fatal issue for the nodemanager and its expected to fail.  
hardware goes bad sometimes so you either make sure these paths are durable or 
the nodemanager is going to crash.  not sure what other options you have here



> NodeManager crash when spark are registering executor infomartion into leveldb
> ------------------------------------------------------------------------------
>
>                 Key: SPARK-16914
>                 URL: https://issues.apache.org/jira/browse/SPARK-16914
>             Project: Spark
>          Issue Type: Bug
>          Components: Shuffle
>    Affects Versions: 1.6.2
>            Reporter: cen yuhai
>
> {noformat}
> Stack: [0x00007fb5b53de000,0x00007fb5b54df000],  sp=0x00007fb5b54dcba8,  free 
> space=1018k
> Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native 
> code)
> C  [libc.so.6+0x896b1]  memcpy+0x11
> Java frames: (J=compiled Java code, j=interpreted, Vv=VM code)
> j  
> org.fusesource.leveldbjni.internal.NativeDB$DBJNI.Put(JLorg/fusesource/leveldbjni/internal/NativeWriteOptions;Lorg/fusesource/leveldbjni/internal/NativeSlice;Lorg/fusesource/leveldbjni/internal/NativeSlice;)J+0
> j  
> org.fusesource.leveldbjni.internal.NativeDB.put(Lorg/fusesource/leveldbjni/internal/NativeWriteOptions;Lorg/fusesource/leveldbjni/internal/NativeSlice;Lorg/fusesource/leveldbjni/internal/NativeSlice;)V+11
> j  
> org.fusesource.leveldbjni.internal.NativeDB.put(Lorg/fusesource/leveldbjni/internal/NativeWriteOptions;Lorg/fusesource/leveldbjni/internal/NativeBuffer;Lorg/fusesource/leveldbjni/internal/NativeBuffer;)V+18
> j  
> org.fusesource.leveldbjni.internal.NativeDB.put(Lorg/fusesource/leveldbjni/internal/NativeWriteOptions;[B[B)V+36
> j  
> org.fusesource.leveldbjni.internal.JniDB.put([B[BLorg/iq80/leveldb/WriteOptions;)Lorg/iq80/leveldb/Snapshot;+28
> j  org.fusesource.leveldbjni.internal.JniDB.put([B[B)V+10
> j  
> org.apache.spark.network.shuffle.ExternalShuffleBlockResolver.registerExecutor(Ljava/lang/String;Ljava/lang/String;Lorg/apache/spark/network/shuffle/protocol/ExecutorShuffleInfo;)V+61
> J 8429 C2 
> org.apache.spark.network.server.TransportRequestHandler.handle(Lorg/apache/spark/network/protocol/RequestMessage;)V
>  (100 bytes) @ 0x00007fb5f27ff6cc [0x00007fb5f27fdde0+0x18ec]
> J 8371 C2 
> org.apache.spark.network.server.TransportChannelHandler.channelRead0(Lio/netty/channel/ChannelHandlerContext;Ljava/lang/Object;)V
>  (10 bytes) @ 0x00007fb5f242df20 [0x00007fb5f242de80+0xa0]
> J 6853 C2 
> io.netty.channel.SimpleChannelInboundHandler.channelRead(Lio/netty/channel/ChannelHandlerContext;Ljava/lang/Object;)V
>  (74 bytes) @ 0x00007fb5f215587c [0x00007fb5f21557e0+0x9c]
> J 5872 C2 
> io.netty.handler.timeout.IdleStateHandler.channelRead(Lio/netty/channel/ChannelHandlerContext;Ljava/lang/Object;)V
>  (42 bytes) @ 0x00007fb5f2183268 [0x00007fb5f2183100+0x168]
> J 5849 C2 
> io.netty.handler.codec.MessageToMessageDecoder.channelRead(Lio/netty/channel/ChannelHandlerContext;Ljava/lang/Object;)V
>  (158 bytes) @ 0x00007fb5f2191524 [0x00007fb5f218f5a0+0x1f84]
> J 5941 C2 
> org.apache.spark.network.util.TransportFrameDecoder.channelRead(Lio/netty/channel/ChannelHandlerContext;Ljava/lang/Object;)V
>  (170 bytes) @ 0x00007fb5f220a230 [0x00007fb5f2209fc0+0x270]
> J 7747 C2 io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read()V 
> (363 bytes) @ 0x00007fb5f264465c [0x00007fb5f2644140+0x51c]
> J 8008% C2 io.netty.channel.nio.NioEventLoop.run()V (162 bytes) @ 
> 0x00007fb5f26f6764 [0x00007fb5f26f63c0+0x3a4]
> j  io.netty.util.concurrent.SingleThreadEventExecutor$2.run()V+13
> j  java.lang.Thread.run()V+11
> v  ~StubRoutines::call_stub
> {noformat}
> The target code in spark is in ExternalShuffleBlockResolver
> {code}
>   /** Registers a new Executor with all the configuration we need to find its 
> shuffle files. */
>   public void registerExecutor(
>       String appId,
>       String execId,
>       ExecutorShuffleInfo executorInfo) {
>     AppExecId fullId = new AppExecId(appId, execId);
>     logger.info("Registered executor {} with {}", fullId, executorInfo);
>     try {
>       if (db != null) {
>         byte[] key = dbAppExecKey(fullId);
>         byte[] value =  
> mapper.writeValueAsString(executorInfo).getBytes(Charsets.UTF_8);
>         db.put(key, value);
>       }
>     } catch (Exception e) {
>       logger.error("Error saving registered executors", e);
>     }
>     executors.put(fullId, executorInfo);
>   }
> {code}
> There is a problem with disk1



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to