[
https://issues.apache.org/jira/browse/HDDS-507?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16620139#comment-16620139
]
Dinesh Chitlangia edited comment on HDDS-507 at 9/19/18 5:44 AM:
-----------------------------------------------------------------
[~xyao], [~anu] Some time last year, I was doing a PoC using RocksDB. I hit a
similar issue, and after a long hunt we found 2 design issues in the PoC that
potentially caused it:
1. When RocksIterator#value was invoked while RocksIterator#isValid was false
(this can happen unknowingly after multiple RocksIterator#next invocations
without checking the boundary)
2. When RocksIterator#isValid was invoked after RocksIterator#close
Looking at RocksDBStoreIterator#hasNext, there is a strong possibility that we
are landing in situation 2 as described above.
{code:java}
@Override
public boolean hasNext() {
  return rocksDBIterator.isValid();
}
{code}
If RocksIterator was closed and we invoked hasNext(), we might hit this issue.
Just wanted to run this theory by you all.
P.S. Back then, RocksDB did not signal when developers used the API
inappropriately. I haven't worked on it for a long time now, so I am not sure
what the current state is.
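A minimal sketch of how both situations could be guarded against in a wrapper, assuming the theory above holds (it has not been confirmed as the cause of this crash). NativeIterator and FakeNativeIterator are hypothetical stand-ins invented for illustration so the example runs without the RocksDB JNI library; they are not the real RocksIterator or RocksDBStoreIterator code. The fake throws an exception where native code might crash instead.

```java
import java.util.NoSuchElementException;

final class GuardedIteratorSketch {

  /** Hypothetical stand-in for a native iterator such as RocksIterator. */
  interface NativeIterator extends AutoCloseable {
    boolean isValid();
    void next();
    byte[] value();
    @Override
    void close();
  }

  /** Wrapper that never touches the delegate after it has been closed. */
  static final class GuardedIterator implements AutoCloseable {
    private final NativeIterator delegate;
    private boolean closed;

    GuardedIterator(NativeIterator delegate) {
      this.delegate = delegate;
    }

    public synchronized boolean hasNext() {
      // Situation 2: do not call isValid() on a closed iterator.
      return !closed && delegate.isValid();
    }

    public synchronized byte[] next() {
      // Situation 1: do not call value() unless isValid() is true.
      if (!hasNext()) {
        throw new NoSuchElementException("iterator is exhausted or closed");
      }
      byte[] current = delegate.value();
      delegate.next();
      return current;
    }

    @Override
    public synchronized void close() {
      // Close the delegate at most once.
      if (!closed) {
        closed = true;
        delegate.close();
      }
    }
  }

  /** In-memory fake that throws on misuse instead of crashing. */
  static final class FakeNativeIterator implements NativeIterator {
    private final byte[][] values;
    private int position;
    private boolean closed;

    FakeNativeIterator(byte[][] values) {
      this.values = values;
    }

    private void ensureOpen() {
      if (closed) {
        throw new IllegalStateException("use after close");
      }
    }

    @Override
    public boolean isValid() {
      ensureOpen();
      return position < values.length;
    }

    @Override
    public void next() {
      ensureOpen();
      position++;
    }

    @Override
    public byte[] value() {
      ensureOpen();
      if (position >= values.length) {
        throw new IllegalStateException("value() called on invalid iterator");
      }
      return values[position];
    }

    @Override
    public void close() {
      closed = true;
    }
  }
}
```

With this wrapper, hasNext() simply returns false once close() has been called, and next() fails with a NoSuchElementException on the Java side rather than reaching the delegate in an invalid state.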
> RocksDB fails with SEGFAULT randomly during PipelineCloseHandler#onMessage
> --------------------------------------------------------------------------
>
> Key: HDDS-507
> URL: https://issues.apache.org/jira/browse/HDDS-507
> Project: Hadoop Distributed Data Store
> Issue Type: Bug
> Reporter: Xiaoyu Yao
> Priority: Major
>
> This can be reproduced by running TestNodeFailure multiple times. Jenkins
> sometimes hits this as well.
>
> {code}
> Current thread (0x00007fbe6f018800): JavaThread
> "EventQueue-PipelineCloseForPipelineCloseHandler" daemon [_thread_in_native,
> id=58639, stack(0x0000700018009000,0x0000700018109000)]
>
> siginfo: si_signo: 11 (SIGSEGV), si_code: 1 (SEGV_MAPERR), si_addr:
> 0x000000040000001d
>
>
>
> Stack: [0x0000700018009000,0x0000700018109000], sp=0x0000700018108128, free
> space=1020k
> Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native
> code)
> C [librocksdbjni6372054043595793813.jnilib+0x163ac8]
> rocksdb::GetColumnFamilyID(rocksdb::ColumnFamilyHandle*)+0x8
> C [librocksdbjni6372054043595793813.jnilib+0x228368]
> rocksdb::DB::Put(rocksdb::WriteOptions const&, rocksdb::ColumnFamilyHandle*,
> rocksdb::Slice const&, rocksdb::Slice const&)+0x58
> C [librocksdbjni6372054043595793813.jnilib+0x2282fe]
> rocksdb::DBImpl::Put(rocksdb::WriteOptions const&,
> rocksdb::ColumnFamilyHandle*, rocksdb::Slice const&, rocksdb::Slice
> const&)+0xe
> C [librocksdbjni6372054043595793813.jnilib+0x171c84]
> rocksdb::CompactedDBImpl::Open(rocksdb::Options const&,
> std::__1::basic_string<char, std::__1::char_traits<char>,
> std::__1::allocator<char> > const&, rocksdb::DB**)+0x2a4
> C [librocksdbjni6372054043595793813.jnilib+0x971f7]
> rocksdb_put_helper(JNIEnv_*, rocksdb::DB*, rocksdb::WriteOptions const&,
> rocksdb::ColumnFamilyHandle*, _jbyteArray*, int, int, _jbyteArray*, int,
> int)+0x137
> j org.rocksdb.RocksDB.put(JJ[BII[BII)V+0
> j org.rocksdb.RocksDB.put(Lorg/rocksdb/WriteOptions;[B[B)V+17
> j org.apache.hadoop.utils.RocksDBStore.put([B[B)V+10
> j
> org.apache.hadoop.hdds.scm.pipelines.PipelineSelector.updatePipelineState(Lorg/apache/hadoop/hdds/scm/container/common/helpers/Pipeline;Lorg/apache/hadoop/hdds/protocol/proto/HddsProtos$LifeCycleEvent;)V+222
> j
> org.apache.hadoop.hdds.scm.pipelines.PipelineSelector.finalizePipeline(Lorg/apache/hadoop/hdds/scm/container/common/helpers/Pipeline;)V+75
> j
> org.apache.hadoop.hdds.scm.container.ContainerMapping.handlePipelineClose(Lorg/apache/hadoop/hdds/scm/container/common/helpers/PipelineID;)V+18
> j
> org.apache.hadoop.hdds.scm.pipelines.PipelineCloseHandler.onMessage(Lorg/apache/hadoop/hdds/scm/container/common/helpers/PipelineID;Lorg/apache/hadoop/hdds/server/events/EventPublisher;)V+5
> j
> org.apache.hadoop.hdds.scm.pipelines.PipelineCloseHandler.onMessage(Ljava/lang/Object;Lorg/apache/hadoop/hdds/server/events/EventPublisher;)V+6
> J 5844 C1
> org.apache.hadoop.hdds.server.events.SingleThreadExecutor.lambda$onMessage$1(Lorg/apache/hadoop/hdds/server/events/EventHandler;Ljava/lang/Object;Lorg/apache/hadoop/hdds/server/events/EventPublisher;)V
> (41 bytes) @ 0x0000000115c80bc4 [0x0000000115c80aa0+0x124]
> J 5670 C1
> org.apache.hadoop.hdds.server.events.SingleThreadExecutor$$Lambda$143.run()V
> (20 bytes) @ 0x00000001168f625c [0x00000001168f61c0+0x9c]
> j
> java.util.concurrent.ThreadPoolExecutor.runWorker(Ljava/util/concurrent/ThreadPoolExecutor$Worker;)V+95
> J 3226 C1 java.util.concurrent.ThreadPoolExecutor$Worker.run()V (9 bytes) @
> 0x0000000116356e44 [0x0000000116356d40+0x104]
> J 3107 C1 java.lang.Thread.run()V (17 bytes) @ 0x0000000115d7b0c4
> [0x0000000115d7af80+0x144]
> v ~StubRoutines::call_stub
> V [libjvm.dylib+0x2ef1f6] JavaCalls::call_helper(JavaValue*, methodHandle*,
> JavaCallArguments*, Thread*)+0x6ae
> V [libjvm.dylib+0x2ef99a] JavaCalls::call_virtual(JavaValue*, KlassHandle,
> Symbol*, Symbol*, JavaCallArguments*, Thread*)+0x164
> V [libjvm.dylib+0x2efb46] JavaCalls::call_virtual(JavaValue*, Handle,
> KlassHandle, Symbol*, Symbol*, Thread*)+0x4a
> V [libjvm.dylib+0x34a46d] thread_entry(JavaThread*, Thread*)+0x7c
> V [libjvm.dylib+0x56eb0f] JavaThread::thread_main_inner()+0x9b
> V [libjvm.dylib+0x57020a] JavaThread::run()+0x1c2
> V [libjvm.dylib+0x48d4a6] java_start(Thread*)+0xf6
> C [libsystem_pthread.dylib+0x3661] _pthread_body+0x154
> C [libsystem_pthread.dylib+0x350d] _pthread_body+0x0
> C [libsystem_pthread.dylib+0x2bf9] thread_start+0xd
> C 0x0000000000000000
> {code}