[ https://issues.apache.org/jira/browse/KUDU-1500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15410343#comment-15410343 ]

Dinesh Bhat commented on KUDU-1500:
-----------------------------------

[~tlipcon] [~mpercy] [[email protected]], please also see the history of
discussions on KUDU-1264. Our initial approach was to take the least
invasive path, i.e., copying the required attributes (under lock) in the read
path that races with metadata resurrection. The review is posted at
http://gerrit.cloudera.org:8080/3823

Approach 1 (current): Guard the read path with the same lock taken by the
resurrection path. However, this implicitly assumes that hot-path accessors
such as Tablet::CheckRowInTablet are not active, because we already quiesced
the tablet earlier on the 'delete after corruption' path. Although that seems
like a safe assumption, it may open a can of worms in the future.

Approach 2: Why can't we take the same approach we use when adding a new
tablet replica? That is, since the tablet is in the NOT_RUNNING state, we can
simply omit this one tablet from the ListTablets RPC response. I am not sure
what the consequences of this filtering would be for the rest of the cluster.
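A sketch of the filtering idea follows. It assumes a hypothetical list of hosted replicas that each carry a state. NOT_RUNNING is the state named in the comment. The other enum values, and all type and function names, are illustrative placeholders, not Kudu's.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical replica states; NOT_RUNNING is the state named in the
// comment, the other values are placeholders.
enum class TabletStateSketch { NOT_RUNNING, BOOTSTRAPPING, RUNNING };

// Hypothetical view of a hosted replica.
struct TabletPeerSketch {
  std::string tablet_id;
  TabletStateSketch state;
};

// Hypothetical ListTablets response entry; the real RPC would also
// serialize the partition schema here, which is the racy read.
struct ListTabletsEntrySketch {
  std::string tablet_id;
};

// Builds the ListTablets response, omitting replicas in NOT_RUNNING so
// that their metadata is never read while it may be replaced.
std::vector<ListTabletsEntrySketch> ListTabletsSkippingNotRunning(
    const std::vector<TabletPeerSketch>& peers) {
  std::vector<ListTabletsEntrySketch> out;
  for (const auto& peer : peers) {
    if (peer.state == TabletStateSketch::NOT_RUNNING) continue;
    out.push_back({peer.tablet_id});
  }
  return out;
}
```

One design caveat: the filter only closes the race if a replica cannot start a metadata replacement between the state check and the serialization. Otherwise it merely narrows the window, so the state transition would still need to be coordinated with the check.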


> TSAN race in ListTablets vs tablet metadata loading
> ---------------------------------------------------
>
>                 Key: KUDU-1500
>                 URL: https://issues.apache.org/jira/browse/KUDU-1500
>             Project: Kudu
>          Issue Type: Bug
>          Components: tablet, tserver
>    Affects Versions: 0.9.0
>            Reporter: Todd Lipcon
>            Assignee: Dinesh Bhat
>              Labels: newbie
>
> {code}
> WARNING: ThreadSanitizer: data race (pid=20066)
>   Write of size 8 at 0x7d4c0000cd90 by thread T71 (mutexes: write M4355):
>     #0 std::vector<kudu::PartitionSchema::HashBucketSchema, std::allocator<kudu::PartitionSchema::HashBucketSchema> >::_M_erase_at_end(kudu::PartitionSchema::HashBucketSchema*) /data/1/todd/kudu/thirdparty/installed-deps-tsan/gcc/include/c++/4.9.3/bits/stl_vector.h:1439:26 (libkudu_common.so+0x000000158983)
>     #1 std::vector<kudu::PartitionSchema::HashBucketSchema, std::allocator<kudu::PartitionSchema::HashBucketSchema> >::clear() /data/1/todd/kudu/thirdparty/installed-deps-tsan/gcc/include/c++/4.9.3/bits/stl_vector.h:1212:9 (libkudu_common.so+0x00000013d721)
>     #2 kudu::PartitionSchema::Clear() /data/1/todd/kudu/build/tsan/../../src/kudu/common/partition.cc:876:3 (libkudu_common.so+0x00000012f8fc)
>     #3 kudu::PartitionSchema::FromPB(kudu::PartitionSchemaPB const&, kudu::Schema const&, kudu::PartitionSchema*) /data/1/todd/kudu/build/tsan/../../src/kudu/common/partition.cc:129:3 (libkudu_common.so+0x00000012f4e6)
>     #4 kudu::tablet::TabletMetadata::LoadFromSuperBlock(kudu::tablet::TabletSuperBlockPB const&) /data/1/todd/kudu/build/tsan/../../src/kudu/tablet/tablet_metadata.cc:298:7 (libtablet.so+0x0000003a4561)
>     #5 kudu::tablet::TabletMetadata::ReplaceSuperBlock(kudu::tablet::TabletSuperBlockPB const&) /data/1/todd/kudu/build/tsan/../../src/kudu/tablet/tablet_metadata.cc:494:3 (libtablet.so+0x0000003a7805)
>     #6 kudu::tserver::RemoteBootstrapClient::Start(std::string const&, kudu::HostPort const&, scoped_refptr<kudu::tablet::TabletMetadata>*) /data/1/todd/kudu/build/tsan/../../src/kudu/tserver/remote_bootstrap_client.cc:222:5 (libtserver.so+0x0000001148ca)
>     #7 kudu::tserver::TSTabletManager::StartRemoteBootstrap(kudu::consensus::StartRemoteBootstrapRequestPB const&, boost::optional<kudu::tserver::TabletServerErrorPB_Code>*) /data/1/todd/kudu/build/tsan/../../src/kudu/tserver/ts_tablet_manager.cc:423:3 (libtserver.so+0x0000001aa273)
>     #8 kudu::tserver::ConsensusServiceImpl::StartRemoteBootstrap(kudu::consensus::StartRemoteBootstrapRequestPB const*, kudu::consensus::StartRemoteBootstrapResponsePB*, kudu::rpc::RpcContext*) /data/1/todd/kudu/build/tsan/../../src/kudu/tserver/tablet_service.cc:982:14 (libtserver.so+0x000000175d03)
>     #9 kudu::consensus::ConsensusServiceIf::ConsensusServiceIf(scoped_refptr<kudu::MetricEntity> const&)::$_8::operator()(google::protobuf::Message const*, google::protobuf::Message*, kudu::rpc::RpcContext*) const /data/1/todd/kudu/build/tsan/src/kudu/consensus/consensus.service.cc:188:7 (libconsensus_proto.so+0x00000009e457)
>     #10 std::_Function_handler<void (google::protobuf::Message const*, google::protobuf::Message*, kudu::rpc::RpcContext*), kudu::consensus::ConsensusServiceIf::ConsensusServiceIf(scoped_refptr<kudu::MetricEntity> const&)::$_8>::_M_invoke(std::_Any_data const&, google::protobuf::Message const*, google::protobuf::Message*, kudu::rpc::RpcContext*) /data/1/todd/kudu/thirdparty/installed-deps-tsan/gcc/include/c++/4.9.3/functional:2039:2 (libconsensus_proto.so+0x00000009e17a)
>     #11 std::function<void (google::protobuf::Message const*, google::protobuf::Message*, kudu::rpc::RpcContext*)>::operator()(google::protobuf::Message const*, google::protobuf::Message*, kudu::rpc::RpcContext*) const /data/1/todd/kudu/thirdparty/installed-deps-tsan/gcc/include/c++/4.9.3/functional:2439:14 (libkrpc.so+0x000000177997)
>     #12 kudu::rpc::GeneratedServiceIf::Handle(kudu::rpc::InboundCall*) /data/1/todd/kudu/build/tsan/../../src/kudu/rpc/service_if.cc:94:3 (libkrpc.so+0x00000017744c)
>     #13 kudu::rpc::ServicePool::RunThread() /data/1/todd/kudu/build/tsan/../../src/kudu/rpc/service_pool.cc:206:5 (libkrpc.so+0x00000017a650)
>     #14 boost::_mfi::mf0<void, kudu::rpc::ServicePool>::operator()(kudu::rpc::ServicePool*) const /usr/include/boost/bind/mem_fn_template.hpp:49:29 (libkrpc.so+0x00000017d55b)
>     #15 void boost::_bi::list1<boost::_bi::value<kudu::rpc::ServicePool*> >::operator()<boost::_mfi::mf0<void, kudu::rpc::ServicePool>, boost::_bi::list0>(boost::_bi::type<void>, boost::_mfi::mf0<void, kudu::rpc::ServicePool>&, boost::_bi::list0&, int) /usr/include/boost/bind/bind.hpp:246:9 (libkrpc.so+0x00000017d438)
>     #16 boost::_bi::bind_t<void, boost::_mfi::mf0<void, kudu::rpc::ServicePool>, boost::_bi::list1<boost::_bi::value<kudu::rpc::ServicePool*> > >::operator()() /usr/include/boost/bind/bind_template.hpp:20:27 (libkrpc.so+0x00000017d39a)
>     #17 boost::detail::function::void_function_obj_invoker0<boost::_bi::bind_t<void, boost::_mfi::mf0<void, kudu::rpc::ServicePool>, boost::_bi::list1<boost::_bi::value<kudu::rpc::ServicePool*> > >, void>::invoke(boost::detail::function::function_buffer&) /usr/include/boost/function/function_template.hpp:153:11 (libkrpc.so+0x00000017d050)
>     #18 boost::function0<void>::operator()() const /usr/include/boost/function/function_template.hpp:1012:12 (libkrpc.so+0x0000000de7ca)
>     #19 kudu::Thread::SuperviseThread(void*) /data/1/todd/kudu/build/tsan/../../src/kudu/util/thread.cc:586:3 (libkudu_util.so+0x0000002c1549)
>   Previous read of size 8 at 0x7d4c0000cd90 by thread T31:
>     #0 std::vector<kudu::PartitionSchema::HashBucketSchema, std::allocator<kudu::PartitionSchema::HashBucketSchema> >::size() const /data/1/todd/kudu/thirdparty/installed-deps-tsan/gcc/include/c++/4.9.3/bits/stl_vector.h:655:40 (libtserver.so+0x00000010d296)
>     #1 kudu::PartitionSchema::ToPB(kudu::PartitionSchemaPB*) const /data/1/todd/kudu/build/tsan/../../src/kudu/common/partition.cc:164:46 (libkudu_common.so+0x00000013050c)
>     #2 kudu::tserver::TabletServiceImpl::ListTablets(kudu::tserver::ListTabletsRequestPB const*, kudu::tserver::ListTabletsResponsePB*, kudu::rpc::RpcContext*) /data/1/todd/kudu/build/tsan/../../src/kudu/tserver/tablet_service.cc:1110:5 (libtserver.so+0x00000017e82f)
>     #3 kudu::tserver::TabletServerServiceIf::TabletServerServiceIf(scoped_refptr<kudu::MetricEntity> const&)::$_4::operator()(google::protobuf::Message const*, google::protobuf::Message*, kudu::rpc::RpcContext*) const /data/1/todd/kudu/build/tsan/src/kudu/tserver/tserver_service.service.cc:118:7 (libtserver_service_proto.so+0x000000027997)
>     #4 std::_Function_handler<void (google::protobuf::Message const*, google::protobuf::Message*, kudu::rpc::RpcContext*), kudu::tserver::TabletServerServiceIf::TabletServerServiceIf(scoped_refptr<kudu::MetricEntity> const&)::$_4>::_M_invoke(std::_Any_data const&, google::protobuf::Message const*, google::protobuf::Message*, kudu::rpc::RpcContext*) /data/1/todd/kudu/thirdparty/installed-deps-tsan/gcc/include/c++/4.9.3/functional:2039:2 (libtserver_service_proto.so+0x0000000276ba)
>     #5 std::function<void (google::protobuf::Message const*, google::protobuf::Message*, kudu::rpc::RpcContext*)>::operator()(google::protobuf::Message const*, google::protobuf::Message*, kudu::rpc::RpcContext*) const /data/1/todd/kudu/thirdparty/installed-deps-tsan/gcc/include/c++/4.9.3/functional:2439:14 (libkrpc.so+0x000000177997)
>     #6 kudu::rpc::GeneratedServiceIf::Handle(kudu::rpc::InboundCall*) /data/1/todd/kudu/build/tsan/../../src/kudu/rpc/service_if.cc:94:3 (libkrpc.so+0x00000017744c)
>     #7 kudu::rpc::ServicePool::RunThread() /data/1/todd/kudu/build/tsan/../../src/kudu/rpc/service_pool.cc:206:5 (libkrpc.so+0x00000017a650)
>     #8 boost::_mfi::mf0<void, kudu::rpc::ServicePool>::operator()(kudu::rpc::ServicePool*) const /usr/include/boost/bind/mem_fn_template.hpp:49:29 (libkrpc.so+0x00000017d55b)
>     #9 void boost::_bi::list1<boost::_bi::value<kudu::rpc::ServicePool*> >::operator()<boost::_mfi::mf0<void, kudu::rpc::ServicePool>, boost::_bi::list0>(boost::_bi::type<void>, boost::_mfi::mf0<void, kudu::rpc::ServicePool>&, boost::_bi::list0&, int) /usr/include/boost/bind/bind.hpp:246:9 (libkrpc.so+0x00000017d438)
>     #10 boost::_bi::bind_t<void, boost::_mfi::mf0<void, kudu::rpc::ServicePool>, boost::_bi::list1<boost::_bi::value<kudu::rpc::ServicePool*> > >::operator()() /usr/include/boost/bind/bind_template.hpp:20:27 (libkrpc.so+0x00000017d39a)
>     #11 boost::detail::function::void_function_obj_invoker0<boost::_bi::bind_t<void, boost::_mfi::mf0<void, kudu::rpc::ServicePool>, boost::_bi::list1<boost::_bi::value<kudu::rpc::ServicePool*> > >, void>::invoke(boost::detail::function::function_buffer&) /usr/include/boost/function/function_template.hpp:153:11 (libkrpc.so+0x00000017d050)
>     #12 boost::function0<void>::operator()() const /usr/include/boost/function/function_template.hpp:1012:12 (libkrpc.so+0x0000000de7ca)
>     #13 kudu::Thread::SuperviseThread(void*) /data/1/todd/kudu/build/tsan/../../src/kudu/util/thread.cc:586:3 (libkudu_util.so+0x0000002c1549)
> {code}
> Triggered by looping RaftConsensusITest.TestCorruptReplicaMetadata under TSAN.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
