[
https://issues.apache.org/jira/browse/KUDU-1500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15410343#comment-15410343
]
Dinesh Bhat commented on KUDU-1500:
-----------------------------------
[~tlipcon] [~mpercy] [[email protected]], please see the history of
discussions on KUDU-1264 as well. Our initial approach was to take the least
invasive path, i.e., copying the required attributes (under lock) in the read
path, which races with metadata resurrection. The review is posted at
http://gerrit.cloudera.org:8080/3823
Approach 1 (current): Guard the read path with the same lock taken by the
resurrection path. However, this assumes that accessors in the hot path, like
Tablet::CheckRowInTablet, are not active because we quiesced the tablet
earlier in the 'delete after corruption' path. Although that seems like a safe
assumption, it may open a can of worms in the future.
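To make Approach 1 concrete, here is a minimal sketch of the copy-under-lock idea. It is not the code in the review: TabletMetadataSketch, PartitionSchemaSketch, ReplaceSchema and CopySchema are hypothetical stand-ins, and a plain std::mutex is assumed in place of whatever lock the metadata actually uses.

```cpp
#include <cassert>
#include <cstdint>
#include <mutex>
#include <utility>
#include <vector>

// Hypothetical stand-in for a partition schema; not Kudu's real class.
struct PartitionSchemaSketch {
  std::vector<int32_t> hash_bucket_counts;
};

// Hypothetical stand-in for tablet metadata guarded by a single lock.
class TabletMetadataSketch {
 public:
  // Write path (resurrection): swap in the new schema while holding the lock.
  void ReplaceSchema(PartitionSchemaSketch fresh) {
    std::lock_guard<std::mutex> l(lock_);
    schema_ = std::move(fresh);
  }

  // Read path: take a copy under the same lock, so the caller never
  // touches the member while the write path is clearing it.
  PartitionSchemaSketch CopySchema() const {
    std::lock_guard<std::mutex> l(lock_);
    return schema_;
  }

 private:
  mutable std::mutex lock_;
  PartitionSchemaSketch schema_;
};

// Usage: a snapshot taken before a replacement is unaffected by it.
void DemoCopyUnderLock() {
  TabletMetadataSketch meta;
  PartitionSchemaSketch initial;
  initial.hash_bucket_counts = {2, 4};
  meta.ReplaceSchema(initial);

  PartitionSchemaSketch snapshot = meta.CopySchema();
  meta.ReplaceSchema(PartitionSchemaSketch{});

  assert(snapshot.hash_bucket_counts.size() == 2);
  assert(meta.CopySchema().hash_bucket_counts.empty());
}
```

The reader pays for a copy on every call, which is why this only seems acceptable off the hot path.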
Approach 2: Why can't we take the same approach we use when adding a new
tablet replica? That is, since the tablet is in the NOT_RUNNING state, we need
not serve the ListTablets RPC for this tablet alone. I am not sure about
the consequences of this filtering for the cluster.
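A rough sketch of the filtering in Approach 2, under the assumption that the tablet state can be checked without reading the metadata being reloaded. TabletStateSketch, TabletEntrySketch and ListServableTablets are hypothetical names for illustration, not the actual tserver types or RPC handler.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical tablet states; only the NOT_RUNNING distinction matters here.
enum class TabletStateSketch { kNotRunning, kRunning };

// Hypothetical per-tablet entry as a listing routine might see it.
struct TabletEntrySketch {
  std::string tablet_id;
  TabletStateSketch state;
};

// Returns the ids of tablets that would be reported. Tablets in the
// NOT_RUNNING state are skipped before any of their metadata is read.
std::vector<std::string> ListServableTablets(
    const std::vector<TabletEntrySketch>& tablets) {
  std::vector<std::string> ids;
  for (const auto& t : tablets) {
    if (t.state == TabletStateSketch::kNotRunning) {
      continue;  // Metadata may be mid-reload; do not touch it.
    }
    ids.push_back(t.tablet_id);
  }
  return ids;
}

// Usage: only the running tablet shows up in the listing.
void DemoListFiltering() {
  std::vector<TabletEntrySketch> tablets = {
      {"tablet-a", TabletStateSketch::kRunning},
      {"tablet-b", TabletStateSketch::kNotRunning},
  };
  std::vector<std::string> ids = ListServableTablets(tablets);
  assert(ids.size() == 1);
  assert(ids[0] == "tablet-a");
}
```

The open question from above still applies: callers of the listing would no longer see the tablet at all while it is being re-bootstrapped.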
> TSAN race in ListTablets vs tablet metadata loading
> ---------------------------------------------------
>
> Key: KUDU-1500
> URL: https://issues.apache.org/jira/browse/KUDU-1500
> Project: Kudu
> Issue Type: Bug
> Components: tablet, tserver
> Affects Versions: 0.9.0
> Reporter: Todd Lipcon
> Assignee: Dinesh Bhat
> Labels: newbie
>
> {code}
> WARNING: ThreadSanitizer: data race (pid=20066)
> Write of size 8 at 0x7d4c0000cd90 by thread T71 (mutexes: write M4355):
> #0 std::vector<kudu::PartitionSchema::HashBucketSchema,
> std::allocator<kudu::PartitionSchema::HashBucketSchema>
> >::_M_erase_at_end(kudu::PartitionSchema::HashBucketSchema*)
> /data/1/todd/kudu/thirdparty/installed-deps-tsan/gcc/include/c++/4.9.3/bits/stl_vector.h:1439:26
> (libkudu_common.so+0x000000158983)
> #1 std::vector<kudu::PartitionSchema::HashBucketSchema,
> std::allocator<kudu::PartitionSchema::HashBucketSchema> >::clear()
> /data/1/todd/kudu/thirdparty/installed-deps-tsan/gcc/include/c++/4.9.3/bits/stl_vector.h:1212:9
> (libkudu_common.so+0x00000013d721)
> #2 kudu::PartitionSchema::Clear()
> /data/1/todd/kudu/build/tsan/../../src/kudu/common/partition.cc:876:3
> (libkudu_common.so+0x00000012f8fc)
> #3 kudu::PartitionSchema::FromPB(kudu::PartitionSchemaPB const&,
> kudu::Schema const&, kudu::PartitionSchema*)
> /data/1/todd/kudu/build/tsan/../../src/kudu/common/partition.cc:129:3
> (libkudu_common.so+0x00000012f4e6)
> #4
> kudu::tablet::TabletMetadata::LoadFromSuperBlock(kudu::tablet::TabletSuperBlockPB
> const&)
> /data/1/todd/kudu/build/tsan/../../src/kudu/tablet/tablet_metadata.cc:298:7
> (libtablet.so+0x0000003a4561)
> #5
> kudu::tablet::TabletMetadata::ReplaceSuperBlock(kudu::tablet::TabletSuperBlockPB
> const&)
> /data/1/todd/kudu/build/tsan/../../src/kudu/tablet/tablet_metadata.cc:494:3
> (libtablet.so+0x0000003a7805)
> #6 kudu::tserver::RemoteBootstrapClient::Start(std::string const&,
> kudu::HostPort const&, scoped_refptr<kudu::tablet::TabletMetadata>*)
> /data/1/todd/kudu/build/tsan/../../src/kudu/tserver/remote_bootstrap_client.cc:222:5
> (libtserver.so+0x0000001148ca)
> #7
> kudu::tserver::TSTabletManager::StartRemoteBootstrap(kudu::consensus::StartRemoteBootstrapRequestPB
> const&, boost::optional<kudu::tserver::TabletServerErrorPB_Code>*)
> /data/1/todd/kudu/build/tsan/../../src/kudu/tserver/ts_tablet_manager.cc:423:3
> (libtserver.so+0x0000001aa273)
> #8
> kudu::tserver::ConsensusServiceImpl::StartRemoteBootstrap(kudu::consensus::StartRemoteBootstrapRequestPB
> const*, kudu::consensus::StartRemoteBootstrapResponsePB*,
> kudu::rpc::RpcContext*)
> /data/1/todd/kudu/build/tsan/../../src/kudu/tserver/tablet_service.cc:982:14
> (libtserver.so+0x000000175d03)
> #9
> kudu::consensus::ConsensusServiceIf::ConsensusServiceIf(scoped_refptr<kudu::MetricEntity>
> const&)::$_8::operator()(google::protobuf::Message const*,
> google::protobuf::Message*, kudu::rpc::RpcContext*) const
> /data/1/todd/kudu/build/tsan/src/kudu/consensus/consensus.service.cc:188:7
> (libconsensus_proto.so+0x00000009e457)
> #10 std::_Function_handler<void (google::protobuf::Message const*,
> google::protobuf::Message*, kudu::rpc::RpcContext*),
> kudu::consensus::ConsensusServiceIf::ConsensusServiceIf(scoped_refptr<kudu::MetricEntity>
> const&)::$_8>::_M_invoke(std::_Any_data const&, google::protobuf::Message
> const*, google::protobuf::Message*, kudu::rpc::RpcContext*)
> /data/1/todd/kudu/thirdparty/installed-deps-tsan/gcc/include/c++/4.9.3/functional:2039:2
> (libconsensus_proto.so+0x00000009e17a)
> #11 std::function<void (google::protobuf::Message const*,
> google::protobuf::Message*,
> kudu::rpc::RpcContext*)>::operator()(google::protobuf::Message const*,
> google::protobuf::Message*, kudu::rpc::RpcContext*) const
> /data/1/todd/kudu/thirdparty/installed-deps-tsan/gcc/include/c++/4.9.3/functional:2439:14
> (libkrpc.so+0x000000177997)
> #12 kudu::rpc::GeneratedServiceIf::Handle(kudu::rpc::InboundCall*)
> /data/1/todd/kudu/build/tsan/../../src/kudu/rpc/service_if.cc:94:3
> (libkrpc.so+0x00000017744c)
> #13 kudu::rpc::ServicePool::RunThread()
> /data/1/todd/kudu/build/tsan/../../src/kudu/rpc/service_pool.cc:206:5
> (libkrpc.so+0x00000017a650)
> #14 boost::_mfi::mf0<void,
> kudu::rpc::ServicePool>::operator()(kudu::rpc::ServicePool*) const
> /usr/include/boost/bind/mem_fn_template.hpp:49:29 (libkrpc.so+0x00000017d55b)
> #15 void boost::_bi::list1<boost::_bi::value<kudu::rpc::ServicePool*>
> >::operator()<boost::_mfi::mf0<void, kudu::rpc::ServicePool>,
> boost::_bi::list0>(boost::_bi::type<void>, boost::_mfi::mf0<void,
> kudu::rpc::ServicePool>&, boost::_bi::list0&, int)
> /usr/include/boost/bind/bind.hpp:246:9 (libkrpc.so+0x00000017d438)
> #16 boost::_bi::bind_t<void, boost::_mfi::mf0<void,
> kudu::rpc::ServicePool>,
> boost::_bi::list1<boost::_bi::value<kudu::rpc::ServicePool*> >
> >::operator()() /usr/include/boost/bind/bind_template.hpp:20:27
> (libkrpc.so+0x00000017d39a)
> #17
> boost::detail::function::void_function_obj_invoker0<boost::_bi::bind_t<void,
> boost::_mfi::mf0<void, kudu::rpc::ServicePool>,
> boost::_bi::list1<boost::_bi::value<kudu::rpc::ServicePool*> > >,
> void>::invoke(boost::detail::function::function_buffer&)
> /usr/include/boost/function/function_template.hpp:153:11
> (libkrpc.so+0x00000017d050)
> #18 boost::function0<void>::operator()() const
> /usr/include/boost/function/function_template.hpp:1012:12
> (libkrpc.so+0x0000000de7ca)
> #19 kudu::Thread::SuperviseThread(void*)
> /data/1/todd/kudu/build/tsan/../../src/kudu/util/thread.cc:586:3
> (libkudu_util.so+0x0000002c1549)
> Previous read of size 8 at 0x7d4c0000cd90 by thread T31:
> #0 std::vector<kudu::PartitionSchema::HashBucketSchema,
> std::allocator<kudu::PartitionSchema::HashBucketSchema> >::size() const
> /data/1/todd/kudu/thirdparty/installed-deps-tsan/gcc/include/c++/4.9.3/bits/stl_vector.h:655:40
> (libtserver.so+0x00000010d296)
> #1 kudu::PartitionSchema::ToPB(kudu::PartitionSchemaPB*) const
> /data/1/todd/kudu/build/tsan/../../src/kudu/common/partition.cc:164:46
> (libkudu_common.so+0x00000013050c)
> #2
> kudu::tserver::TabletServiceImpl::ListTablets(kudu::tserver::ListTabletsRequestPB
> const*, kudu::tserver::ListTabletsResponsePB*, kudu::rpc::RpcContext*)
> /data/1/todd/kudu/build/tsan/../../src/kudu/tserver/tablet_service.cc:1110:5
> (libtserver.so+0x00000017e82f)
> #3
> kudu::tserver::TabletServerServiceIf::TabletServerServiceIf(scoped_refptr<kudu::MetricEntity>
> const&)::$_4::operator()(google::protobuf::Message const*,
> google::protobuf::Message*, kudu::rpc::RpcContext*) const
> /data/1/todd/kudu/build/tsan/src/kudu/tserver/tserver_service.service.cc:118:7
> (libtserver_service_proto.so+0x000000027997)
> #4 std::_Function_handler<void (google::protobuf::Message const*,
> google::protobuf::Message*, kudu::rpc::RpcContext*),
> kudu::tserver::TabletServerServiceIf::TabletServerServiceIf(scoped_refptr<kudu::MetricEntity>
> const&)::$_4>::_M_invoke(std::_Any_data const&, google::protobuf::Message
> const*, google::protobuf::Message*, kudu::rpc::RpcContext*)
> /data/1/todd/kudu/thirdparty/installed-deps-tsan/gcc/include/c++/4.9.3/functional:2039:2
> (libtserver_service_proto.so+0x0000000276ba)
> #5 std::function<void (google::protobuf::Message const*,
> google::protobuf::Message*,
> kudu::rpc::RpcContext*)>::operator()(google::protobuf::Message const*,
> google::protobuf::Message*, kudu::rpc::RpcContext*) const
> /data/1/todd/kudu/thirdparty/installed-deps-tsan/gcc/include/c++/4.9.3/functional:2439:14
> (libkrpc.so+0x000000177997)
> #6 kudu::rpc::GeneratedServiceIf::Handle(kudu::rpc::InboundCall*)
> /data/1/todd/kudu/build/tsan/../../src/kudu/rpc/service_if.cc:94:3
> (libkrpc.so+0x00000017744c)
> #7 kudu::rpc::ServicePool::RunThread()
> /data/1/todd/kudu/build/tsan/../../src/kudu/rpc/service_pool.cc:206:5
> (libkrpc.so+0x00000017a650)
> #8 boost::_mfi::mf0<void,
> kudu::rpc::ServicePool>::operator()(kudu::rpc::ServicePool*) const
> /usr/include/boost/bind/mem_fn_template.hpp:49:29 (libkrpc.so+0x00000017d55b)
> #9 void boost::_bi::list1<boost::_bi::value<kudu::rpc::ServicePool*>
> >::operator()<boost::_mfi::mf0<void, kudu::rpc::ServicePool>,
> boost::_bi::list0>(boost::_bi::type<void>, boost::_mfi::mf0<void,
> kudu::rpc::ServicePool>&, boost::_bi::list0&, int)
> /usr/include/boost/bind/bind.hpp:246:9 (libkrpc.so+0x00000017d438)
> #10 boost::_bi::bind_t<void, boost::_mfi::mf0<void,
> kudu::rpc::ServicePool>,
> boost::_bi::list1<boost::_bi::value<kudu::rpc::ServicePool*> >
> >::operator()() /usr/include/boost/bind/bind_template.hpp:20:27
> (libkrpc.so+0x00000017d39a)
> #11
> boost::detail::function::void_function_obj_invoker0<boost::_bi::bind_t<void,
> boost::_mfi::mf0<void, kudu::rpc::ServicePool>,
> boost::_bi::list1<boost::_bi::value<kudu::rpc::ServicePool*> > >,
> void>::invoke(boost::detail::function::function_buffer&)
> /usr/include/boost/function/function_template.hpp:153:11
> (libkrpc.so+0x00000017d050)
> #12 boost::function0<void>::operator()() const
> /usr/include/boost/function/function_template.hpp:1012:12
> (libkrpc.so+0x0000000de7ca)
> #13 kudu::Thread::SuperviseThread(void*)
> /data/1/todd/kudu/build/tsan/../../src/kudu/util/thread.cc:586:3
> (libkudu_util.so+0x0000002c1549)
> {code}
> triggered by looping RaftConsensusITest.TestCorruptReplicaMetadata under TSAN