[
https://issues.apache.org/jira/browse/KUDU-1692?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15563979#comment-15563979
]
Todd Lipcon commented on KUDU-1692:
-----------------------------------
The deletion seems to hold some tcmalloc locks: right after it, I saw many
log messages like:
{code}
W1010 17:01:42.257256 4710 kernel_stack_watchdog.cc:144] Thread 5078 stuck at ../../src/kudu/rpc/outbound_call.cc:185 for 154ms:
Kernel stack:
[<ffffffff810a40c4>] hrtimer_nanosleep+0xc4/0x180
[<ffffffff810a41ee>] sys_nanosleep+0x6e/0x80
[<ffffffff8100b0d2>] system_call_fastpath+0x16/0x1b
[<ffffffffffffffff>] 0xffffffffffffffff
User stack:
@ 0x8655e8 base::internal::SpinLockDelay()
@ 0x854ac7 SpinLock::SlowLock()
@ 0x855777 tcmalloc::ThreadCache::IncreaseCacheLimit()
@ 0x1b936d6 operator delete()
{code}
as well as many long reactor freezes (8+ seconds):
{code}
W1010 17:02:58.113458 5078 connection.cc:199] RPC call timeout handler was delayed by 8.05891s! This may be due to a process-wide pause such as swapping, logging-related delays, or allocator lock contention. Will allow an additional 1.5608s for a response.
{code}
The pauses were bad enough to cause many leader elections, write timeouts,
etc. This went on for about a minute and a half before the servers became
stable again.
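For illustration only: the stack in the description below shows
MemTracker::UpdateConsumption() calling into tcmalloc's GetNumericProperty()
during CFileReader destruction, so a mass deletion issues that query very
frequently. One possible way to cut down on such calls is to cache the value
and refresh it only once every N requests. The sketch below is not Kudu's code
and not necessarily the eventual fix; ThrottledStat, expensive_query, and
refresh_every are hypothetical names, and the query is passed in as a plain
callback so that no tcmalloc API is assumed.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <functional>
#include <utility>

// Hypothetical helper (not part of Kudu or tcmalloc): caches the result of an
// expensive statistics query and re-runs it only once every `refresh_every`
// calls to Get().
class ThrottledStat {
 public:
  ThrottledStat(std::function<int64_t()> expensive_query, uint64_t refresh_every)
      : query_(std::move(expensive_query)),
        refresh_every_(refresh_every),
        cached_(query_()) {  // one initial query so Get() never returns garbage
    assert(refresh_every_ > 0);
  }

  // Returns a possibly stale value. Most calls touch only two atomics and
  // never reach the expensive query (and whatever locks it takes).
  int64_t Get() {
    const uint64_t n = calls_.fetch_add(1, std::memory_order_relaxed) + 1;
    if (n % refresh_every_ == 0) {
      cached_.store(query_(), std::memory_order_relaxed);
    }
    return cached_.load(std::memory_order_relaxed);
  }

 private:
  std::function<int64_t()> query_;
  const uint64_t refresh_every_;
  std::atomic<int64_t> cached_;
  std::atomic<uint64_t> calls_{0};
};
```

The trade-off is staleness: a caller may see a value that is up to
refresh_every - 1 calls old, which may or may not be acceptable for memory
accounting.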
> Deleting large tablets causes a lot of tcmalloc contention
> ----------------------------------------------------------
>
> Key: KUDU-1692
> URL: https://issues.apache.org/jira/browse/KUDU-1692
> Project: Kudu
> Issue Type: Bug
> Components: tablet, util
> Affects Versions: 1.0.0
> Reporter: Todd Lipcon
>
> I deleted a large table which contained about 1TB of data per tablet server.
> The tablet servers then started spending a large amount of time in this stack:
> {code}
> 855e94 tcmalloc::ThreadCache::GetThreadStats(unsigned
> long*, unsigned long*)
> (/opt/cloudera/parcels/KUDU-1.0.0-1.kudu1.0.0.p0.6/lib/kudu/sbin-release/kudu-tserver)
> 84e9ba ExtractStats(TCMallocStats*, unsigned long*,
> tcmalloc::PageHeap::SmallSpanStats*, tcmalloc::PageHeap::LargeSpanStats*)
> (/opt/cloudera/parcels/KUDU-1.0.0-1.kudu1.0.0.p0.6/lib/kudu/sbin-releas
> 850f8f TCMallocImplementation::GetNumericProperty(char
> const*, unsigned long*)
> (/opt/cloudera/parcels/KUDU-1.0.0-1.kudu1.0.0.p0.6/lib/kudu/sbin-release/kudu-tserver)
> 1a18c50 kudu::GetTCMallocCurrentAllocatedBytes()
> (/opt/cloudera/parcels/KUDU-1.0.0-1.kudu1.0.0.p0.6/lib/kudu/sbin-release/kudu-tserver)
> 1a19a50 kudu::MemTracker::UpdateConsumption()
> (/opt/cloudera/parcels/KUDU-1.0.0-1.kudu1.0.0.p0.6/lib/kudu/sbin-release/kudu-tserver)
> 980f01 std::_Sp_counted_ptr<kudu::cfile::CFileReader*,
> (__gnu_cxx::_Lock_policy)2>::_M_dispose()
> (/opt/cloudera/parcels/KUDU-1.0.0-1.kudu1.0.0.p0.6/lib/kudu/sbin-release/kudu-tserver)
> 99a937 kudu::tablet::CFileSet::~CFileSet()
> (/opt/cloudera/parcels/KUDU-1.0.0-1.kudu1.0.0.p0.6/lib/kudu/sbin-release/kudu-tserver)
> 99ad61 kudu::tablet::CFileSet::~CFileSet()
> (/opt/cloudera/parcels/KUDU-1.0.0-1.kudu1.0.0.p0.6/lib/kudu/sbin-release/kudu-tserver)
> 948b42 kudu::tablet::DiskRowSet::~DiskRowSet()
> (/opt/cloudera/parcels/KUDU-1.0.0-1.kudu1.0.0.p0.6/lib/kudu/sbin-release/kudu-tserver)
> 965f35 kudu::tablet::RowSetTree::~RowSetTree()
> (/opt/cloudera/parcels/KUDU-1.0.0-1.kudu1.0.0.p0.6/lib/kudu/sbin-release/kudu-tserver)
> {code}
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)