[ 
https://issues.apache.org/jira/browse/KUDU-3532?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17796396#comment-17796396
 ] 

ASF subversion and git services commented on KUDU-3532:
-------------------------------------------------------

Commit a84c8376a78b3a86be75d434e0f7ff2853c0c880 in kudu's branch 
refs/heads/master from Mahesh Reddy
[ https://gitbox.apache.org/repos/asf?p=kudu.git;h=a84c8376a ]

KUDU-3532: Fix range aware replica placement bug

An implicit conversion from unsigned long to int caused
an std::length_error to be thrown when a vector tried to
reserve a size greater than the max size. This happens
when a negative number is converted. This bug was
introduced by changelist [1].
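
The failure mode described above can be reproduced outside of Kudu.
The following is a minimal sketch, not the actual Kudu code: the helper
names ReserveThrowsLengthError and ClampToSize are hypothetical and
exist only for illustration. It relies on the standard behavior that
std::vector::reserve takes an unsigned size_type and throws
std::length_error when the requested capacity exceeds max_size().

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical helper (not part of Kudu): tries to reserve `count`
// elements and reports whether reserve() threw std::length_error.
bool ReserveThrowsLengthError(int count) {
  std::vector<int> v;
  try {
    // reserve() takes an unsigned size_type, so a negative int is
    // implicitly converted to a very large unsigned value here.
    v.reserve(count);
  } catch (const std::length_error&) {
    return true;
  }
  return false;
}

// Hypothetical guard: clamp a possibly negative count to zero before
// it is used as a container size.
std::size_t ClampToSize(int count) {
  return count < 0 ? 0 : static_cast<std::size_t>(count);
}

// Exercises both helpers; the asserts document the expected behavior.
void DemonstrateBug() {
  assert(ReserveThrowsLengthError(-1));   // negative size: reserve throws
  assert(!ReserveThrowsLengthError(16));  // small positive size: fine
  assert(ClampToSize(-1) == 0);           // the guard avoids the huge value
}
```

Calling DemonstrateBug() passes silently in a build with assertions
enabled. In this sketch the exception is caught; when nothing catches
it, the runtime calls std::terminate and the process aborts.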

I also added tests with tablet servers in multiple
locations. The absence of such tests is why this bug
went unnoticed until now.

[1] https://github.com/apache/kudu/commit/10fdaf6a9

Change-Id: Id5d696d58965590a9f91f8b1b59f23225bbad8ee
Reviewed-on: http://gerrit.cloudera.org:8080/20781
Tested-by: Kudu Jenkins
Reviewed-by: Alexey Serbin


> Unable to place replicas using range aware logic with multiple locations
> ------------------------------------------------------------------------
>
>                 Key: KUDU-3532
>                 URL: https://issues.apache.org/jira/browse/KUDU-3532
>             Project: Kudu
>          Issue Type: Bug
>          Components: master
>    Affects Versions: 1.17.0
>            Reporter: Mahesh Reddy
>            Assignee: Mahesh Reddy
>            Priority: Major
>             Fix For: 1.18.0
>
>
> When multiple locations exist, it's possible for an std::length_error to be 
> thrown 
> [here|https://github.com/apache/kudu/blob/master/src/kudu/master/placement_policy.cc#L385].
> An implicit conversion from unsigned long to int is the culprit. If 
> "choices_size" ends up negative, converting it back to an unsigned size makes 
> it larger than the max size a vector is allowed to reserve, and an error is 
> thrown.
> Below is a stack trace from a master crash due to this bug:
> SIGABRT (@0x1da00007b60) received by PID 31584 (TID 0x7fdf9644f700) from PID 
> 31584; stack trace: ***
>     @ 0xe48496 google::(anonymous namespace)::FailureSignalHandler()
>     @ 0x7fdfb9a90630 (unknown)
>     @ 0x7fdfb7c95387 __GI_raise
>     @ 0x7fdfb7c96a78 __GI_abort
>     @ 0x7fdfb85a5a95 __gnu_cxx::__verbose_terminate_handler()
>     @ 0x7fdfb85a3a06 (unknown)
>     @ 0x7fdfb85a3a33 std::terminate()
>     @ 0x7fdfb85a3c53 __cxa_throw
>     @ 0x7fdfb85f8a67 std::__throw_length_error()
>     @ 0xe01fcf kudu::ReservoirSample<>()
>     @ 0xdfce0f kudu::master::PlacementPolicy::SelectReplica()
>     @ 0xdff386 kudu::master::PlacementPolicy::PlaceExtraTabletReplica()
>     @ 0xd873bf kudu::master::AsyncAddReplicaTask::SendRequest()
>     @ 0xd7912c kudu::master::RetryingTSRpcTask::Run()
>     @ 0xda5412 kudu::master::CatalogManager::ProcessTabletReport()
>     @ 0xdf7018 kudu::master::MasterServiceImpl::TSHeartbeat()
>     @ 0x2fea455 kudu::rpc::GeneratedServiceIf::Handle()
>     @ 0x2feb44a kudu::rpc::ServicePool::RunThread()
>     @ 0x31d2e1e kudu::Thread::SuperviseThread()
>     @ 0x7fdfb9a88ea5 start_thread
>     @ 0x7fdfb7d5db0d __clone



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
