[ https://issues.apache.org/jira/browse/KUDU-1433?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Todd Lipcon resolved KUDU-1433.
-------------------------------
Resolution: Fixed
Fixed in 2b86e94d992468e6ee92733662af5fc959da57e3
> MaintenanceManager::GetMaintenanceManagerStatusDump can crash a server
> ----------------------------------------------------------------------
>
> Key: KUDU-1433
> URL: https://issues.apache.org/jira/browse/KUDU-1433
> Project: Kudu
> Issue Type: Bug
> Components: tserver
> Affects Versions: 0.5.0
> Reporter: Adar Dembo
> Assignee: Adar Dembo
> Priority: Blocker
> Fix For: 0.9.0
>
>
> The tserver Andrew and I have been using for the hackathon crashed when we
> hit the /maintenance-manager URL. The crash:
> {noformat}
> F0429 19:18:42.514312 35122 maintenance_manager.h:54] Check failed: valid_
> *** Check failure stack trace: ***
> @ 0x7fab69c2cf4d google::LogMessage::Fail()
> @ 0x7fab69c2ee4d google::LogMessage::SendToLog()
> @ 0x7fab69c2ca89 google::LogMessage::Flush()
> @ 0x7fab69c2f8ef google::LogMessageFatal::~LogMessageFatal()
> @ 0x7fab6f8b16e6 kudu::MaintenanceManager::GetMaintenanceManagerStatusDump()
> @ 0x7fab70d56f68 kudu::tserver::TabletServerPathHandlers::HandleMaintenanceManagerPage()
> @ 0x7fab70d57d34 boost::detail::function::void_function_obj_invoker2<>::invoke()
> @ 0x7fab6ffc1cfc kudu::Webserver::RunPathHandler()
> @ 0x7fab6ffc2716 kudu::Webserver::BeginRequestCallback()
> @ 0x7fab6ffc28dc kudu::Webserver::BeginRequestCallbackStatic()
> @ 0x7fab6ffce32e handle_request
> @ 0x7fab6ffd0c2e process_new_connection
> @ 0x7fab6ffd12cc worker_thread
> @ 0x7fab6be98aa1 start_thread
> @ 0x7fab67e1593d clone
> @ (nil) (unknown)
> {noformat}
> I suspect that we've got at least one op whose UpdateStats() method is not
> calling even one setter on the MaintenanceMgrStats object passed into it, or
> isn't writing cached previous stats into the passed-in object. LogGC,
> FlushDeltaMemStores, and FlushMRS are all culprits. There's nothing
> necessarily wrong with that (though it would be interesting to remember why
> we don't cache stats in these ops), so we need to fix
> GetMaintenanceManagerStatusDump to not access !valid_ stats objects.
> I think this was introduced about a year ago by commit 5e1f45e.
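
The failure mode described above can be illustrated with a minimal, simplified sketch. This is not the actual Kudu code or the committed fix: OpStats, Op, DumpStatus, and the ram_anchored field are hypothetical stand-ins, and the assert stands in for the fatal check seen in the log. It assumes a stats object whose getters require that at least one setter has been called, and shows one way a status dump could skip ops that never populate their stats instead of reading them.

```cpp
#include <cassert>
#include <cstdint>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical stats object: getters are only legal after a setter has run.
class OpStats {
 public:
  bool valid() const { return valid_; }

  void set_ram_anchored(int64_t v) {
    ram_anchored_ = v;
    valid_ = true;  // Any setter marks the object as populated.
  }

  int64_t ram_anchored() const {
    assert(valid_);  // Stand-in for the fatal "Check failed: valid_".
    return ram_anchored_;
  }

 private:
  bool valid_ = false;
  int64_t ram_anchored_ = 0;
};

// Hypothetical op: some ops populate stats, others leave them untouched.
struct Op {
  std::string name;
  bool updates_stats;

  void UpdateStats(OpStats* stats) const {
    if (updates_stats) {
      stats->set_ram_anchored(1024);  // Illustrative value only.
    }
    // Otherwise no setter is called and the stats stay invalid.
  }
};

// Builds a status dump, checking validity before touching any getter.
std::string DumpStatus(const std::vector<Op>& ops) {
  std::ostringstream out;
  for (const Op& op : ops) {
    OpStats stats;
    op.UpdateStats(&stats);
    if (!stats.valid()) {
      // Reading a getter here would trip the validity check.
      out << op.name << ": stats unavailable\n";
      continue;
    }
    out << op.name << ": ram_anchored=" << stats.ram_anchored() << "\n";
  }
  return out.str();
}
```

Calling DumpStatus with one op that sets no stats and one that does yields an "unavailable" line for the first and a normal line for the second, rather than aborting the process on the first.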