[ https://issues.apache.org/jira/browse/KUDU-1433?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Todd Lipcon resolved KUDU-1433.
-------------------------------
    Resolution: Fixed

Fixed in 2b86e94d992468e6ee92733662af5fc959da57e3

> MaintenanceManager::GetMaintenanceManagerStatusDump can crash a server
> ----------------------------------------------------------------------
>
>                 Key: KUDU-1433
>                 URL: https://issues.apache.org/jira/browse/KUDU-1433
>             Project: Kudu
>          Issue Type: Bug
>          Components: tserver
>    Affects Versions: 0.5.0
>            Reporter: Adar Dembo
>            Assignee: Adar Dembo
>            Priority: Blocker
>             Fix For: 0.9.0
>
>
> The tserver Andrew and I have been using for the hackathon crashed when we 
> hit the /maintenance-manager URL. The crash:
> {noformat}
> F0429 19:18:42.514312 35122 maintenance_manager.h:54] Check failed: valid_
> *** Check failure stack trace: ***
>     @     0x7fab69c2cf4d  google::LogMessage::Fail()
>     @     0x7fab69c2ee4d  google::LogMessage::SendToLog()
>     @     0x7fab69c2ca89  google::LogMessage::Flush()
>     @     0x7fab69c2f8ef  google::LogMessageFatal::~LogMessageFatal()
>     @     0x7fab6f8b16e6  kudu::MaintenanceManager::GetMaintenanceManagerStatusDump()
>     @     0x7fab70d56f68  kudu::tserver::TabletServerPathHandlers::HandleMaintenanceManagerPage()
>     @     0x7fab70d57d34  boost::detail::function::void_function_obj_invoker2<>::invoke()
>     @     0x7fab6ffc1cfc  kudu::Webserver::RunPathHandler()
>     @     0x7fab6ffc2716  kudu::Webserver::BeginRequestCallback()
>     @     0x7fab6ffc28dc  kudu::Webserver::BeginRequestCallbackStatic()
>     @     0x7fab6ffce32e  handle_request
>     @     0x7fab6ffd0c2e  process_new_connection
>     @     0x7fab6ffd12cc  worker_thread
>     @     0x7fab6be98aa1  start_thread
>     @     0x7fab67e1593d  clone
>     @              (nil)  (unknown)
> {noformat}
> I suspect that at least one op has an UpdateStats() method that either never
> calls a setter on the MaintenanceMgrStats object passed into it, or doesn't
> write cached previous stats into the passed-in object. In both cases the
> object is left with valid_ unset. LogGC, FlushDeltaMemStores, and FlushMRS
> are all culprits. There's nothing necessarily wrong with that (though it
> would be interesting to remember why we don't cache stats in these ops), so
> the fix belongs in GetMaintenanceManagerStatusDump, which must not read
> stats objects whose valid_ flag is false.
> I think this was introduced about a year ago by commit 5e1f45e.
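
The failure mode described above can be sketched in a few lines of C++. This is a simplified illustration and not the Kudu code: OpStats, Op, DumpStatus, and the ram_anchored field are hypothetical names, and a plain assert() stands in for the CHECK in the accessor. It shows an op whose UpdateStats() leaves the stats object untouched, and a dump routine that tests validity before reading, which is the kind of guard the report asks for.

```cpp
#include <cassert>
#include <cstdint>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical stand-in for a stats object guarded by a validity flag.
// Any setter marks the object valid; getters assert validity, mirroring
// the "Check failed: valid_" in the crash above.
class OpStats {
 public:
  bool valid() const { return valid_; }
  void set_ram_anchored(int64_t v) {
    ram_anchored_ = v;
    valid_ = true;
  }
  int64_t ram_anchored() const {
    assert(valid_);  // would abort if no setter was ever called
    return ram_anchored_;
  }

 private:
  bool valid_ = false;
  int64_t ram_anchored_ = 0;
};

// Hypothetical op. When updates_stats is false, UpdateStats() returns
// without calling a setter, so the caller's OpStats stays invalid.
struct Op {
  std::string name;
  bool updates_stats;
  void UpdateStats(OpStats* stats) const {
    if (updates_stats) stats->set_ram_anchored(42);
  }
};

// Dump routine that checks valid() before touching any getter, so an op
// that reported nothing produces a placeholder line instead of a crash.
std::string DumpStatus(const std::vector<Op>& ops) {
  std::ostringstream out;
  for (const Op& op : ops) {
    OpStats stats;
    op.UpdateStats(&stats);
    out << op.name << ": ";
    if (stats.valid()) {
      out << "ram_anchored=" << stats.ram_anchored();
    } else {
      out << "stats unavailable";
    }
    out << "\n";
  }
  return out.str();
}
```

Calling DumpStatus() on a mix of ops that do and do not fill in their stats returns one line per op. Without the valid() test, the first op that skipped its setters would trip the assertion in ram_anchored(), which is the analogue of the crash in the stack trace.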



