lnbest0707-uber opened a new issue, #14592: URL: https://github.com/apache/pinot/issues/14592
On the real world cloud based stateful platform, host underlying the Pinot container would run in dynamic status. Host/Node replacement is very frequent. Such operation ideally should be fully transparent to users even without Pinot admins' awareness. However, Pinot nowadays, does not have a really graceful (enough) way to handle the node replacement. Though it is usually with multiple replicas, running in a under replication status would make the system stressful and risky. For example, if the table is with 2 replicas, during node replacement, we have to experience: - Many segments are only under 1 replica, the query load on it would go double. - For segments running with 1 replica, we are experiencing a very high risk that the data might lose or query might fail if another node goes down due to hardware or network issues. Though we would experience same issue during node restart, node replacement is far slower than node restart especially with high data volume. For example, for a large node with 5+TB data, the entire single node replacement might take 5+ hours to complete. This is far longer than the node restart which might only take minutes. The longer the node replacement is, the longer node downtime we have to endure, the higher the risk is introduced. Hence, **reducing node replacement downtime** is crucial for smooth large scale production maintenance. During the downtime, we would observe - Helix pending messages slowly decrease to 0 - SEGMENTS_WITH_LESS_REPLICAS (introduced in https://github.com/apache/pinot/pull/12336) slowly decrease The speed is far slower than a node restart because the node has to download the missing segment data from either deep store or peers before loading them into memory. Therefore, a straightforward and effective way to reduce the downtime is that, before bringing down the old node (ON), we'd better **pre-download all required segments on the new node** (NN). Afterwards, bringing down the ON and starting up the NN would be same as the lightweight node restart. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
