suxiaogang223 opened a new issue, #62101: URL: https://github.com/apache/doris/issues/62101
### Background

The current third-party Docker startup flow in Doris has accumulated several usability and performance issues over time, especially around heavyweight services such as Hive, Iceberg-related components, and other stateful external dependencies.

Common pain points include:

- Long startup latency caused by repeated initialization, redundant downloads, and expensive bootstrap steps
- Service startup being tightly coupled with data initialization, making simple restart and daily development workflows slow
- Lack of incremental refresh mechanisms, so small data/script changes often require broad re-initialization
- Poor usability of startup control, with limited distinction between modes such as fast start, refresh, rebuild, and targeted reset
- Repeated environment preparation work that could be cached or reused safely

These issues affect both local development efficiency and CI stability/cost.

### Goal

This tracking issue focuses on improving the third-party Docker startup scripts with two primary goals:

1. Reduce startup time for common developer and CI workflows
2. Improve usability, observability, and control of startup/reset behavior

### Scope

The optimization work may include, but is not limited to:

- Reducing redundant initialization work during container startup
- Caching or reusing downloaded/bootstrap artifacts when safe
- Merging or simplifying expensive bootstrap steps
- Removing unnecessary metadata repair or data scan operations
- Decoupling service readiness from heavyweight data loading
- Introducing clearer startup modes for different scenarios
- Improving partial refresh / targeted rebuild support
- Improving logs, diagnostics, and failure visibility
- Standardizing script behavior across different third-party components

### Non-goals

This tracking issue does not require all startup scripts to be fully redesigned in one step.
Incremental improvements are acceptable as long as they clearly improve startup performance or usability without introducing instability.

### Proposed Work Items

- [ ] Audit current third-party startup bottlenecks by component
- [ ] Optimize Hive startup hot path
- [ ] Reduce repeated downloads and improve local cache reuse
- [ ] Clean up redundant metadata repair and bootstrap work
- [ ] Introduce clearer startup mode semantics where needed
- [ ] Improve restart experience after machine reboot or container restart
- [ ] Improve script usability and error reporting
- [ ] Add regression coverage for key startup flows

### Expected Benefits

- Faster local setup and restart for contributors
- Lower CI initialization cost and shorter feedback loops
- Easier debugging and maintenance of third-party environments
- More predictable and controllable startup behavior

### Notes

This issue is intended to track a series of incremental PRs instead of one large refactor.
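
As an illustration only, the startup modes and incremental refresh mentioned under Scope could be sketched roughly as follows. This is not taken from the existing Doris scripts: the `Mode` names, the `plan_steps` and `mark_done` helpers, the step names, and the JSON state file are all hypothetical. The idea is to fingerprint the files each initialization step depends on, and to re-run a step only when the selected mode and the recorded fingerprints say it is needed.

```python
import hashlib
import json
from enum import Enum
from pathlib import Path


class Mode(Enum):
    """Hypothetical startup modes; the names are assumptions for this sketch."""
    FAST = "fast"        # run only steps that have never completed
    REFRESH = "refresh"  # additionally re-run steps whose inputs changed
    REBUILD = "rebuild"  # re-run every step unconditionally


def fingerprint(paths):
    """Hash file names and contents so that any edit changes the digest."""
    digest = hashlib.sha256()
    for path in sorted(Path(p) for p in paths):
        digest.update(path.name.encode())
        digest.update(path.read_bytes())
    return digest.hexdigest()


def load_state(state_file):
    """Read fingerprints recorded by earlier runs (empty if none exist)."""
    state_file = Path(state_file)
    return json.loads(state_file.read_text()) if state_file.exists() else {}


def plan_steps(mode, steps, state_file):
    """Decide which init steps to run.

    `steps` maps a step name to the files it depends on.
    Returns (names of steps to run, current fingerprints for all steps).
    """
    previous = load_state(state_file)
    current = {name: fingerprint(files) for name, files in steps.items()}
    if mode is Mode.REBUILD:
        todo = list(steps)
    elif mode is Mode.REFRESH:
        todo = [name for name in steps if previous.get(name) != current[name]]
    else:  # Mode.FAST
        todo = [name for name in steps if name not in previous]
    return todo, current


def mark_done(state_file, fingerprints, done):
    """Record fingerprints only for steps that actually completed."""
    state = load_state(state_file)
    state.update({name: fingerprints[name] for name in done})
    Path(state_file).write_text(json.dumps(state, indent=2, sort_keys=True))
```

A caller would run the steps returned by `plan_steps` and then pass the ones that succeeded to `mark_done`. Recording fingerprints per completed step, rather than for all steps at once, means an edit that a fast start skips is still picked up by a later refresh.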
