zqr10159 opened a new pull request, #3898: URL: https://github.com/apache/hertzbeat/pull/3898
This pull request introduces significant improvements to how DuckDB is managed for historical metric storage in the warehouse module. The main focus is on enhancing database connection handling by introducing HikariCP connection pooling, making configuration more flexible, and improving data retention and cleanup logic. Additionally, the DuckDB configuration is updated to allow setting the database file path and retention period via configuration files.

**DuckDB Connection Management and Configuration Improvements:**

* Switched DuckDB connection handling from direct `DriverManager` usage to HikariCP connection pooling for improved performance, reliability, and resource management. The `DuckdbDatabaseDataStorage` class now maintains a persistent pool of connections, which helps avoid file lock issues and reduces overhead from frequent open/close operations. (`[[1]](diffhunk://#diff-e40d3486f7f0c1c34c71e187d145d650c89e93f2f780b871488a506199535ae8R20-R21)`, `[[2]](diffhunk://#diff-e40d3486f7f0c1c34c71e187d145d650c89e93f2f780b871488a506199535ae8L36)`, `[[3]](diffhunk://#diff-e40d3486f7f0c1c34c71e187d145d650c89e93f2f780b871488a506199535ae8L62-R73)`, `[[4]](diffhunk://#diff-e40d3486f7f0c1c34c71e187d145d650c89e93f2f780b871488a506199535ae8L83-R100)`, `[[5]](diffhunk://#diff-e40d3486f7f0c1c34c71e187d145d650c89e93f2f780b871488a506199535ae8L158-R182)`, `[[6]](diffhunk://#diff-e40d3486f7f0c1c34c71e187d145d650c89e93f2f780b871488a506199535ae8L243-R262)`, `[[7]](diffhunk://#diff-e40d3486f7f0c1c34c71e187d145d650c89e93f2f780b871488a506199535ae8L260-R277)`, `[[8]](diffhunk://#diff-e40d3486f7f0c1c34c71e187d145d650c89e93f2f780b871488a506199535ae8L333-R350)`, `[[9]](diffhunk://#diff-e40d3486f7f0c1c34c71e187d145d650c89e93f2f780b871488a506199535ae8R418-R420)`)
* Updated the DuckDB configuration (`DuckdbProperties` and relevant YAML files) to allow setting the database file path (`store-path`) and retention period (`expire-time`) via configuration, with sensible defaults.
  This makes it easier to customize deployments and manage data retention. (`[[1]](diffhunk://#diff-1a8a62c80dcfb014038e390768592d27cb8e38ff2e10f8fc3e6a7caebcca74e0R61)`, `[[2]](diffhunk://#diff-fbbd6cb1f62c63b9c263c9df8c2234f065b8df92bac40f984138d1f050333ed1R169)`, `[[3]](diffhunk://#diff-065505f8de273e75e9a6b534fa1e4f317df0d289a42e6a6e277d1b9ddf8c3b4bL29-R39)`)

**Data Retention and Cleanup Enhancements:**

* Improved the expired data cleaning logic: the scheduled cleaner now runs every hour, logs its activity, and forces a DuckDB `CHECKPOINT` after cleanup to flush the write-ahead log (WAL) and reduce file size. (`[[1]](diffhunk://#diff-e40d3486f7f0c1c34c71e187d145d650c89e93f2f780b871488a506199535ae8R131-R138)`, `[[2]](diffhunk://#diff-e40d3486f7f0c1c34c71e187d145d650c89e93f2f780b871488a506199535ae8L135-R165)`)
* Fixed a minor bug in the data retention calculation so that expired records are correctly identified and deleted based on the configured retention period. (`[hertzbeat-warehouse/src/main/java/org/apache/hertzbeat/warehouse/store/history/tsdb/duckdb/DuckdbDatabaseDataStorage.javaR131-R138](diffhunk://#diff-e40d3486f7f0c1c34c71e187d145d650c89e93f2f780b871488a506199535ae8R131-R138)`)

**Other Notable Changes:**

* Removed unnecessary index creation on DuckDB startup, likely to avoid index bloat or because indexes are managed elsewhere. (`[hertzbeat-warehouse/src/main/java/org/apache/hertzbeat/warehouse/store/history/tsdb/duckdb/DuckdbDatabaseDataStorage.javaL99-R120](diffhunk://#diff-e40d3486f7f0c1c34c71e187d145d650c89e93f2f780b871488a506199535ae8L99-R120)`)
* Simplified handling of time-type metrics by removing redundant exception handling for integer parsing.
  (`[hertzbeat-warehouse/src/main/java/org/apache/hertzbeat/warehouse/store/history/tsdb/duckdb/DuckdbDatabaseDataStorage.javaL203-L208](diffhunk://#diff-e40d3486f7f0c1c34c71e187d145d650c89e93f2f780b871488a506199535ae8L203-L208)`)
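
To make the retention calculation concrete, here is a minimal, dependency-free sketch of how a configured `expire-time` could be turned into a deletion cutoff. It is not the code from this PR. It assumes `expire-time` takes values such as `30d` or `12h` (the accepted syntax is not stated in this description), and the names `RetentionSketch`, `parseExpireTime`, and `cutoffMillis` are hypothetical.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.Locale;

/** Hypothetical helper for illustration; not part of HertzBeat. */
final class RetentionSketch {

    private RetentionSketch() {
    }

    /**
     * Parses an assumed "<number><unit>" retention string, where the unit is
     * d (days), h (hours), m (minutes) or s (seconds).
     */
    static Duration parseExpireTime(String value) {
        if (value == null || value.trim().length() < 2) {
            throw new IllegalArgumentException("expire-time must look like 30d or 12h");
        }
        String text = value.trim().toLowerCase(Locale.ROOT);
        // Everything except the last character is the numeric amount.
        long amount = Long.parseLong(text.substring(0, text.length() - 1));
        if (amount <= 0) {
            throw new IllegalArgumentException("expire-time must be positive: " + value);
        }
        char unit = text.charAt(text.length() - 1);
        switch (unit) {
            case 'd':
                return Duration.ofDays(amount);
            case 'h':
                return Duration.ofHours(amount);
            case 'm':
                return Duration.ofMinutes(amount);
            case 's':
                return Duration.ofSeconds(amount);
            default:
                throw new IllegalArgumentException("unknown expire-time unit: " + unit);
        }
    }

    /** Rows whose timestamp is strictly below the returned value count as expired. */
    static long cutoffMillis(Instant now, Duration retention) {
        return now.minus(retention).toEpochMilli();
    }

    public static void main(String[] args) {
        Duration retention = parseExpireTime("30d");
        System.out.println("cutoff = " + cutoffMillis(Instant.now(), retention));
    }
}
```

Computing the cutoff as "now minus retention" and comparing record timestamps against it keeps the unit handling in one place, which is where retention bugs of this kind tend to hide.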

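
The cleanup flow described in the PR (borrow a pooled connection, delete expired rows, then force a `CHECKPOINT`) can be sketched as follows. This is an illustration under stated assumptions, not the PR's implementation: it is written against the standard `javax.sql.DataSource` interface, which HikariCP's `HikariDataSource` implements, so that it compiles without extra dependencies. The class name `ExpiredDataCleanerSketch` and the table and column names (`history`, `record_time`) are placeholders.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.sql.Statement;
import javax.sql.DataSource;

/** Hypothetical cleanup task for illustration; not part of HertzBeat. */
final class ExpiredDataCleanerSketch {

    // Placeholder table and column names; the real schema may differ.
    static final String DELETE_SQL = "DELETE FROM history WHERE record_time < ?";
    // DuckDB statement that writes the WAL contents into the database file.
    static final String CHECKPOINT_SQL = "CHECKPOINT";

    private final DataSource pool;

    ExpiredDataCleanerSketch(DataSource pool) {
        this.pool = pool;
    }

    /** Deletes rows older than cutoffMillis, then checkpoints; returns the deleted row count. */
    int cleanExpired(long cutoffMillis) {
        // try-with-resources returns the connection to the pool instead of closing the file.
        try (Connection conn = pool.getConnection()) {
            int deleted;
            try (PreparedStatement ps = conn.prepareStatement(DELETE_SQL)) {
                ps.setLong(1, cutoffMillis);
                deleted = ps.executeUpdate();
            }
            try (Statement st = conn.createStatement()) {
                st.execute(CHECKPOINT_SQL);
            }
            return deleted;
        } catch (SQLException e) {
            throw new IllegalStateException("expired data cleanup failed", e);
        }
    }
}
```

One way to run this hourly, as the PR's cleaner does, would be a `ScheduledExecutorService` calling `cleanExpired` at a fixed rate of one hour; how the PR actually schedules the task is not shown in this description.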