oicq1699 opened a new issue, #59312: URL: https://github.com/apache/doris/issues/59312
### Search before asking

- [x] I had searched in the [issues](https://github.com/apache/doris/issues?q=is%3Aissue) and found no similar issues.

### Version

- Version : doris-4.0.2-rc02
- Git : git://vm-80@30d2df045941c55c57ce7cc67314d06216b1a9de
- BuildInfo : vm-80
- Features : -TDE,-HDFS_STORAGE_VAULT,+UI,+AZURE_BLOB,+AZURE_STORAGE_VAULT,+HIVE_UDF,+BE_JAVA_EXTENSIONS
- BuildTime : Wed, 10 Dec 2025 16:33:17 CST

### What's Wrong?

Take the `audit_log` table in the `__internal_schema` database as an example. Both the database and the table are created by the system. The DDL of `audit_log` is as follows:

```sql
-- __internal_schema.audit_log definition
CREATE TABLE `audit_log` (
  `query_id` varchar(48) NULL,
  `time` datetime(3) NULL,
  `client_ip` varchar(128) NULL,
  `user` varchar(128) NULL,
  `frontend_ip` varchar(1024) NULL,
  `catalog` varchar(128) NULL,
  `db` varchar(128) NULL,
  `state` varchar(128) NULL,
  `error_code` int NULL,
  `error_message` text NULL,
  `query_time` bigint NULL,
  `cpu_time_ms` bigint NULL,
  `peak_memory_bytes` bigint NULL,
  `scan_bytes` bigint NULL,
  `scan_rows` bigint NULL,
  `return_rows` bigint NULL,
  `shuffle_send_rows` bigint NULL,
  `shuffle_send_bytes` bigint NULL,
  `spill_write_bytes_from_local_storage` bigint NULL,
  `spill_read_bytes_from_local_storage` bigint NULL,
  `scan_bytes_from_local_storage` bigint NULL,
  `scan_bytes_from_remote_storage` bigint NULL,
  `parse_time_ms` int NULL,
  `plan_times_ms` map<text,int> NULL,
  `get_meta_times_ms` map<text,int> NULL,
  `schedule_times_ms` map<text,int> NULL,
  `hit_sql_cache` tinyint NULL,
  `handled_in_fe` tinyint NULL,
  `queried_tables_and_views` array<text> NULL,
  `chosen_m_views` array<text> NULL,
  `changed_variables` map<text,text> NULL,
  `sql_mode` text NULL,
  `stmt_type` varchar(48) NULL,
  `stmt_id` bigint NULL,
  `sql_hash` varchar(128) NULL,
  `sql_digest` varchar(128) NULL,
  `is_query` tinyint NULL,
  `is_nereids` tinyint NULL,
  `is_internal` tinyint NULL,
  `workload_group` text NULL,
  `compute_group` text NULL,
  `stmt` text NULL
) ENGINE=OLAP
DUPLICATE KEY(`query_id`, `time`, `client_ip`)
COMMENT 'Doris internal audit table, DO NOT MODIFY IT'
PARTITION BY RANGE(`time`)
(PARTITION p20251219 VALUES [('2025-12-19 00:00:00'), ('2025-12-20 00:00:00')),
 PARTITION p20251220 VALUES [('2025-12-20 00:00:00'), ('2025-12-21 00:00:00')),
 PARTITION p20251221 VALUES [('2025-12-21 00:00:00'), ('2025-12-22 00:00:00')),
 PARTITION p20251222 VALUES [('2025-12-22 00:00:00'), ('2025-12-23 00:00:00')),
 PARTITION p20251223 VALUES [('2025-12-23 00:00:00'), ('2025-12-24 00:00:00')),
 PARTITION p20251224 VALUES [('2025-12-24 00:00:00'), ('2025-12-25 00:00:00')),
 PARTITION p20251225 VALUES [('2025-12-25 00:00:00'), ('2025-12-26 00:00:00')),
 PARTITION p20251226 VALUES [('2025-12-26 00:00:00'), ('2025-12-27 00:00:00')),
 PARTITION p20251227 VALUES [('2025-12-27 00:00:00'), ('2025-12-28 00:00:00')))
DISTRIBUTED BY HASH(`query_id`) BUCKETS 2
PROPERTIES (
  "replication_allocation" = "tag.location.default: 3",
  "min_load_replica_num" = "-1",
  "is_being_synced" = "false",
  "dynamic_partition.enable" = "true",
  "dynamic_partition.time_unit" = "DAY",
  "dynamic_partition.time_zone" = "Asia/Shanghai",
  "dynamic_partition.start" = "-30",
  "dynamic_partition.end" = "3",
  "dynamic_partition.prefix" = "p",
  "dynamic_partition.replication_allocation" = "tag.location.default: 3",
  "dynamic_partition.buckets" = "2",
  "dynamic_partition.create_history_partition" = "false",
  "dynamic_partition.history_partition_num" = "-1",
  "dynamic_partition.hot_partition_num" = "0",
  "dynamic_partition.reserved_history_periods" = "NULL",
  "dynamic_partition.storage_policy" = "",
  "storage_medium" = "hdd",
  "storage_format" = "V2",
  "inverted_index_storage_format" = "V3",
  "light_schema_change" = "true",
  "disable_auto_compaction" = "false",
  "enable_single_replica_compaction" = "false",
  "group_commit_interval_ms" = "10000",
  "group_commit_data_bytes" = "134217728"
);
```

The DDL shows that the replica count is set to 3. However, checking with the command `SHOW REPLICA STATUS FROM audit_log` reveals that the first 8 tablets of this table have only a single replica.

<img width="1086" height="611" alt="Image" src="https://github.com/user-attachments/assets/07bf40dc-cc92-499a-80c5-5c53c015d29d" />

Because of this, a disk failure on a single node was enough to make the data unrecoverable. What's more, this is a system table, and I am not even sure whether it is safe to rebuild it or the other tables in the same database.

### What You Expected?

Every tablet of the system tables should have 3 replicas, and the cluster should have been able to recover automatically when one node was lost.

### How to Reproduce?

Deploy a 5-node cluster, then check whether the `audit_log` table contains single-replica tablets (see the verification sketch at the end of this report). If such tablets exist, shut down the corresponding node(s) to reproduce the fault.

### Anything Else?

_No response_

### Are you willing to submit PR?

- [ ] Yes I am willing to submit a PR!

### Code of Conduct

- [x] I agree to follow this project's [Code of Conduct](https://www.apache.org/foundation/policies/conduct)

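**Workaround sketch (untested assumption).** While the root cause is investigated, one possible mitigation is to raise the replica allocation on the affected partitions so the FE tablet scheduler creates the missing replicas. This assumes `ALTER TABLE ... MODIFY PARTITION` is permitted on `__internal_schema` tables, which I am not sure about given the table comment says DO NOT MODIFY; the partition names below are placeholders taken from the DDL above, not the actual under-replicated ones.

```sql
-- Raise the replica allocation of specific partitions
-- (p20251219, p20251220 are placeholders; substitute the partitions
-- that SHOW TABLETS reported as single-replica):
ALTER TABLE __internal_schema.audit_log
MODIFY PARTITION (p20251219, p20251220)
SET ("replication_allocation" = "tag.location.default: 3");

-- Or apply the same allocation to all existing partitions at once:
ALTER TABLE __internal_schema.audit_log
MODIFY PARTITION (*)
SET ("replication_allocation" = "tag.location.default: 3");
```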