lcy999 opened a new issue, #47056: URL: https://github.com/apache/doris/issues/47056
### Search before asking

- [x] I had searched in the [issues](https://github.com/apache/doris/issues?q=is%3Aissue) and found no similar issues.

### Version

doris 2.1.7
hadoop 3.1.4

### What's Wrong?

When a tablet has multiple replicas that are cold backed up to HDFS, some of its replicas often hit cold backup errors, while for other tablets all replicas are cold backed up successfully. If the partition replica count is set to 1, the issue does not occur. The errors reported when replicas are cold backed up to HDFS are mainly ‘Blocklist for /data/10108/10110.0.meta has changed!’ and ‘Cannot read cooldown meta: [INTERNAL_ERROR] malformed tablet meta’.

Below is the specific information:

1. Create table statement:

    CREATE TABLE IF NOT EXISTS example_tbl_by_default_t01
    (
        timestamp DATETIME NOT NULL COMMENT "Log time",
        type INT NOT NULL COMMENT "Log type",
        error_code INT COMMENT "Error code",
        error_msg VARCHAR(1024) COMMENT "Detailed error message",
        op_id BIGINT COMMENT "Owner id",
        op_time DATETIME COMMENT "Processing time"
    )
    auto partition by list(error_msg)()
    DISTRIBUTED BY HASH(type) BUCKETS 1
    PROPERTIES (
        "replication_allocation" = "tag.location.default: 2"
    );

2. Storage policy and resource:

    CREATE RESOURCE "remote_hdfs_t01"
    PROPERTIES (
        "type"="hdfs",
        "fs.defaultFS"="qione01:9000"
    );

    CREATE STORAGE POLICY policy_hdfs_t01
    PROPERTIES(
        "storage_resource" = "remote_hdfs_t01",
        "cooldown_ttl" = "60"
    );

    ALTER TABLE example_tbl_by_default_t01 set ("storage_policy" = "policy_hdfs_t01");

3. Error details:

I have confirmed that the meta file named in the error exists on HDFS and is in a normal state.

[hdfs_builder.cpp:60] java.io.IOException: Blocklist for /data/10108/10110.0.meta has changed!
    at org.apache.hadoop.hdfs.DFSInputStream.fetchAndCheckLocatedBlocks(DFSInputStream.java:302)
    at org.apache.hadoop.hdfs.DFSInputStream.openInfo(DFSInputStream.java:238)
    at org.apache.hadoop.hdfs.DFSInputStream.refetchLocations(DFSInputStream.java:1012)
    at org.apache.hadoop.hdfs.DFSInputStream.chooseDataNode(DFSInputStream.java:952)
    at org.apache.hadoop.hdfs.DFSInputStream.chooseDataNode(DFSInputStream.java:930)
    at org.apache.hadoop.hdfs.DFSInputStream.fetchBlockByteRange(DFSInputStream.java:1128)
    at org.apache.hadoop.hdfs.DFSInputStream.pread(DFSInputStream.java:1496)
    at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:1705)
    at org.apache.hadoop.fs.FSDataInputStream.read(FSDataInputStream.java:259)

[tablet.cpp:2451] cannot read cooldown meta: [INTERNAL_ERROR]malformed tablet meta , path=/data/24763/24765.0.meta
    0# doris::Tablet::_read_cooldown_meta(std::shared_ptr<doris::io::RemoteFileSystem> const&, doris::TabletMetaPB*) at /var/local/ldb-toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/unique_ptr.h:120
    1# doris::Tablet::_follow_cooldowned_data() at /root/doris/be/src/common/status.h:491
    2# doris::Tablet::cooldown(std::shared_ptr<doris::Rowset>) at /root/doris/be/src/common/status.h:491
    3# std::_Function_handler<void (), doris::StorageEngine::_cooldown_tasks_producer_callback()::$_1>::_M_invoke(std::_Any_data const&) at /var/local/ldb-toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/shared_ptr_base.h:701
    4# doris::WorkThreadPool<true>::work_thread(int) at /var/local/ldb-toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/atomic_base.h:646
    5# execute_native_thread_routine at /data/gcc-11.1.0/build/x86_64-pc-linux-gnu/libstdc++-v3/include/bits/unique_ptr.h:85
    6# start_thread
    7# __clone

This is the information after cold backup of a tablet from another table. The failure of some of the tablet's replicas to cold back up persists.
After the BE is restarted, cold backup returns to normal and the errors above no longer occur.

### What You Expected?

Multiple replicas of the tablet can be successfully cooled down to HDFS.

### How to Reproduce?

_No response_

### Anything Else?

_No response_

### Are you willing to submit PR?

- [ ] Yes I am willing to submit a PR!

### Code of Conduct

- [x] I agree to follow this project's [Code of Conduct](https://www.apache.org/foundation/policies/conduct)
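
To see which meta paths are affected and how often, the two error signatures quoted in this report can be counted in a BE log. The following is only a minimal sketch and not part of Doris: the function name `scan_cooldown_errors` is hypothetical, the sample lines are the two messages pasted above, and the regular expressions assume the messages are worded exactly as in this report (other versions may differ).

```python
import re
from collections import Counter

# Error signatures as they appear in this report (assumed wording).
PATTERNS = {
    "blocklist_changed": re.compile(
        r"Blocklist for (?P<path>\S+\.meta) has changed"
    ),
    "malformed_meta": re.compile(
        r"malformed tablet meta\s*,\s*path=(?P<path>\S+\.meta)"
    ),
}


def scan_cooldown_errors(lines):
    """Count occurrences of each (error kind, meta path) pair in log lines."""
    counts = Counter()
    for line in lines:
        for kind, pattern in PATTERNS.items():
            match = pattern.search(line)
            if match:
                counts[(kind, match.group("path"))] += 1
    return counts


# Sample input: the two messages quoted in this issue.
sample = [
    "[hdfs_builder.cpp:60] java.io.IOException: "
    "Blocklist for /data/10108/10110.0.meta has changed!",
    "[tablet.cpp:2451] cannot read cooldown meta: "
    "[INTERNAL_ERROR]malformed tablet meta , path=/data/24763/24765.0.meta",
]
result = scan_cooldown_errors(sample)
for (kind, path), n in sorted(result.items()):
    print(kind, path, n)
```

To run it against a real log, pass an open file object instead of `sample`; the function only iterates over lines.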