lcy999 opened a new issue, #47056:
URL: https://github.com/apache/doris/issues/47056

   ### Search before asking
   
   - [x] I had searched in the [issues](https://github.com/apache/doris/issues?q=is%3Aissue) and found no similar issues.
   
   
   ### Version
   
   doris 2.1.7
   hadoop 3.1.4
   
   ### What's Wrong?
   
   When a tablet with multiple replicas is cooled down to HDFS, some of its 
replicas frequently fail to cool down, while for other tablets all replicas 
cool down successfully. The issue does not occur when the partition replica 
count is set to 1. The errors reported for the failing replicas are mainly 
'Blocklist for /data/10108/10110.0.meta has changed!' and 
'cannot read cooldown meta: [INTERNAL_ERROR]malformed tablet meta'.
   
   Below is the specific information:
   1. create table info:
   
   CREATE TABLE IF NOT EXISTS example_tbl_by_default_t01
   (
       timestamp DATETIME NOT NULL COMMENT "log time",
       type INT NOT NULL COMMENT "log type",
       error_code INT COMMENT "error code",
       error_msg VARCHAR(1024) COMMENT "error details",
       op_id BIGINT COMMENT "owner id",
       op_time DATETIME COMMENT "handling time"
   )
   auto partition by list(error_msg)()
   DISTRIBUTED BY HASH(type) BUCKETS 1
   PROPERTIES (
   "replication_allocation" = "tag.location.default: 2"
   );
   
   2.  storage policy and resource info:
   CREATE RESOURCE "remote_hdfs_t01" PROPERTIES (
           "type"="hdfs",
           "fs.defaultFS"="qione01:9000"
       );
   
   CREATE STORAGE POLICY policy_hdfs_t01
   PROPERTIES(
       "storage_resource" = "remote_hdfs_t01",
       "cooldown_ttl" = "60"
   );
   
   
   ALTER TABLE example_tbl_by_default_t01 set ("storage_policy" = "policy_hdfs_t01");
   
   3. detail error:
   The meta files named in the errors have been confirmed to exist on HDFS 
and to be in a normal state.
   
   [hdfs_builder.cpp:60] java.io.IOException: Blocklist for /data/10108/10110.0.meta has changed!
           at org.apache.hadoop.hdfs.DFSInputStream.fetchAndCheckLocatedBlocks(DFSInputStream.java:302)
           at org.apache.hadoop.hdfs.DFSInputStream.openInfo(DFSInputStream.java:238)
           at org.apache.hadoop.hdfs.DFSInputStream.refetchLocations(DFSInputStream.java:1012)
           at org.apache.hadoop.hdfs.DFSInputStream.chooseDataNode(DFSInputStream.java:952)
           at org.apache.hadoop.hdfs.DFSInputStream.chooseDataNode(DFSInputStream.java:930)
           at org.apache.hadoop.hdfs.DFSInputStream.fetchBlockByteRange(DFSInputStream.java:1128)
           at org.apache.hadoop.hdfs.DFSInputStream.pread(DFSInputStream.java:1496)
           at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:1705)
           at org.apache.hadoop.fs.FSDataInputStream.read(FSDataInputStream.java:259)
   
   [tablet.cpp:2451] cannot read cooldown meta: [INTERNAL_ERROR]malformed tablet meta
   , path=/data/24763/24765.0.meta
           0#  doris::Tablet::_read_cooldown_meta(std::shared_ptr<doris::io::RemoteFileSystem> const&, doris::TabletMetaPB*) at /var/local/ldb-toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/unique_ptr.h:120
           1#  doris::Tablet::_follow_cooldowned_data() at /root/doris/be/src/common/status.h:491
           2#  doris::Tablet::cooldown(std::shared_ptr<doris::Rowset>) at /root/doris/be/src/common/status.h:491
           3#  std::_Function_handler<void (), doris::StorageEngine::_cooldown_tasks_producer_callback()::$_1>::_M_invoke(std::_Any_data const&) at /var/local/ldb-toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/shared_ptr_base.h:701
           4#  doris::WorkThreadPool<true>::work_thread(int) at /var/local/ldb-toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/atomic_base.h:646
           5#  execute_native_thread_routine at /data/gcc-11.1.0/build/x86_64-pc-linux-gnu/libstdc++-v3/include/bits/unique_ptr.h:85
           6#  start_thread
           7#  __clone
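
To see how often each of the two errors occurs and which meta files are involved, a BE log can be scanned with something like the following. This is a minimal sketch, not part of Doris: the function name `summarize_cooldown_errors` is hypothetical, and the patterns are derived only from the two messages quoted above, so they may need adjusting for other versions or log formats.

```python
import re
from collections import Counter

# Patterns derived from the two error messages quoted in this report.
# The malformed-meta message puts ", path=..." on the following line,
# so the whole log text is scanned rather than individual lines.
BLOCKLIST_RE = re.compile(r"Blocklist for\s+(\S+\.meta)\s+has changed!")
MALFORMED_RE = re.compile(
    r"cannot read cooldown meta:.{0,200}?path=(\S+\.meta)",
    re.DOTALL | re.IGNORECASE,
)


def summarize_cooldown_errors(log_text):
    """Count occurrences of each (error kind, meta path) pair in a BE log."""
    counts = Counter()
    for match in BLOCKLIST_RE.finditer(log_text):
        counts[("blocklist_changed", match.group(1))] += 1
    for match in MALFORMED_RE.finditer(log_text):
        counts[("malformed_meta", match.group(1))] += 1
    return counts


if __name__ == "__main__":
    sample = (
        "[hdfs_builder.cpp:60] java.io.IOException: Blocklist for "
        "/data/10108/10110.0.meta has changed!\n"
        "[tablet.cpp:2451] cannot read cooldown meta: "
        "[INTERNAL_ERROR]malformed tablet meta\n"
        ", path=/data/24763/24765.0.meta\n"
    )
    for (kind, path), n in sorted(summarize_cooldown_errors(sample).items()):
        print(kind, path, n)
```

Grouping by meta path makes it easier to tell whether the same file keeps failing or the errors are spread across many tablets.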
   
   
   The screenshot below shows the tablet information of another table after 
cooldown. Once some replicas of a tablet fail to cool down, the failure 
persists. After the BE is restarted, cooldown returns to normal and the errors 
above no longer occur.
   
![Image](https://github.com/user-attachments/assets/bb05e189-69be-41a4-bd9c-975c182c88e8)
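
One way to list the affected tablets is to group per-replica rows by tablet and flag tablets where only some replicas report remote data. The sketch below assumes rows shaped like the output of `SHOW TABLETS FROM <table>`, with columns named `TabletId`, `ReplicaId` and `RemoteDataSize`; those column names, the helper name `partially_cooled_tablets`, and the IDs in the usage example are assumptions for illustration and should be checked against the Doris version in use.

```python
from collections import defaultdict


def partially_cooled_tablets(rows):
    """Return {tablet_id: [replica ids with no remote data]} for tablets
    where at least one replica has remote data and at least one does not.

    `rows` is an iterable of dicts with the assumed keys
    "TabletId", "ReplicaId" and "RemoteDataSize".
    """
    by_tablet = defaultdict(list)
    for row in rows:
        by_tablet[row["TabletId"]].append(row)

    result = {}
    for tablet_id, replicas in by_tablet.items():
        not_cooled = [
            r["ReplicaId"] for r in replicas if int(r["RemoteDataSize"]) == 0
        ]
        # Flag only tablets with a mix of cooled and not-cooled replicas.
        if not_cooled and len(not_cooled) < len(replicas):
            result[tablet_id] = sorted(not_cooled)
    return result


if __name__ == "__main__":
    # Hypothetical rows, not taken from the cluster in this report.
    rows = [
        {"TabletId": 20001, "ReplicaId": 30001, "RemoteDataSize": 4096},
        {"TabletId": 20001, "ReplicaId": 30002, "RemoteDataSize": 0},
        {"TabletId": 20003, "ReplicaId": 30003, "RemoteDataSize": 2048},
        {"TabletId": 20003, "ReplicaId": 30004, "RemoteDataSize": 2048},
    ]
    print(partially_cooled_tablets(rows))
```

Tablets where no replica has remote data yet are skipped on purpose, since they may simply not have reached the `cooldown_ttl`.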
   
   ### What You Expected?
   
   All replicas of the tablet can be successfully cooled down to HDFS.
   
   ### How to Reproduce?
   
   _No response_
   
   ### Anything Else?
   
   _No response_
   
   ### Are you willing to submit PR?
   
   - [ ] Yes I am willing to submit a PR!
   
   ### Code of Conduct
   
   - [x] I agree to follow this project's [Code of Conduct](https://www.apache.org/foundation/policies/conduct)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@doris.apache.org.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

