hfutatzhanghb commented on PR #7810:
URL: https://github.com/apache/hadoop/pull/7810#issuecomment-3232146220

   @Hexiaoqiao Thanks very much for reviewing. Please allow me to define the 
issue clearly here.
   
   Recently, while exploring the use of HDFS Erasure Coding (EC) for hot-data 
storage, we encountered several problems; the current issue is one of them.
   
   **Problem description (pseudo-code):**
   
   ```java
   // Pseudo-code: taskFinished() and doSomeComputeLogicAndGetData() are
   // illustrative placeholders for the application logic.
   // For a path under an EC directory, the wrapped stream returned by
   // create() is a DFSStripedOutputStream.
   FSDataOutputStream os = dfs.create(path);
   // The task may run for several hours, so the output-stream object is
   // also held open for hours.
   while (!taskFinished()) {
       byte[] data = doSomeComputeLogicAndGetData();
       os.write(data);
   }
   os.close();
   ```
   
   When we perform a rolling restart of DataNodes, the above task fails.
   The root cause is that, during writing, an EC output stream will exclude any 
bad DataNode from the pipeline, but there is no mechanism to add new DataNodes 
to replace the excluded ones. Once more than three DataNodes have been 
excluded, the output stream no longer has enough DataStreamers to continue 
writing and therefore aborts.
   
   So, this PR tries to resolve the problem by ending the block group in 
advance when failed streamers are encountered (while the count of failed 
streamers is still <= 3) and allocating a new block group; after the new 
block group is allocated, we will again have sufficient data streamers.
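   The recovery idea can be sketched with a toy model (all names here, such as 
`BlockGroupWriter` and `endBlockGroupAndAllocateNew`, are hypothetical and do 
not reflect the actual DFSStripedOutputStream internals; an RS-6-3 policy with 
6 data + 3 parity units is assumed): once the number of healthy streamers 
drops below the data-unit count, instead of aborting, the writer ends the 
current block group and allocates a new one, which rebuilds a full pipeline on 
fresh DataNodes.

   ```java
   // Toy model of the proposed recovery; hypothetical names, not HDFS internals.
   public class BlockGroupWriter {
       static final int DATA_UNITS = 6;    // assumed RS-6-3 policy
       static final int PARITY_UNITS = 3;

       int healthyStreamers = DATA_UNITS + PARITY_UNITS;
       int blockGroupsAllocated = 1;

       // Simulate one DataNode in the pipeline failing (e.g. rolling restart).
       void streamerFailed() {
           healthyStreamers--;
           // Current behavior: with fewer than DATA_UNITS healthy streamers,
           // the stream can no longer write a full stripe and aborts.
           // Proposed behavior: end the block group early and allocate a new
           // one, restoring a full set of streamers.
           if (healthyStreamers < DATA_UNITS) {
               endBlockGroupAndAllocateNew();
           }
       }

       void endBlockGroupAndAllocateNew() {
           blockGroupsAllocated++;
           healthyStreamers = DATA_UNITS + PARITY_UNITS; // full pipeline again
       }
   }
   ```

   In this model, three failures are tolerated within one block group; the 
fourth failure triggers the early end-of-block-group, after which writing can 
continue with a full set of nine streamers.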


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

