[ https://issues.apache.org/jira/browse/HDFS-17577?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17864514#comment-17864514 ]

ASF GitHub Bot commented on HDFS-17577:
---------------------------------------

liangyu-1 opened a new pull request, #6935:
URL: https://github.com/apache/hadoop/pull/6935

   … to Manage Disk Space and Network Load in Labeled YARN Nodes
   
   
   ### Description of PR
   
   As described in 
[HDFS-17577](https://issues.apache.org/jira/browse/HDFS-17577):
   I am currently using Apache Flink to write files into HDFS. The Flink 
application runs on a labeled YARN queue. During operation, we observed that 
the local disks on these labeled nodes fill up quickly and that their network 
load is significantly high. This happens because, when the client runs on a 
DataNode host, HDFS's default block placement policy puts the first replica of 
each block on that local DataNode, which then forwards the data to the next 
DataNode in the write pipeline. Because the number of labeled nodes is quite 
limited, a disproportionate share of the written data lands on those few nodes.
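   To make the concentration effect concrete, here is a toy Java model. It is not HDFS's `BlockPlacementPolicyDefault`: it ignores racks, load, and free space, and the names `PlacementSkew`, `place`, and `writerShare` are hypothetical. It assumes a 20-node cluster in which only 2 nodes run writers, and compares "first replica on the writer's node" with "all replicas on other nodes".

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

/** Toy model (not HDFS code) of replica placement with and without local writes. */
class PlacementSkew {

    /**
     * Places `blocks` blocks with the given replication factor and returns the
     * number of replicas stored on each node. Nodes 0..writers-1 are the writer
     * nodes; they take turns writing blocks.
     */
    static int[] place(int nodes, int writers, int blocks, int replication,
                       boolean localFirst, long seed) {
        Random rnd = new Random(seed);
        int[] replicasPerNode = new int[nodes];
        for (int b = 0; b < blocks; b++) {
            int writer = b % writers;
            // Candidate targets: every node except the writer, in random order.
            List<Integer> others = new ArrayList<>();
            for (int n = 0; n < nodes; n++) {
                if (n != writer) others.add(n);
            }
            Collections.shuffle(others, rnd);
            int remaining = replication;
            if (localFirst) {
                replicasPerNode[writer]++; // first replica stays on the writer's node
                remaining--;
            }
            for (int i = 0; i < remaining; i++) {
                replicasPerNode[others.get(i)]++;
            }
        }
        return replicasPerNode;
    }

    /** Fraction of all replicas that landed on the writer nodes. */
    static double writerShare(int[] replicasPerNode, int writers) {
        long total = 0, onWriters = 0;
        for (int n = 0; n < replicasPerNode.length; n++) {
            total += replicasPerNode[n];
            if (n < writers) onWriters += replicasPerNode[n];
        }
        return (double) onWriters / total;
    }

    public static void main(String[] args) {
        int nodes = 20, writers = 2, blocks = 10_000, replication = 3;
        double local = writerShare(place(nodes, writers, blocks, replication, true, 42L), writers);
        double remote = writerShare(place(nodes, writers, blocks, replication, false, 42L), writers);
        System.out.printf("local-first:    %.3f of replicas on writer nodes%n", local);
        System.out.printf("no local write: %.3f of replicas on writer nodes%n", remote);
    }
}
```

In this model, with local-first placement the two writer nodes hold in expectation (1 + 2/19) of every 3 replicas, about 37% of all data, versus an even share of 10%; excluding the writer drops their share to about 5%. Real clusters add rack awareness and capacity checks, so the numbers are only illustrative.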
   
   The current behavior leads to inefficient disk space utilization and high 
network traffic on these few labeled nodes, which can degrade the performance 
and reliability of the application. As shown in the picture, the host I circled 
has an average net_bytes_sent rate of 1.2 GB/s, while the others send only 
50 MB/s. This imbalance in network traffic and disk usage nearly brought down 
the whole cluster. 
   
![6D939050-0BC4-4B17-A6A3-A1EBBD60338D](https://github.com/apache/hadoop/assets/62563545/d82fdbbb-2a74-4d00-96de-2fe435f182e0)
    
   
   **Implementation:** 
   I added a client-side configuration property, `dfs.client.write.no_local_write`, 
that enables `CreateFlag.NO_LOCAL_WRITE` during file creation in Hadoop's file 
system APIs. This gives applications such as Flink running in labeled queues 
the flexibility to opt for non-local writes when necessary.
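   A minimal sketch of the idea, assuming the property simply adds `NO_LOCAL_WRITE` to whatever create flags the caller requested; the PR's actual code may differ. `CreateFlagStub` is a hypothetical stand-in that mirrors a few constants of `org.apache.hadoop.fs.CreateFlag` so the example compiles without Hadoop, and the plain `Map` stands in for Hadoop's `Configuration`. With a real client, the resulting `EnumSet<CreateFlag>` would be passed to `FileSystem.create(Path, FsPermission, EnumSet<CreateFlag>, int, short, long, Progressable)`.

```java
import java.util.EnumSet;
import java.util.Map;

/** Sketch: turning a client-side boolean into an extra create flag. */
class NoLocalWriteFlags {

    // Hypothetical stand-in for org.apache.hadoop.fs.CreateFlag (subset of constants).
    enum CreateFlagStub { CREATE, OVERWRITE, APPEND, NO_LOCAL_WRITE }

    // Property name as proposed in this PR.
    static final String NO_LOCAL_WRITE_KEY = "dfs.client.write.no_local_write";

    /**
     * Returns the requested flags plus NO_LOCAL_WRITE when the configuration
     * enables it. The caller's set is left unmodified.
     */
    static EnumSet<CreateFlagStub> effectiveFlags(EnumSet<CreateFlagStub> requested,
                                                  Map<String, String> conf) {
        EnumSet<CreateFlagStub> flags = EnumSet.copyOf(requested);
        // With Hadoop, conf.getBoolean(NO_LOCAL_WRITE_KEY, false) would play this role.
        boolean noLocalWrite =
            Boolean.parseBoolean(conf.getOrDefault(NO_LOCAL_WRITE_KEY, "false"));
        if (noLocalWrite) {
            flags.add(CreateFlagStub.NO_LOCAL_WRITE);
        }
        return flags;
    }

    public static void main(String[] args) {
        EnumSet<CreateFlagStub> requested =
            EnumSet.of(CreateFlagStub.CREATE, CreateFlagStub.OVERWRITE);
        // prints [CREATE, OVERWRITE]
        System.out.println(effectiveFlags(requested, Map.of()));
        // prints [CREATE, OVERWRITE, NO_LOCAL_WRITE]
        System.out.println(effectiveFlags(requested, Map.of(NO_LOCAL_WRITE_KEY, "true")));
    }
}
```

Keeping the switch in client configuration means an application such as a Flink job can opt out of local writes without changing every `create` call site.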
   
   ### How was this patch tested?
   
   I rebuilt the hadoop-hdfs-client module and tested it with Flink on the 
labeled YARN queue. With the change, disk usage is distributed more evenly 
across the nodes in the cluster, and the network load has also improved.
   
   
   ### For code changes:
   
   - [ ] Does the title of this PR start with the corresponding JIRA issue id 
(e.g. 'HADOOP-17799. Your PR title ...')?
   - [ ] Object storage: have the integration tests been executed and the 
endpoint declared according to the connector-specific documentation?
   - [ ] If adding new dependencies to the code, are these dependencies 
licensed in a way that is compatible for inclusion under [ASF 
2.0](http://www.apache.org/legal/resolved.html#category-a)?
   - [ ] If applicable, have you updated the `LICENSE`, `LICENSE-binary`, 
`NOTICE-binary` files?
   
   




> Add Support for CreateFlag.NO_LOCAL_WRITE in File Creation to Manage Disk 
> Space and Network Load in Labeled YARN Nodes
> ----------------------------------------------------------------------------------------------------------------------
>
>                 Key: HDFS-17577
>                 URL: https://issues.apache.org/jira/browse/HDFS-17577
>             Project: Hadoop HDFS
>          Issue Type: New Feature
>          Components: dfsclient
>            Reporter: liang yu
>            Priority: Major
>         Attachments: 6D939050-0BC4-4B17-A6A3-A1EBBD60338D.png
>
>
> {*}Description{*}: I am currently using Apache Flink to write files into 
> HDFS. The Flink application runs on a labeled YARN queue. During operation, 
> we observed that the local disks on these labeled nodes fill up quickly and 
> that their network load is significantly high. This happens because HDFS's 
> default block placement policy puts the first replica of each block on the 
> writer's local DataNode, and the number of labeled nodes is quite limited.
>  
> {*}Problem{*}: The current behavior leads to inefficient disk space 
> utilization and high network traffic on these few labeled nodes, which can 
> degrade the performance and reliability of the application. As shown in the 
> picture, the host I circled has an average net_bytes_sent rate of 1.2 GB/s, 
> while the others send only 50 MB/s. This imbalance in network traffic and 
> disk usage nearly brought down the whole cluster. 
> !6D939050-0BC4-4B17-A6A3-A1EBBD60338D.png|width=901,height=257!
>  
> {*}Implementation{*}: The implementation would involve adding a client 
> configuration property, _dfs.client.write.no_local_write_, that enables 
> {{CreateFlag.NO_LOCAL_WRITE}} during the file creation process in Hadoop's 
> file system APIs. This would give applications such as Flink running in 
> labeled queues the flexibility to opt for non-local writes when necessary.
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

