[ 
https://issues.apache.org/jira/browse/HDFS-17577?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17867497#comment-17867497
 ] 

ASF GitHub Bot commented on HDFS-17577:
---------------------------------------

Hexiaoqiao commented on PR #6935:
URL: https://github.com/apache/hadoop/pull/6935#issuecomment-2241042463

   > But in my scenario, I am using Flink's FileSystem API, and from reading 
   > its source code I found that it calls FileSystem.create(Path f). This 
   > means that if I want to use CreateFlag.IGNORE_CLIENT_LOCALITY in Hadoop, 
   > I have to change the source code of the Flink FileSystem API and rebuild 
   > the whole Flink project.
   > 
   > I think this will also happen in most computation engines, because most 
   > of them call FileSystem.create(Path f) directly. This would cause a lot 
   > of extra work.
   
   Got it, but I am sorry, I disagree. A flexible interface already exists; 
   the upstream system simply does not invoke it, so we should push the 
   upstream system to update. On the other hand, a configuration like the one 
   in this PR would affect every write made through this client, which may 
   not be what is expected. In short, I suggest proposing this change and 
   submitting a PR on the Flink side. Thanks again.
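
   For reference, the per-call interface mentioned above could be used 
   roughly as follows. This is only a minimal sketch, assuming a recent 
   Hadoop release with hadoop-common and hadoop-hdfs-client on the classpath. 
   The class name `NonLocalWriteExample` and its helper methods are 
   hypothetical and not part of Hadoop or Flink; only the 
   `FileSystem.create(...)` overload taking `EnumSet<CreateFlag>` and the 
   `CreateFlag` constants come from Hadoop itself. File systems other than 
   HDFS may ignore the flag.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.EnumSet;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.CreateFlag;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.permission.FsPermission;

// Hypothetical helper class, for illustration only.
public class NonLocalWriteExample {

    // Flags equivalent to FileSystem.create(Path f) (create + overwrite),
    // plus the request to avoid placing a replica on the local DataNode.
    static EnumSet<CreateFlag> nonLocalCreateFlags() {
        return EnumSet.of(
            CreateFlag.CREATE,
            CreateFlag.OVERWRITE,
            CreateFlag.NO_LOCAL_WRITE);
    }

    // Opens a file for writing with NO_LOCAL_WRITE set for this call only;
    // other writers sharing the same client are unaffected.
    static FSDataOutputStream createNonLocal(FileSystem fs, Path path)
            throws IOException {
        return fs.create(
            path,
            FsPermission.getFileDefault(),
            nonLocalCreateFlags(),
            fs.getConf().getInt("io.file.buffer.size", 4096),
            fs.getDefaultReplication(path),
            fs.getDefaultBlockSize(path),
            null); // no progress callback
    }

    public static void main(String[] args) throws IOException {
        Configuration conf = new Configuration();
        Path path = new Path(args[0]); // target path supplied by the caller
        FileSystem fs = path.getFileSystem(conf);
        try (FSDataOutputStream out = createNonLocal(fs, path)) {
            out.write("hello".getBytes(StandardCharsets.UTF_8));
        }
    }
}
```

   Because the flag is passed per call, an engine such as Flink could enable 
   it only for the writers that need it, rather than for every file created 
   through the client.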




> Add Support for CreateFlag.NO_LOCAL_WRITE in File Creation to Manage Disk 
> Space and Network Load in Labeled YARN Nodes
> ----------------------------------------------------------------------------------------------------------------------
>
>                 Key: HDFS-17577
>                 URL: https://issues.apache.org/jira/browse/HDFS-17577
>             Project: Hadoop HDFS
>          Issue Type: New Feature
>          Components: dfsclient
>            Reporter: liang yu
>            Priority: Major
>              Labels: pull-request-available
>         Attachments: 6D939050-0BC4-4B17-A6A3-A1EBBD60338D.png
>
>
> {*}Description{*}: I am currently using Apache Flink to write files into 
> Hadoop. The Flink application runs on a labeled YARN queue. During operation, 
> it has been observed that the local disks on these labeled nodes get filled 
> up quickly, and the network load is significantly high. This issue arises 
> because Hadoop prioritizes writing files to the local node first, and the 
> number of these labeled nodes is quite limited.
>  
> {*}Problem{*}: The current behavior leads to inefficient disk space 
> utilization and high network traffic on these few labeled nodes, which could 
> potentially affect the performance and reliability of the application. As 
> shown in the picture, the host I circled has an average net_bytes_sent rate 
> of 1.2GB/s while the others are at just 50MB/s. This imbalance in network 
> load and disk space nearly brought down the whole cluster. 
> !6D939050-0BC4-4B17-A6A3-A1EBBD60338D.png|width=901,height=257!
>  
> {*}Implementation{*}: The implementation would involve adding a 
> configuration _dfs.client.write.no_local_write_ to support the 
> {{CreateFlag.NO_LOCAL_WRITE}} during the file creation process in Hadoop's 
> file system APIs. This will provide flexibility to applications like Flink 
> running in labeled queues to opt for non-local writes when necessary.
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
