[ 
https://issues.apache.org/jira/browse/HADOOP-19052?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17814191#comment-17814191
 ] 

ASF GitHub Bot commented on HADOOP-19052:
-----------------------------------------

liangyu-1 opened a new pull request, #6527:
URL: https://github.com/apache/hadoop/pull/6527

   …nk which takes a lot of time
   
   
   ### Description of PR
   As described in 
[HADOOP-19052](https://issues.apache.org/jira/browse/HADOOP-19052), appending 
to a file calls the method `getHardLinkCount` twice. Inside 
`getHardLinkCount`, Java starts a new process to execute a shell command and 
waits for it to fork. When the QPS of `append` is very high, 
`getHardLinkCount` takes a long time to finish, which causes long waits to 
acquire the lock.
   
   I used another method to get the link count of a file whose file store 
supports the file attributes identified by the given file attribute view. This 
method does not start a new process and finishes in a very short time even 
when the QPS of `append` is high.
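
To make the approach concrete, here is a minimal sketch of reading the link 
count through NIO file attributes instead of a subprocess. It is not the code 
from this PR: the class name `LinkCountSketch`, the method name 
`linkCountFromAttribute`, and the `-1` "not available" return value are 
assumptions made for illustration. The `unix:nlink` attribute is the one named 
in the JIRA issue; it is JDK-specific and is only expected to work where the 
"unix" attribute view is available.

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;

// Hypothetical helper for illustration only; not the actual Hadoop patch.
final class LinkCountSketch {
  private LinkCountSketch() {}

  /**
   * Returns the hard-link count of the file, or -1 when the "unix:nlink"
   * attribute is not available on this platform or file system.
   */
  static int linkCountFromAttribute(File file) throws IOException {
    try {
      // No child process is started; the value comes from the file's metadata.
      Object nlink = Files.getAttribute(file.toPath(), "unix:nlink");
      return ((Number) nlink).intValue();
    } catch (UnsupportedOperationException | IllegalArgumentException e) {
      // Attribute view or attribute name not recognized here.
      return -1;
    }
  }
}
```

A caller would presumably keep the existing shell-based path as a fallback 
whenever this sketch returns `-1`.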
   
   ### How was this patch tested?
   I added a new unit test, `testGetLinkCountFromFileAttribute`, and a public 
method, `supportsHardLink`, that reports whether the file store supports the 
file attributes identified by the given file attribute view.
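
As a rough illustration of such a support check, the sketch below asks the 
file's `FileStore` whether it exposes the "unix" attribute view. The class and 
method names (`UnixViewCheck`, `supportsUnixView`) are hypothetical, and the 
real `supportsHardLink` in the PR may be implemented differently.

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.FileStore;
import java.nio.file.Files;

// Hypothetical stand-in for the check described above; not the code from the PR.
final class UnixViewCheck {
  private UnixViewCheck() {}

  /** True if the file store holding the file exposes the "unix" attribute view. */
  static boolean supportsUnixView(File file) throws IOException {
    FileStore store = Files.getFileStore(file.toPath());
    return store.supportsFileAttributeView("unix");
  }
}
```

A test along the lines of the one mentioned above could create a temporary 
file, skip itself when the check returns `false`, and otherwise compare the 
attribute-based link count before and after creating a hard link.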
   
   ### For code changes:
   
   - [ ] Does the title of this PR start with the corresponding JIRA issue id 
(e.g. 'HADOOP-17799. Your PR title ...')?
   - [ ] Object storage: have the integration tests been executed and the 
endpoint declared according to the connector-specific documentation?
   - [ ] If adding new dependencies to the code, are these dependencies 
licensed in a way that is compatible for inclusion under [ASF 
2.0](http://www.apache.org/legal/resolved.html#category-a)?
   - [ ] If applicable, have you updated the `LICENSE`, `LICENSE-binary`, 
`NOTICE-binary` files?
   
   




> Hadoop use Shell command to get the count of the hard link which takes a lot 
> of time
> ------------------------------------------------------------------------------------
>
>                 Key: HADOOP-19052
>                 URL: https://issues.apache.org/jira/browse/HADOOP-19052
>             Project: Hadoop Common
>          Issue Type: Improvement
>         Environment: Hadoop 3.3.4
>            Reporter: liang yu
>            Priority: Major
>
> Using Hadoop 3.3.4
>  
> When the QPS of `append` executions is very high, above 10000/s, we found 
> that the write speed in Hadoop is very slow. We traced some datanodes' logs 
> and found this warning:
> {code:java}
> 2024-01-26 11:09:44,292 WARN impl.FsDatasetImpl 
> (InstrumentedLock.java:logwaitWarning(165)) Waited above threshold(300 ms) to 
> acquire lock: lock identifier: FsDatasetRwlock waitTimeMs=336 ms.Suppressed 0 
> lock wait warnings.Longest supressed waitTimeMs=0.The stack trace is
> java.lang.Thread.getStackTrace(Thread.java:1559)
> org.apache.hadoop.util.StringUtils.getStackTrace(StringUtils.java:1060)
> org.apache.hadoop.util.InstrumentedLock.logWaitWarning(InstrumentedLock.java:171)
> org.apache.hadoop.util.InstrumentedLock.check(InstrumentedLock.java:222)
> org.apache.hadoop.util.InstrumentedLock.lock(InstrumentedLock.java:105)
> org.apache.hadoop.util.AutoCloseableLock.acquire(AutoCloseableLock.java:67)
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.append(FsDatasetImpl.java:1239)
> org.apache.hadoop.hdfs.server.datanode.BlockReceiver.<init>(BlockReceiver.java:230)
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.getBlockReceiver 
> (DataXceiver.java:1313)
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock 
> (DataXceiver.java:764)
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:176)
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:110)
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:293)
> java.lang.Thread.run(Thread.java:748)
> {code}
>  
> Then we traced the method 
> _org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.append(FsDatasetImpl.java:1239)_ 
> and printed how long each command takes to finish. We found that it takes 
> 700 ms to get the linkCount of the file, which is really slow.
>  
> We traced the code and found that Hadoop (running on Java 1.8) uses a shell 
> command to get the linkCount. To run it, Java starts a new process and waits 
> for the process to fork; when the QPS is very high, forking the process 
> sometimes takes a long time.
> Here is the shell command:
> {code:java}
> stat -c%h /path/to/file
> {code}
>  
> Solution:
> For a FileStore that supports the "unix" file attribute view, we can use the 
> method _Files.getAttribute(f.toPath(), "unix:nlink")_ to get the linkCount. 
> This method doesn't need to start a new process and returns the result in a 
> very short time.
>  
> With this method, we rarely see the WARN log above when the QPS of append 
> executions is high.
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
