[
https://issues.apache.org/jira/browse/HADOOP-19052?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
liang yu updated HADOOP-19052:
------------------------------
Description:
Using Hadoop 3.3.4
We use Spark Streaming to append to multiple files in the Hadoop filesystem
every minute, which causes many append executions. We found that the write
speed in Hadoop is very slow. We then traced some DataNodes' logs and found
this warning:
{code:java}
Waited above threshold(300 ms) to acquire lock: lock identifier:
FsDatasetRWLock waitTimeMs=518 ms.
Suppressed 13 lock wait warnings. Longest suppressed WaitTimeMs=838.
The stack trace is: java.lang.Thread.getStackTrace(Thread.java:1559)
{code}
Then we traced the method
_org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.append(FsDatasetImpl.java:1239)_
and timed how long each command takes to finish, finding that it takes about
700 ms to get the linkCount of the file.
We found that Java has to spawn a new process to execute the shell command
{code:java}
stat -c%h /path/to/file
{code}
which takes time because we must wait for the process to be forked.
I think we can use a Java native method to get this instead.
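One possible native approach (a sketch, not the actual patch) is java.nio's file-attribute API: on POSIX platforms the attribute name {{unix:nlink}} maps to the {{st_nlink}} field of stat(2), so the JVM reads the link count with a direct syscall instead of forking {{stat}}. The class and method names below are illustrative only:
{code:java}
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class LinkCount {
    // Read the hard-link count via the JVM's built-in stat() call,
    // avoiding the fork/exec cost of running "stat -c%h <file>".
    public static int linkCount(Path file) throws IOException {
        // "unix:nlink" is the st_nlink field on POSIX systems.
        return (int) Files.getAttribute(file, "unix:nlink");
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("nlink-demo", ".dat");
        try {
            // A freshly created regular file has exactly one link.
            System.out.println(linkCount(tmp));
        } finally {
            Files.deleteIfExists(tmp);
        }
    }
}
{code}
Note that {{unix:nlink}} is only available where the "unix" attribute view is supported (Linux, macOS), so a fallback would still be needed on other platforms.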
was:
Using Hadoop 3.3.4 and Spark 2.4.0
We use Spark Streaming to append to multiple files in the Hadoop filesystem
every minute, which causes many append executions. We found that the write
speed in Hadoop is very slow. We then traced some DataNodes' logs and found
this warning:
{code:java}
Waited above threshold(300 ms) to acquire lock: lock identifier:
FsDatasetRWLock waitTimeMs=518 ms.
Suppressed 13 lock wait warnings. Longest suppressed WaitTimeMs=838.
The stack trace is: java.lang.Thread.getStackTrace(Thread.java:1559)
{code}
Then we traced the method
_org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.append(FsDatasetImpl.java:1239)_
and timed how long each command takes to finish, finding that it takes about
700 ms to get the linkCount of the file.
We found that Java has to spawn a new process to execute the shell command
{code:java}
stat -c%h /path/to/file
{code}
which takes time because we must wait for the process to be forked.
I think we can use a Java native method to get this instead.
> Hadoop uses a shell command to get the count of hard links, which takes a lot
> of time
> ------------------------------------------------------------------------------------
>
> Key: HADOOP-19052
> URL: https://issues.apache.org/jira/browse/HADOOP-19052
> Project: Hadoop Common
> Issue Type: Improvement
> Environment: Hadoop 3.3.4
> Spark 2.4.0
> Reporter: liang yu
> Priority: Major
> Attachments: image-2024-01-26-16-18-44-969.png,
> image-2024-01-26-17-15-32-312.png, image-2024-01-26-17-19-49-805.png
>
>
> Using Hadoop 3.3.4
> We use Spark Streaming to append to multiple files in the Hadoop filesystem
> every minute, which causes many append executions. We found that the write
> speed in Hadoop is very slow. We then traced some DataNodes' logs and found
> this warning:
> {code:java}
> Waited above threshold(300 ms) to acquire lock: lock identifier:
> FsDatasetRWLock waitTimeMs=518 ms.
> Suppressed 13 lock wait warnings. Longest suppressed WaitTimeMs=838.
> The stack trace is: java.lang.Thread.getStackTrace(Thread.java:1559)
> {code}
>
> Then we traced the method
> _org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.append(FsDatasetImpl.java:1239)_
> and timed how long each command takes to finish, finding that it takes about
> 700 ms to get the linkCount of the file.
>
> We found that Java has to spawn a new process to execute the shell command
> {code:java}
> stat -c%h /path/to/file
> {code}
> which takes time because we must wait for the process to be forked.
>
> I think we can use a Java native method to get this instead.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]