[ 
https://issues.apache.org/jira/browse/HDDS-1486?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aravindan Vijayan updated HDDS-1486:
------------------------------------
    Description: 
Setup: 15-node physical cluster; all Datanodes are up and running.
A client using 16 threads to write 16000 x 10MB+ files with the FsStress utility
(https://github.com/arp7/FsPerfTest) fails with the errors below.
The failure is intermittent.

*Server side exceptions*
{code}
19/04/22 10:13:32 ERROR io.KeyOutputStream: Try to allocate more blocks for 
write failed, already allocated 0 blocks for this write.

19/04/18 14:33:23 WARN io.KeyOutputStream: Encountered exception 
java.io.IOException: Unexpected Storage Container Exception: 
java.util.concurrent.CompletionException: 
java.util.concurrent.CompletionException: 
org.apache.ratis.protocol.AlreadyClosedException: SlidingWindow$Client 
client-ADE7F801D3AD->RAFT is closed.. The last committed block length is 0, 
uncommitted data length is 10485760 retry count 0
{code}

*Client side exceptions*
{code}
FAILED org.apache.ratis.protocol.NotLeaderException: Server 
c6e64cc4-91e9-4b36-83e4-6d84a4e71b7f is not the leader 
(f44c1413-0847-45e3-982d-ac3aec15dffc:10.17.200.23:9858). Request must be sent 
to leader., logIndex=0, commits[c6e64cc4-91e9-4b36-83e4-6d84a4e71b7f:c131161, 
287eccfb-8461-419a-8732-529d042380b3:c131161, 
f44c1413-0847-45e3-982d-ac3aec15dffc:c131161]
{code} 

With small keys (<1MB), or with large keys written from a single thread, the
above client-side exceptions are infrequent. With multithreaded writes of
10MB+ keys, however, the exceptions occur about 50% of the time and
eventually cause write failures. I have attached the logs from one such failed pipeline.
 [^Datanode Logs.zip] 
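
For readers without FsStress at hand, the shape of the workload (a fixed pool of writer threads, each writing whole files of a fixed size) can be sketched roughly as follows. This is only an illustration under stated assumptions: the class name WriteStressSketch and its methods are hypothetical, the code writes to a local temporary directory through java.nio rather than to Ozone, and it is not the FsStress implementation. It will not reproduce the Ratis exceptions above; it only shows the concurrency pattern that triggers them on the cluster.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Hypothetical stand-in for the workload: not FsStress, and no Ozone API is used. */
public class WriteStressSketch {

    /**
     * Writes numFiles files of fileSize bytes each into dir using numThreads
     * worker threads, and returns the number of writes that failed.
     */
    public static int run(Path dir, int numThreads, int numFiles, int fileSize) {
        byte[] payload = new byte[fileSize]; // zero-filled; content is irrelevant here
        ExecutorService pool = Executors.newFixedThreadPool(numThreads);
        List<Future<Boolean>> results = new ArrayList<>();
        try {
            for (int i = 0; i < numFiles; i++) {
                Path target = dir.resolve("file-" + i);
                // Each task writes one whole file and reports success or failure.
                results.add(pool.submit(() -> {
                    try {
                        Files.write(target, payload);
                        return true;
                    } catch (IOException e) {
                        return false;
                    }
                }));
            }
            int failures = 0;
            for (Future<Boolean> result : results) {
                try {
                    if (!result.get()) {
                        failures++;
                    }
                } catch (ExecutionException e) {
                    failures++;
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    failures++;
                }
            }
            return failures;
        } finally {
            pool.shutdown();
        }
    }

    /** Convenience wrapper: runs the workload in a fresh temporary directory (files are left behind). */
    public static int runInTempDir(int numThreads, int numFiles, int fileSize) {
        try {
            Path dir = Files.createTempDirectory("write-stress");
            return run(dir, numThreads, numFiles, fileSize);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Scaled-down parameters; the report used 16 threads and 16000 x 10MB+ files.
        int failures = runInTempDir(16, 32, 1024 * 1024);
        System.out.println("failed writes: " + failures);
    }
}
```

To approximate the reported run, the thread count would stay at 16 while the file count and size are raised to 16000 and 10MB+, with the local write replaced by a write through the Ozone client.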

  was:
15 node physical cluster. All Datanodes are up and running.
Client attempting to write 1600 x 100MB files using the FsStress utility 
(https://github.com/arp7/FsPerfTest) fails with the following error.
This is an intermittent issue.

*Server side exceptions*
{code}
19/04/22 10:13:32 ERROR io.KeyOutputStream: Try to allocate more blocks for 
write failed, already allocated 0 blocks for this write.

19/04/18 14:33:23 WARN io.KeyOutputStream: Encountered exception 
java.io.IOException: Unexpected Storage Container Exception: 
java.util.concurrent.CompletionException: 
java.util.concurrent.CompletionException: 
org.apache.ratis.protocol.AlreadyClosedException: SlidingWindow$Client 
client-ADE7F801D3AD->RAFT is closed.. The last committed block length is 0, 
uncommitted data length is 10485760 retry count 0
{code}

*Client side exceptions*
{code}
FAILED org.apache.ratis.protocol.NotLeaderException: Server 
c6e64cc4-91e9-4b36-83e4-6d84a4e71b7f is not the leader 
(f44c1413-0847-45e3-982d-ac3aec15dffc:10.17.200.23:9858). Request must be sent 
to leader., logIndex=0, commits[c6e64cc4-91e9-4b36-83e4-6d84a4e71b7f:c131161, 
287eccfb-8461-419a-8732-529d042380b3:c131161, 
f44c1413-0847-45e3-982d-ac3aec15dffc:c131161]
{code} 

In the case of small key sizes (<1MB) and big key sizes with single thread, the 
above client side exceptions are infrequent. However, in the case of 
multithreaded 10MB+ size keys, the exceptions occur about 50% of the time and 
eventually cause write failures. I have attached one such failed pipeline logs.
 [^Datanode Logs.zip] 


> Ozone write fails in allocateBlock while writing >10MB files in multiple 
> threads.
> ---------------------------------------------------------------------------------
>
>                 Key: HDDS-1486
>                 URL: https://issues.apache.org/jira/browse/HDDS-1486
>             Project: Hadoop Distributed Data Store
>          Issue Type: Bug
>            Reporter: Aravindan Vijayan
>            Priority: Major
>              Labels: intermittent
>         Attachments: Datanode Logs.zip
>
>
> Setup: 15-node physical cluster; all Datanodes are up and running.
> A client using 16 threads to write 16000 x 10MB+ files with the FsStress
> utility (https://github.com/arp7/FsPerfTest) fails with the errors below.
> The failure is intermittent.
> *Server side exceptions*
> {code}
> 19/04/22 10:13:32 ERROR io.KeyOutputStream: Try to allocate more blocks for 
> write failed, already allocated 0 blocks for this write.
> 19/04/18 14:33:23 WARN io.KeyOutputStream: Encountered exception 
> java.io.IOException: Unexpected Storage Container Exception: 
> java.util.concurrent.CompletionException: 
> java.util.concurrent.CompletionException: 
> org.apache.ratis.protocol.AlreadyClosedException: SlidingWindow$Client 
> client-ADE7F801D3AD->RAFT is closed.. The last committed block length is 0, 
> uncommitted data length is 10485760 retry count 0
> {code}
> *Client side exceptions*
> {code}
> FAILED org.apache.ratis.protocol.NotLeaderException: Server 
> c6e64cc4-91e9-4b36-83e4-6d84a4e71b7f is not the leader 
> (f44c1413-0847-45e3-982d-ac3aec15dffc:10.17.200.23:9858). Request must be 
> sent to leader., logIndex=0, 
> commits[c6e64cc4-91e9-4b36-83e4-6d84a4e71b7f:c131161, 
> 287eccfb-8461-419a-8732-529d042380b3:c131161, 
> f44c1413-0847-45e3-982d-ac3aec15dffc:c131161]
> {code} 
> With small keys (<1MB), or with large keys written from a single thread,
> the above client-side exceptions are infrequent. With multithreaded writes
> of 10MB+ keys, however, the exceptions occur about 50% of the time and
> eventually cause write failures. I have attached the logs from one such
> failed pipeline.
>  [^Datanode Logs.zip] 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

