[ https://issues.apache.org/jira/browse/HDDS-1046?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16760084#comment-16760084 ]

Shashikant Banerjee commented on HDDS-1046:
-------------------------------------------

The test needs to be fixed here. In the test, the close container command is 
enqueued directly to the SCM command queue without changing the container 
state in SCM. Also, the queued close container command is always sent with a 
random pipeline id, so the datanode cannot find the pipeline and moves the 
container to QUASI_CLOSED instead of CLOSED.
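
A minimal sketch of the intended test fix, assuming hypothetical helper names 
(updateContainerState, addDatanodeCommand, and the CloseContainerCommand 
constructor shape shown here are illustrative, not verified 0.4.0 signatures):

{code:java}
// Sketch only: move the container through its SCM state transition first,
// then enqueue the close command with the container's real pipeline id so
// the datanode can resolve the pipeline and CLOSE (not quasi-close) it.
ContainerInfo container = scm.getContainerManager().getContainer(containerID);

// 1. Let SCM record that the container is closing (FINALIZE is the assumed
//    lifecycle event name here).
scm.getContainerManager().updateContainerState(containerID, LifeCycleEvent.FINALIZE);

// 2. Use the container's actual pipeline id instead of a random one.
CloseContainerCommand closeCmd =
    new CloseContainerCommand(containerID, container.getPipelineID());
scm.getScmNodeManager().addDatanodeCommand(datanode.getUuid(), closeCmd);
{code}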

> TestCloseContainerByPipeline#testIfCloseContainerCommandHandlerIsInvoked 
> fails intermittently
> ---------------------------------------------------------------------------------------------
>
>                 Key: HDDS-1046
>                 URL: https://issues.apache.org/jira/browse/HDDS-1046
>             Project: Hadoop Distributed Data Store
>          Issue Type: Improvement
>          Components: SCM
>    Affects Versions: 0.4.0
>            Reporter: Shashikant Banerjee
>            Priority: Major
>             Fix For: 0.4.0
>
>
>  
> {code:java}
> java.lang.StackOverflowError
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.getSubject(Subject.java:297)
> at org.apache.hadoop.security.UserGroupInformation.getCurrentUser(UserGroupInformation.java:569)
> at org.apache.hadoop.hdds.scm.storage.ContainerProtocolCalls.getEncodedBlockToken(ContainerProtocolCalls.java:578)
> at org.apache.hadoop.hdds.scm.storage.ContainerProtocolCalls.writeChunkAsync(ContainerProtocolCalls.java:318)
> at org.apache.hadoop.hdds.scm.storage.BlockOutputStream.writeChunkToContainer(BlockOutputStream.java:602)
> at org.apache.hadoop.hdds.scm.storage.BlockOutputStream.writeChunk(BlockOutputStream.java:464)
> at org.apache.hadoop.hdds.scm.storage.BlockOutputStream.close(BlockOutputStream.java:480)
> at org.apache.hadoop.ozone.client.io.BlockOutputStreamEntry.close(BlockOutputStreamEntry.java:137)
> at org.apache.hadoop.ozone.client.io.KeyOutputStream.handleFlushOrClose(KeyOutputStream.java:489)
> at org.apache.hadoop.ozone.client.io.KeyOutputStream.handleFlushOrClose(KeyOutputStream.java:501)
> at org.apache.hadoop.ozone.client.io.KeyOutputStream.handleFlushOrClose(KeyOutputStream.java:501)
> {code}
> The failure happens because the ozone client receives a CONTAINER_NOT_OPEN 
> exception from the datanode, allocates a new block, and retries the write. 
> But every allocate block call to SCM allocates a block on the same 
> quasi-closed container, so the client retries indefinitely and ultimately 
> runs out of stack space.
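> As an illustration (not the actual KeyOutputStream code), the recursive 
> retry could be bounded so a persistently non-open container fails the write 
> instead of overflowing the stack; maxRetries, flushOrCloseCurrentStreamEntry 
> and allocateNewBlockAndRetry are hypothetical names:
> {code:java}
> // Hypothetical bounded, iterative retry replacing the unbounded recursion
> // visible in the stack trace above.
> private void handleFlushOrClose(boolean close) throws IOException {
>   final int maxRetries = 5;
>   int attempts = 0;
>   while (true) {
>     try {
>       flushOrCloseCurrentStreamEntry(close); // stand-in for the real write path
>       return;
>     } catch (ContainerNotOpenException e) {
>       if (++attempts > maxRetries) {
>         throw new IOException(
>             "Container still not open after " + maxRetries + " retries", e);
>       }
>       allocateNewBlockAndRetry(); // ask SCM for a block on another container
>     }
>   }
> }
> {code}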
> The logs below show 3 successive block allocations from SCM on the 
> quasi-closed container (a sketch of a state guard follows the log).
> {code:java}
> 15:15:26.812 [grpc-default-executor-3] ERROR DNAudit - user=null | ip=null | op=WRITE_CHUNK {blockData=conID: 2 locID: 101533189852894070 bcId: 0} | ret=FAILURE
> org.apache.hadoop.hdds.scm.container.common.helpers.ContainerNotOpenException: Container 2 in QUASI_CLOSED state
> 15:15:26.818 [grpc-default-executor-3] ERROR DNAudit - user=null | ip=null | op=WRITE_CHUNK {blockData=conID: 2 locID: 101533189853352823 bcId: 0} | ret=FAILURE
> org.apache.hadoop.hdds.scm.container.common.helpers.ContainerNotOpenException: Container 2 in QUASI_CLOSED state
> 15:15:26.825 [grpc-default-executor-3] ERROR DNAudit - user=null | ip=null | op=WRITE_CHUNK {blockData=conID: 2 locID: 101533189853746040 bcId: 0} | ret=FAILURE
> org.apache.hadoop.hdds.scm.container.common.helpers.ContainerNotOpenException: Container 2 in QUASI_CLOSED state
> {code}
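>
> Since SCM keeps allocating blocks on the same quasi-closed container, the 
> allocation path presumably needs a state guard. A hedged sketch (the method 
> and enum names here are assumptions, not the verified SCM API):
> {code:java}
> // Skip containers that are no longer OPEN when picking one for a new block,
> // so the client is not sent back to the QUASI_CLOSED container forever.
> ContainerInfo candidate =
>     containerManager.getMatchingContainer(sizeRequired, owner, type, factor);
> if (candidate.getState() != HddsProtos.LifeCycleState.OPEN) {
>   candidate = containerManager.allocateContainer(type, factor, owner);
> }
> {code}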


