[ https://issues.apache.org/jira/browse/HDFS-15459?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17213359#comment-17213359 ]

Ahmed Hussein commented on HDFS-15459:
--------------------------------------

The failure is related to {{StripedBlockUtil.java}}.

I noticed the following sequence in the failing executions:

1- While creating a file, an error occurs during replica placement:

{code:bash}
2020-10-13 14:38:26,437 [IPC Server handler 2 on default port 18020] INFO  
blockmanagement.BlockPlacementPolicy 
(BlockPlacementPolicyDefault.java:chooseRandom(891)) - Not enough replicas was 
chosen. Reason: {NODE_TOO_BUSY=4}
2020-10-13 14:38:26,438 [IPC Server handler 2 on default port 18020] WARN  
blockmanagement.BlockPlacementPolicy 
(BlockPlacementPolicyDefault.java:chooseTarget(471)) - Failed to place enough 
replicas, still in need of 2 to reach 9 (unavailableStorages=[], 
storagePolicy=BlockStoragePolicy{HOT:7, storageTypes=[DISK], 
creationFallbacks=[], replicationFallbacks=[ARCHIVE]}, newBlock=true) For more 
information, please enable DEBUG log level on 
org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy and 
org.apache.hadoop.net.NetworkTopology
2020-10-13 14:38:26,438 [IPC Server handler 2 on default port 18020] WARN  
protocol.BlockStoragePolicy (BlockStoragePolicy.java:chooseStorageTypes(161)) - 
Failed to place enough replicas: expected size is 2 but only 0 storage types 
can be selected (replication=9, selected=[], unavailable=[DISK], removed=[DISK, 
DISK], policy=BlockStoragePolicy{HOT:7, storageTypes=[DISK], 
creationFallbacks=[], replicationFallbacks=[ARCHIVE]})
2020-10-13 14:38:26,438 [IPC Server handler 2 on default port 18020] WARN  
blockmanagement.BlockPlacementPolicy 
(BlockPlacementPolicyDefault.java:chooseTarget(471)) - Failed to place enough 
replicas, still in need of 2 to reach 9 (unavailableStorages=[DISK], 
storagePolicy=BlockStoragePolicy{HOT:7, storageTypes=[DISK], 
creationFallbacks=[], replicationFallbacks=[ARCHIVE]}, newBlock=true) All 
required storage types are unavailable:  unavailableStorages=[DISK], 
storagePolicy=BlockStoragePolicy{HOT:7, storageTypes=[DISK], 
creationFallbacks=[], replicationFallbacks=[ARCHIVE]}
2020-10-13 14:38:26,446 [Listener at localhost/65076] WARN  
hdfs.DFSOutputStream (DFSStripedOutputStream.java:allocateNewBlock(531)) - 
Cannot allocate parity block(index=7, policy=RS-6-3-1024k). Exclude nodes=[]. 
There may not be enough datanodes or racks. You can check if the cluster 
topology supports the enabled erasure coding policies by running the command 
'hdfs ec -verifyClusterSetup'.
2020-10-13 14:38:26,447 [Listener at localhost/65076] WARN  
hdfs.DFSOutputStream (DFSStripedOutputStream.java:allocateNewBlock(531)) - 
Cannot allocate parity block(index=8, policy=RS-6-3-1024k). Exclude nodes=[]. 
There may not be enough datanodes or racks. You can check if the cluster 
topology supports the enabled erasure coding policies by running the command 
'hdfs ec -verifyClusterSetup'.
{code}

2- Some RBW replicas never complete; only 7 of the 9 replicas are created for 
each block.

3- Every time this error happens, later in the JUnit test an input stream on 
the same file gets a blockLocation that is "null".
The NPE comes from the internal blocks parsed inside 
{{StripedBlockUtil.java}}.
An array is initialized with size {{dataBlkNum + parityBlkNum}} (which is 9), 
while the {{LocatedStripedBlock}} length is 7. Therefore two entries are left 
uninitialized ({{null}}), which causes the NPE.

4- I checked other places in the code where {{parseStripedBlockGroup()}} is 
called, and it seems to be taken for granted that the entries are non-null.
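
For illustration only (a hypothetical, simplified sketch, not the actual 
Hadoop code): {{parseStripedBlockGroup()}} allocates an array sized for the 
full block group and fills one slot per located internal block, so when the 
group carries only 7 of its 9 internal blocks the trailing slots stay 
{{null}}, and any caller that dereferences them blindly hits the NPE. A 
defensive null check at the call site would look like:

{code:java}
// Hypothetical, simplified sketch of the failure mode; the names below are
// illustrative and not the real StripedBlockUtil API.
public class StripedNpeSketch {
    static final int DATA_BLK_NUM = 6;    // RS-6-3: 6 data blocks
    static final int PARITY_BLK_NUM = 3;  // RS-6-3: 3 parity blocks

    // Mimics parseStripedBlockGroup(): the array is sized for the whole
    // group, but only the located internal blocks are filled in.
    static String[] parseGroup(int locatedBlocks) {
        String[] internal = new String[DATA_BLK_NUM + PARITY_BLK_NUM]; // 9 slots
        for (int i = 0; i < locatedBlocks; i++) {
            internal[i] = "blk_" + i;
        }
        return internal; // slots [locatedBlocks..8] remain null
    }

    public static void main(String[] args) {
        String[] blocks = parseGroup(7); // only 7 of 9 replicas were created
        int skipped = 0;
        for (String b : blocks) {
            if (b == null) {   // without this guard, b.length() would throw NPE
                skipped++;
                continue;
            }
            b.length();        // stands in for "use the internal block"
        }
        System.out.println("null entries skipped: " + skipped);
    }
}
{code}

If the callers really do assume non-null entries, each call site would need a 
guard like the one above (or the loop would need to skip the missing indices).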

[~elgoiri], I am not very familiar with the DFSStriped logic. Do you know 
whether this is a major bug, or whether fixing the 
{{parseStripedBlockGroup}} loop is a simple fix for this scenario?

> TestBlockTokenWithDFSStriped fails intermittently
> -------------------------------------------------
>
>                 Key: HDFS-15459
>                 URL: https://issues.apache.org/jira/browse/HDFS-15459
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: hdfs
>            Reporter: Ahmed Hussein
>            Assignee: Ahmed Hussein
>            Priority: Major
>              Labels: test
>         Attachments: TestBlockTokenWithDFSStriped.testRead.log
>
>
> {{TestBlockTokenWithDFSStriped}} fails intermittently on trunk with an NPE. 
> My intuition is that this failure is caused by another unit test timing out.
> {code:bash}
> [ERROR] Tests run: 4, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 
> 94.448 s <<< FAILURE! - in 
> org.apache.hadoop.hdfs.server.blockmanagement.TestBlockTokenWithDFSStriped
> [ERROR] 
> testRead(org.apache.hadoop.hdfs.server.blockmanagement.TestBlockTokenWithDFSStriped)
>   Time elapsed: 9.455 s  <<< ERROR!
> java.lang.NullPointerException
>       at 
> org.apache.hadoop.hdfs.server.blockmanagement.TestBlockTokenWithDFS.isBlockTokenExpired(TestBlockTokenWithDFS.java:633)
>       at 
> org.apache.hadoop.hdfs.server.blockmanagement.TestBlockTokenWithDFSStriped.isBlockTokenExpired(TestBlockTokenWithDFSStriped.java:139)
>       at 
> org.apache.hadoop.hdfs.server.blockmanagement.TestBlockTokenWithDFS.doTestRead(TestBlockTokenWithDFS.java:508)
>       at 
> org.apache.hadoop.hdfs.server.blockmanagement.TestBlockTokenWithDFSStriped.testRead(TestBlockTokenWithDFSStriped.java:92)
>       at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>       at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>       at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>       at java.lang.reflect.Method.invoke(Method.java:498)
>       at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
>       at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>       at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
>       at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>       at 
> org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:298)
>       at 
> org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:292)
>       at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>       at java.lang.Thread.run(Thread.java:748)
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
