[
https://issues.apache.org/jira/browse/HDFS-15459?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17213359#comment-17213359
]
Ahmed Hussein commented on HDFS-15459:
--------------------------------------
The failure is related to {{StripedBlockUtil.java}}.
I noticed the following in the failing executions:
1- While creating a file, an error occurs when placing the replicas:
{code:bash}
2020-10-13 14:38:26,437 [IPC Server handler 2 on default port 18020] INFO
blockmanagement.BlockPlacementPolicy
(BlockPlacementPolicyDefault.java:chooseRandom(891)) - Not enough replicas was
chosen. Reason: {NODE_TOO_BUSY=4}
2020-10-13 14:38:26,438 [IPC Server handler 2 on default port 18020] WARN
blockmanagement.BlockPlacementPolicy
(BlockPlacementPolicyDefault.java:chooseTarget(471)) - Failed to place enough
replicas, still in need of 2 to reach 9 (unavailableStorages=[],
storagePolicy=BlockStoragePolicy{HOT:7, storageTypes=[DISK],
creationFallbacks=[], replicationFallbacks=[ARCHIVE]}, newBlock=true) For more
information, please enable DEBUG log level on
org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy and
org.apache.hadoop.net.NetworkTopology
2020-10-13 14:38:26,438 [IPC Server handler 2 on default port 18020] WARN
protocol.BlockStoragePolicy (BlockStoragePolicy.java:chooseStorageTypes(161)) -
Failed to place enough replicas: expected size is 2 but only 0 storage types
can be selected (replication=9, selected=[], unavailable=[DISK], removed=[DISK,
DISK], policy=BlockStoragePolicy{HOT:7, storageTypes=[DISK],
creationFallbacks=[], replicationFallbacks=[ARCHIVE]})
2020-10-13 14:38:26,438 [IPC Server handler 2 on default port 18020] WARN
blockmanagement.BlockPlacementPolicy
(BlockPlacementPolicyDefault.java:chooseTarget(471)) - Failed to place enough
replicas, still in need of 2 to reach 9 (unavailableStorages=[DISK],
storagePolicy=BlockStoragePolicy{HOT:7, storageTypes=[DISK],
creationFallbacks=[], replicationFallbacks=[ARCHIVE]}, newBlock=true) All
required storage types are unavailable: unavailableStorages=[DISK],
storagePolicy=BlockStoragePolicy{HOT:7, storageTypes=[DISK],
creationFallbacks=[], replicationFallbacks=[ARCHIVE]}
2020-10-13 14:38:26,446 [Listener at localhost/65076] WARN
hdfs.DFSOutputStream (DFSStripedOutputStream.java:allocateNewBlock(531)) -
Cannot allocate parity block(index=7, policy=RS-6-3-1024k). Exclude nodes=[].
There may not be enough datanodes or racks. You can check if the cluster
topology supports the enabled erasure coding policies by running the command
'hdfs ec -verifyClusterSetup'.
2020-10-13 14:38:26,447 [Listener at localhost/65076] WARN
hdfs.DFSOutputStream (DFSStripedOutputStream.java:allocateNewBlock(531)) -
Cannot allocate parity block(index=8, policy=RS-6-3-1024k). Exclude nodes=[].
There may not be enough datanodes or racks. You can check if the cluster
topology supports the enabled erasure coding policies by running the command
'hdfs ec -verifyClusterSetup'.
{code}
2- Some RBWs remain to be done: only 7 replicas out of 9 are created for each
block.
3- Every time this error happens, an input stream opened later in the JUnit
test on the same file has a blockLocation that is {{null}}.
The NPE comes from the way the internal blocks are parsed inside
{{StripedBlockUtil.java}}.
An array is initialized with size {{dataBlkNum + parityBlkNum}} (which is 9),
while the {{LocatedStripedBlock}} holds only 7 locations. Therefore, two
entries are left uninitialized, which causes the NPE.
4- I checked the other places in the code where {{parseStripedBlockGroup()}} is
called, and they all seem to take for granted that the entries are non-null.
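To make the size mismatch in point 3 concrete, here is a minimal, self-contained sketch. It is not the Hadoop code: {{StripedGroupSketch}}, {{parseGroup}} and {{countMissing}} are hypothetical names, and plain strings stand in for located blocks. It only assumes that located blocks are placed by block index into an array of size {{dataBlkNum + parityBlkNum}}.

```java
import java.util.Arrays;

public class StripedGroupSketch {
    // Hypothetical stand-in for parsing a block group: each located internal
    // block is placed at its block index in an array of full group width.
    static String[] parseGroup(int[] blockIndices, String[] locations,
                               int dataBlkNum, int parityBlkNum) {
        String[] blocks = new String[dataBlkNum + parityBlkNum];
        for (int i = 0; i < blockIndices.length; i++) {
            int idx = blockIndices[i];
            if (idx < blocks.length && blocks[idx] == null) {
                blocks[idx] = locations[i];
            }
        }
        return blocks; // slots without a located block stay null
    }

    // Counts the slots that were never filled.
    static int countMissing(String[] blocks) {
        int missing = 0;
        for (String b : blocks) {
            if (b == null) {
                missing++;
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        // RS-6-3 needs 9 internal blocks, but only indices 0..6 were allocated
        // (parity indices 7 and 8 failed, as in the log above).
        int[] indices = {0, 1, 2, 3, 4, 5, 6};
        String[] locs = {"dn0", "dn1", "dn2", "dn3", "dn4", "dn5", "dn6"};
        String[] blocks = parseGroup(indices, locs, 6, 3);

        System.out.println(Arrays.toString(blocks));
        System.out.println("missing = " + countMissing(blocks));

        try {
            for (String b : blocks) {
                b.length(); // a caller that assumes every entry is non-null
            }
        } catch (NullPointerException e) {
            System.out.println("NPE on an unallocated entry");
        }
    }
}
```

With 7 located blocks and a group width of 9, the last two slots stay {{null}}, so any caller that dereferences every entry fails the same way the test does.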
[~elgoiri], I am not very familiar with the DFS striped logic. Do you know if
this is a major bug? Or is fixing the {{parseStripedBlockGroup}} loop a simple
fix for that scenario?
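For discussion, one possible shape of a caller-side guard is sketched below. This is an assumption about what a "simple fix" could look like, not a proposed patch: {{NullSafeGroupSketch}} and {{anyTokenExpired}} are hypothetical, and boxed expiry times stand in for the per-block tokens that the test inspects. Whether skipping unallocated entries is correct, or just hides a real bug, is exactly the open question.

```java
public class NullSafeGroupSketch {
    // Hypothetical check over per-internal-block token expiry times.
    // A null entry models an internal block that was never allocated.
    static boolean anyTokenExpired(Long[] expiryTimes, long now) {
        for (Long expiry : expiryTimes) {
            if (expiry == null) {
                continue; // nothing was allocated here, so there is no token to check
            }
            if (expiry < now) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // 7 allocated internal blocks followed by 2 unallocated parity blocks.
        Long[] expiry = {100L, 200L, 200L, 200L, 200L, 200L, 200L, null, null};
        System.out.println(anyTokenExpired(expiry, 150L));
        System.out.println(anyTokenExpired(expiry, 50L));
    }
}
```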
> TestBlockTokenWithDFSStriped fails intermittently
> -------------------------------------------------
>
> Key: HDFS-15459
> URL: https://issues.apache.org/jira/browse/HDFS-15459
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: hdfs
> Reporter: Ahmed Hussein
> Assignee: Ahmed Hussein
> Priority: Major
> Labels: test
> Attachments: TestBlockTokenWithDFSStriped.testRead.log
>
>
> {{TestBlockTokenWithDFSStriped}} fails intermittently on trunk with an NPE. My
> intuition is that this failure is caused by another unit test timing out.
> {code:bash}
> [ERROR] Tests run: 4, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 94.448 s <<< FAILURE! - in org.apache.hadoop.hdfs.server.blockmanagement.TestBlockTokenWithDFSStriped
> [ERROR] testRead(org.apache.hadoop.hdfs.server.blockmanagement.TestBlockTokenWithDFSStriped) Time elapsed: 9.455 s <<< ERROR!
> java.lang.NullPointerException
> at org.apache.hadoop.hdfs.server.blockmanagement.TestBlockTokenWithDFS.isBlockTokenExpired(TestBlockTokenWithDFS.java:633)
> at org.apache.hadoop.hdfs.server.blockmanagement.TestBlockTokenWithDFSStriped.isBlockTokenExpired(TestBlockTokenWithDFSStriped.java:139)
> at org.apache.hadoop.hdfs.server.blockmanagement.TestBlockTokenWithDFS.doTestRead(TestBlockTokenWithDFS.java:508)
> at org.apache.hadoop.hdfs.server.blockmanagement.TestBlockTokenWithDFSStriped.testRead(TestBlockTokenWithDFSStriped.java:92)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:498)
> at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
> at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
> at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
> at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
> at org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:298)
> at org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:292)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at java.lang.Thread.run(Thread.java:748)
> {code}