I think you should file a JIRA on this. Most likely this is what is happening:

 * Two out of 3 dns cannot take any more blocks.
 * While picking nodes for a new block, the NN mostly skips the third dn as well, since the '# active writes' on it is larger than '2 * avg'.
 * Even if only one other block is being written on the 3rd dn, its load (1) is still greater than 2 * avg (2 * 1/3). See the sketch below.
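For reference, here is a minimal sketch of that load check. The names (NodeStats, isGoodTarget, BLOCK_SIZE) are illustrative only, not the actual Hadoop classes; the real logic lives in the NN's replication target selection:

import java.util.Arrays;
import java.util.List;

class NodeStats {
    final String name;
    final long remainingBytes;  // free space left on the dn
    final int activeWrites;     // the '# active writes' count

    NodeStats(String name, long remainingBytes, int activeWrites) {
        this.name = name;
        this.remainingBytes = remainingBytes;
        this.activeWrites = activeWrites;
    }
}

public class TargetChooserSketch {
    static final long BLOCK_SIZE = 64L * 1024 * 1024;  // assumed default block size

    // A dn is a good target only if it has room for a block AND its
    // write load is at most twice the cluster average.
    static boolean isGoodTarget(NodeStats dn, double avgActiveWrites) {
        if (dn.remainingBytes < BLOCK_SIZE) {
            return false;                               // the two full dns fail here
        }
        return dn.activeWrites <= 2 * avgActiveWrites;  // the third dn fails here
    }

    public static void main(String[] args) {
        // The 3-node scenario: dn1/dn2 full, dn3 with one write in flight.
        List<NodeStats> cluster = Arrays.asList(
            new NodeStats("dn1", 0L, 0),
            new NodeStats("dn2", 0L, 0),
            new NodeStats("dn3", 1500L * 1024 * 1024 * 1024, 1));

        double avg = 1.0 / 3;  // one active write spread across three dns
        for (NodeStats dn : cluster) {
            System.out.println(dn.name + " good target? " + isGoodTarget(dn, avg));
        }
        // All three print false => "could only be replicated to 0 nodes"
    }
}

With avg = 1/3, the third dn's load of 1 exceeds 2 * avg = 2/3, so the NN ends up with zero targets even though dn3 has plenty of space.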

To test this: if you write just one block to an otherwise idle cluster, it should succeed.
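Something like the following would do it (a minimal sketch; the path is made up, and hadoop-site.xml is assumed to point at your NN):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class OneBlockWrite {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);

        // A single small write => a single block. With no other writers,
        // the avg load is no longer skewed and the write should go through.
        FSDataOutputStream out = fs.create(new Path("/test/one-block.bin"));
        out.write(new byte[4096]);
        out.close();
        fs.close();
    }
}

If even that single-block write fails, then something other than the load check is wrong.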

Writing from the client on the 3rd dn succeeds, since the local node is always favored.

This particular problem is not that severe on a large cluster, but HDFS should still do the sensible thing.

Raghu.

Stas Oskin wrote:
Hi.

I'm testing Hadoop in our lab, and started getting the following message
when trying to copy a file:
Could only be replicated to 0 nodes, instead of 1

I have the following setup:

* 3 machines, 2 of them with only 80GB of space, and 1 with 1.5TB
* Two clients are copying files all the time (one of them is the 1.5TB
machine)
* The replication is set to 2 (see the snippet after this list)
* I let the two smaller machines run out of space, to test the behavior
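For reference, replication 2 here is the standard dfs.replication setting; a minimal sketch of pinning it from a client (the property name is real, the rest is illustrative):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class SetReplication {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        conf.setInt("dfs.replication", 2);  // replication for files this client creates

        FileSystem fs = FileSystem.get(conf);
        fs.setReplication(new Path("/test/test.bin"), (short) 2);  // change an existing file
        fs.close();
    }
}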

Now, one of the clients (the one located on the 1.5TB machine) works fine,
while the other one, the external client, is unable to copy and displays the
error and the exception below.

Any idea if this is expected in my scenario? Or how it can be solved?

Thanks in advance.



09/05/21 10:51:03 WARN dfs.DFSClient: NotReplicatedYetException sleeping /test/test.bin retries left 1

09/05/21 10:51:06 WARN dfs.DFSClient: DataStreamer Exception: org.apache.hadoop.ipc.RemoteException: java.io.IOException: File /test/test.bin could only be replicated to 0 nodes, instead of 1
            at org.apache.hadoop.dfs.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1123)
            at org.apache.hadoop.dfs.NameNode.addBlock(NameNode.java:330)
            at sun.reflect.GeneratedMethodAccessor8.invoke(Unknown Source)
            at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
            at java.lang.reflect.Method.invoke(Method.java:597)
            at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:481)
            at org.apache.hadoop.ipc.Server$Handler.run(Server.java:890)

            at org.apache.hadoop.ipc.Client.call(Client.java:716)
            at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:216)
            at org.apache.hadoop.dfs.$Proxy0.addBlock(Unknown Source)
            at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
            at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
            at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
            at java.lang.reflect.Method.invoke(Method.java:597)
            at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
            at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
            at org.apache.hadoop.dfs.$Proxy0.addBlock(Unknown Source)
            at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.locateFollowingBlock(DFSClient.java:2450)
            at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.nextBlockOutputStream(DFSClient.java:2333)
            at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.access$1800(DFSClient.java:1745)
            at org.apache.hadoop.dfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:1922)

09/05/21 10:51:06 WARN dfs.DFSClient: Error Recovery for block null bad datanode[0]
java.io.IOException: Could not get block locations. Aborting...
            at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.processDatanodeError(DFSClient.java:2153)
            at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.access$1400(DFSClient.java:1745)
            at org.apache.hadoop.dfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:1899)

