[
https://issues.apache.org/jira/browse/HADOOP-2606?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Konstantin Shvachko updated HADOOP-2606:
----------------------------------------
Attachment: ReplicatorNew2.patch
> 1. InterruptedException for all other FSNamesystem daemons, e.g.
> DecommissionedMonitor, ResolutionMonitor, etc.
Yes, we should do that. And probably not only name system daemons. I'll file a
jira.
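To make the intent concrete, here is a minimal sketch of what such handling could look like. The class and member names below are hypothetical, not taken from FSNamesystem or from the patch:
{code:java}
// Hypothetical sketch of an FSNamesystem-style monitor daemon that exits
// cleanly on InterruptedException instead of swallowing it and looping forever.
class MonitorDaemon implements Runnable {
  private final long recheckIntervalMillis;
  private volatile boolean fsRunning = true;

  MonitorDaemon(long recheckIntervalMillis) {
    this.recheckIntervalMillis = recheckIntervalMillis;
  }

  public void run() {
    while (fsRunning) {
      try {
        doOnePass();                       // the monitor's actual work
        Thread.sleep(recheckIntervalMillis);
      } catch (InterruptedException ie) {
        // Treat interruption as a request to stop this daemon.
        fsRunning = false;
        Thread.currentThread().interrupt();
      }
    }
  }

  private void doOnePass() {
    // placeholder for decommission / resolution / replication checks
  }
}
{code}
Treating the interrupt as a stop signal lets the namenode shut its daemons down promptly instead of leaving them spinning.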
> 2. A typo
> 3. If a block in neededReplication
Done
> 4. This patch prefers nodes-being-decommissioned to be source of replication
> requests.
My understanding is that a node in the decommission-in-progress state SHOULD not
be shut down until its state changes to decommissioned.
The state can change to decommissioned only after all of its blocks are
replicated, no matter which node performs the replication.
If the machine is shut down anyway, its blocks will eventually be replicated
by another machine.
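To spell the invariant out, here is an illustrative check (all names in this sketch are made up, not from the patch): a decommission-in-progress node can move to decommissioned only once every block it holds has enough live replicas on other nodes, regardless of which datanode produced them.
{code:java}
import java.util.Collection;

// Illustrative only: decides whether a decommissioning node may be retired.
// BlockInfo, liveReplicasExcluding and targetReplication are hypothetical names.
class DecommissionCheck {
  static boolean canFinishDecommission(Collection<BlockInfo> blocksOnNode,
                                       String nodeId) {
    for (BlockInfo b : blocksOnNode) {
      // The node stays decommission-in-progress until each of its blocks
      // has the target number of replicas on other, live datanodes.
      if (b.liveReplicasExcluding(nodeId) < b.targetReplication()) {
        return false;
      }
    }
    return true;  // safe to mark the node decommissioned and shut it down
  }

  interface BlockInfo {
    int liveReplicasExcluding(String nodeId);
    int targetReplication();
  }
}
{code}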
> 5. FSNamesystem.chooseSourceDatanode() should always return a node if
> possible.
This is a good catch, thanks. I corrected it.
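For reference, a rough sketch of the fallback behavior being discussed, using a hypothetical signature rather than the actual FSNamesystem.chooseSourceDatanode() code: prefer a decommission-in-progress replica as the source, but fall back to any other usable replica so that a source is returned whenever one exists.
{code:java}
import java.util.List;

// Illustrative source-selection logic; DatanodeInfo here is a stand-in type.
class SourceChooser {
  static DatanodeInfo chooseSource(List<DatanodeInfo> replicas) {
    DatanodeInfo fallback = null;
    for (DatanodeInfo d : replicas) {
      if (!d.isAlive() || d.isDecommissioned()) {
        continue;                    // never copy from a dead or retired node
      }
      if (d.isDecommissionInProgress()) {
        return d;                    // preferred source while being drained
      }
      if (fallback == null) {
        fallback = d;                // remember some other usable replica
      }
    }
    return fallback;                 // null only if no valid source exists
  }

  interface DatanodeInfo {
    boolean isAlive();
    boolean isDecommissioned();
    boolean isDecommissionInProgress();
  }
}
{code}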
> 6.
Removed the method names from the state change logs. We should not embed method
names in log messages in the future, because NameNode.stateChangeLog prints the
name automatically.
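As a small before/after illustration (the messages and variables are invented; this assumes the public NameNode.stateChangeLog field):
{code:java}
import org.apache.hadoop.dfs.NameNode;

// Hedged sketch; "srcNode", "block" and "targets" are placeholder strings.
class StateChangeLogExample {
  static void logReplicationRequest(String srcNode, String block, String targets) {
    // Before: the caller's method name is repeated inside the message.
    NameNode.stateChangeLog.debug(
        "pendingTransfers: ask " + srcNode + " to replicate " + block);

    // After: rely on the logging layer to identify where the line came from,
    // and keep only the event itself in the message text.
    NameNode.stateChangeLog.debug(
        "BLOCK* ask " + srcNode + " to replicate " + block + " to " + targets);
  }
}
{code}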
> 7.
Done.
> Namenode unstable when replicating 500k blocks at once
> ------------------------------------------------------
>
> Key: HADOOP-2606
> URL: https://issues.apache.org/jira/browse/HADOOP-2606
> Project: Hadoop Core
> Issue Type: Bug
> Components: dfs
> Affects Versions: 0.14.3
> Reporter: Koji Noguchi
> Assignee: Konstantin Shvachko
> Fix For: 0.17.0
>
> Attachments: ReplicatorNew.patch, ReplicatorNew1.patch,
> ReplicatorNew2.patch, ReplicatorTestOld.patch
>
>
> We tried to decommission about 40 nodes at once, each containing 12k blocks.
> (about 500k total)
> (This also happened when we first tried to decommission 2 million blocks)
> Clients started experiencing "java.lang.RuntimeException:
> java.net.SocketTimeoutException: timed out waiting for rpc
> response" and namenode was in 100% cpu state.
> It was spending most of its time on one thread,
> "[EMAIL PROTECTED]" daemon prio=10 tid=0x0000002e10702800 nid=0x6718
> runnable [0x0000000041a42000..0x0000000041a42a30]
> java.lang.Thread.State: RUNNABLE
> at
> org.apache.hadoop.dfs.FSNamesystem.containingNodeList(FSNamesystem.java:2766)
> at
> org.apache.hadoop.dfs.FSNamesystem.pendingTransfers(FSNamesystem.java:2870)
> - locked <0x0000002aa3cef720> (a
> org.apache.hadoop.dfs.UnderReplicatedBlocks)
> - locked <0x0000002aa3c42e28> (a org.apache.hadoop.dfs.FSNamesystem)
> at
> org.apache.hadoop.dfs.FSNamesystem.computeDatanodeWork(FSNamesystem.java:1928)
> at
> org.apache.hadoop.dfs.FSNamesystem$ReplicationMonitor.run(FSNamesystem.java:1868)
> at java.lang.Thread.run(Thread.java:619)
> We confirmed that the Namenode was not in a full-GC state when this problem
> happened.
> Also, dfsadmin -metasave showed that "Blocks waiting for replication" was
> decreasing very slowly.
> I believe this is not specific to decommissioning; the same problem would
> happen if we lost an entire rack.
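The report above does not prescribe a fix, and the sketch below is not a claim about what ReplicatorNew2.patch does: one generic way to keep such a scan from monopolizing the namenode is to cap the replication work computed per monitor pass, shown here with hypothetical names and a budget derived from the live node count.
{code:java}
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Illustrative throttle: compute at most liveDatanodes * maxStreamsPerNode
// replication requests per pass instead of walking all 500k needed blocks
// while holding the namesystem lock.
class ReplicationWorkThrottle {
  static <B> List<B> selectWorkForThisPass(Iterator<B> neededReplications,
                                           int liveDatanodes,
                                           int maxStreamsPerNode) {
    int budget = liveDatanodes * maxStreamsPerNode;
    List<B> work = new ArrayList<B>(budget);
    while (work.size() < budget && neededReplications.hasNext()) {
      work.add(neededReplications.next());  // leave the rest for later passes
    }
    return work;
  }
}
{code}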