[
https://issues.apache.org/jira/browse/HDFS-1970?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Suresh Srinivas resolved HDFS-1970.
-----------------------------------
Resolution: Duplicate
Duplicate of HDFS-1951, as indicated in the previous comment.
> NullPointerException when NameNode lease recovery runs after the client has
> not responded to the NN for longer than the hard limit and the current block
> is larger than the block size recorded at the NN
> ----------------------------------------------------------------------------
>
> Key: HDFS-1970
> URL: https://issues.apache.org/jira/browse/HDFS-1970
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: name-node
> Affects Versions: 0.20-append
> Reporter: ramkrishna.s.vasudevan
> Fix For: 0.20-append
>
>
> A NullPointerException occurs when NameNode lease recovery runs after the
> client has not responded to the NN for longer than the hard limit and the
> current block is larger than the block size recorded at the NN.
> 1. Write to 2 datanodes using a client.
> 2. Kill one datanode and allow pipeline recovery.
> 3. Write some more data to the same block.
> 4. In parallel, allow NameNode lease recovery to happen.
> A NullPointerException is thrown in the addStoredBlock API; a client-side
> sketch of these steps follows.
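> A minimal client-side sketch of steps 1-4, assuming the cluster from the log
> below (the class name, write sizes and sleep are placeholders for
> illustration, not the exact test code that was run):
>
>     // Sketch only: drives steps 1-4 against an assumed 2-datanode cluster.
>     import org.apache.hadoop.conf.Configuration;
>     import org.apache.hadoop.fs.FSDataOutputStream;
>     import org.apache.hadoop.fs.FileSystem;
>     import org.apache.hadoop.fs.Path;
>
>     public class SyncTestRepro {
>       public static void main(String[] args) throws Exception {
>         Configuration conf = new Configuration();
>         conf.set("fs.default.name", "hdfs://linux76:9000"); // NN from the log below
>         FileSystem fs = FileSystem.get(conf);
>
>         // Step 1: create the file with replication 2 so the pipeline has 2 datanodes.
>         FSDataOutputStream out = fs.create(new Path("/synctest0"), (short) 2);
>         out.write(new byte[1024]);
>         out.sync(); // persist the first part of the block
>
>         // Step 2: kill one datanode externally here and let pipeline recovery run.
>
>         // Step 3: write some more data to the same block.
>         out.write(new byte[64]);
>         out.sync();
>
>         // Step 4: stop responding to the NN (e.g. suspend this client) until the
>         // lease hard limit expires, so NN lease recovery runs in parallel.
>         Thread.sleep(Long.MAX_VALUE);
>       }
>     }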
> Please find the logs below:
> Debugging in MachineName..
> Listening for transport dt_socket at address: 8007
> 11/05/20 21:38:33 INFO namenode.NameNode: STARTUP_MSG:
> /************************************************************
> STARTUP_MSG: Starting NameNode
> STARTUP_MSG: host = linux76/10.18.52.76
> STARTUP_MSG: args = []
> STARTUP_MSG: version = 0.20.3-SNAPSHOT
> STARTUP_MSG: build = -r ; compiled by 'G00900856' on Tue Feb 1 11:40:14
> IST 2011
> ************************************************************/
> 11/05/20 21:38:33 INFO metrics.RpcMetrics: Initializing RPC Metrics with
> hostName=NameNode, port=9000
> 11/05/20 21:38:33 INFO namenode.NameNode: Namenode up at:
> linux76/10.18.52.76:9000
> 11/05/20 21:38:33 INFO jvm.JvmMetrics: Initializing JVM Metrics with
> processName=NameNode, sessionId=null
> 11/05/20 21:38:33 INFO metrics.NameNodeMetrics: Initializing NameNodeMeterics
> using context object:org.apache.hadoop.metrics.spi.NullContext
> 11/05/20 21:38:33 INFO namenode.FSNamesystem: fsOwner=root,root
> 11/05/20 21:38:33 INFO namenode.FSNamesystem: supergroup=supergroup
> 11/05/20 21:38:33 INFO namenode.FSNamesystem: isPermissionEnabled=false
> 11/05/20 21:38:33 INFO metrics.FSNamesystemMetrics: Initializing
> FSNamesystemMetrics using context
> object:org.apache.hadoop.metrics.spi.NullContext
> 11/05/20 21:38:33 INFO namenode.FSNamesystem: Registered
> FSNamesystemStatusMBean
> 11/05/20 21:38:33 INFO common.Storage: Number of files = 1
> 11/05/20 21:38:33 INFO common.Storage: Number of files under construction = 0
> 11/05/20 21:38:33 INFO common.Storage: Image file of size 94 loaded in 0
> seconds.
> 11/05/20 21:38:33 INFO common.Storage: Edits file
> /home/ramkrishna/opensrchadoop/appendbranch/hadoop-0.20.3-SNAPSHOT/bin/../hadoop-root/dfs/name/current/edits
> of size 4 edits # 0 loaded in 0 seconds.
> 11/05/20 21:38:33 INFO common.Storage: Image file of size 94 saved in 0
> seconds.
> 11/05/20 21:38:34 INFO common.Storage: Image file of size 94 saved in 0
> seconds.
> 11/05/20 21:38:34 INFO namenode.FSNamesystem: Finished loading FSImage in 482
> msecs
> 11/05/20 21:38:34 INFO namenode.FSNamesystem: Total number of blocks = 0
> 11/05/20 21:38:34 INFO namenode.FSNamesystem: Number of invalid blocks = 0
> 11/05/20 21:38:34 INFO namenode.FSNamesystem: Number of under-replicated
> blocks = 0
> 11/05/20 21:38:34 INFO namenode.FSNamesystem: Number of over-replicated
> blocks = 0
> 11/05/20 21:38:34 INFO hdfs.StateChange: STATE* Leaving safe mode after 0
> secs.
> 11/05/20 21:38:34 INFO hdfs.StateChange: STATE* Network topology has 0 racks
> and 0 datanodes
> 11/05/20 21:38:34 INFO hdfs.StateChange: STATE* UnderReplicatedBlocks has 0
> blocks
> 11/05/20 21:38:34 INFO mortbay.log: Logging to
> org.slf4j.impl.Log4jLoggerAdapter(org.mortbay.log) via
> org.mortbay.log.Slf4jLog
> 11/05/20 21:38:35 INFO http.HttpServer: Port returned by
> webServer.getConnectors()[0].getLocalPort() before open() is -1. Opening the
> listener on 50070
> 11/05/20 21:38:35 INFO http.HttpServer: listener.getLocalPort() returned
> 50070 webServer.getConnectors()[0].getLocalPort() returned 50070
> 11/05/20 21:38:35 INFO http.HttpServer: Jetty bound to port 50070
> 11/05/20 21:38:35 INFO mortbay.log: jetty-6.1.14
> 11/05/20 21:38:37 INFO mortbay.log: Started
> SelectChannelConnector@linux76:50070
> 11/05/20 21:38:37 INFO namenode.NameNode: Web-server up at: linux76:50070
> 11/05/20 21:38:37 INFO ipc.Server: IPC Server Responder: starting
> 11/05/20 21:38:37 INFO ipc.Server: IPC Server listener on 9000: starting
> 11/05/20 21:38:37 INFO ipc.Server: IPC Server handler 0 on 9000: starting
> 11/05/20 21:38:37 INFO ipc.Server: IPC Server handler 1 on 9000: starting
> 11/05/20 21:38:37 INFO ipc.Server: IPC Server handler 2 on 9000: starting
> 11/05/20 21:38:37 INFO ipc.Server: IPC Server handler 3 on 9000: starting
> 11/05/20 21:38:37 INFO ipc.Server: IPC Server handler 4 on 9000: starting
> 11/05/20 21:38:37 INFO ipc.Server: IPC Server handler 5 on 9000: starting
> 11/05/20 21:38:37 INFO ipc.Server: IPC Server handler 6 on 9000: starting
> 11/05/20 21:38:37 INFO ipc.Server: IPC Server handler 7 on 9000: starting
> 11/05/20 21:38:37 INFO ipc.Server: IPC Server handler 8 on 9000: starting
> 11/05/20 21:38:37 INFO ipc.Server: IPC Server handler 9 on 9000: starting
> 11/05/20 21:38:38 INFO hdfs.StateChange: BLOCK* NameSystem.registerDatanode:
> node registration from 10.18.52.76:50010 storage
> DS-1868335495-10.18.52.76-50010-1305907718956
> 11/05/20 21:38:38 INFO net.NetworkTopology: Adding a new node:
> /default-rack/10.18.52.76:50010
> 11/05/20 21:38:41 INFO hdfs.StateChange: BLOCK* NameSystem.registerDatanode:
> node registration from 10.18.52.5:50010 storage
> DS-721814011-10.18.52.5-50010-1305928120116
> 11/05/20 21:38:41 INFO net.NetworkTopology: Adding a new node:
> /default-rack/10.18.52.5:50010
> 11/05/20 21:38:54 INFO FSNamesystem.audit: ugi=R00902313,Tardis
> ip=/10.18.47.133 cmd=create src=/synctest0 dst=null
> perm=R00902313:supergroup:rw-r--r--
> 11/05/20 21:38:54 INFO hdfs.StateChange: BLOCK* NameSystem.allocateBlock:
> /synctest0. blk_-8455334393673385140_1001
> 11/05/20 21:38:55 INFO hdfs.StateChange: BLOCK* NameSystem.addStoredBlock:
> blockMap updated: 10.18.52.5:50010 is added to blk_-8455334393673385140_1001
> size 1024
> 11/05/20 21:38:55 INFO hdfs.StateChange: BLOCK* NameSystem.addStoredBlock:
> blockMap updated: 10.18.52.76:50010 is added to blk_-8455334393673385140_1001
> size 1024
> 11/05/20 21:38:55 INFO hdfs.StateChange: BLOCK* NameSystem.allocateBlock:
> /synctest0. blk_8955752156241965528_1001
> 11/05/20 21:38:55 INFO hdfs.StateChange: BLOCK* NameSystem.addStoredBlock:
> blockMap updated: 10.18.52.76:50010 is added to blk_8955752156241965528_1001
> size 1024
> 11/05/20 21:38:55 INFO hdfs.StateChange: BLOCK* NameSystem.addStoredBlock:
> blockMap updated: 10.18.52.5:50010 is added to blk_8955752156241965528_1001
> size 1024
> 11/05/20 21:39:11 INFO hdfs.StateChange: BLOCK* NameSystem.allocateBlock:
> /synctest0. blk_-5249191180607448701_1001
> 11/05/20 21:40:18 INFO namenode.FSNamesystem:
> commitBlockSynchronization(lastblock=blk_-5249191180607448701_1001,
> newgenerationstamp=1002, newlength=833, newtargets=[10.18.52.76:50010],
> closeFile=false, deleteBlock=false)
> 11/05/20 21:40:18 INFO namenode.FSNamesystem: Number of transactions: 4 Total
> time for transactions(ms): 2Number of transactions batched in Syncs: 0 Number
> of syncs: 1 SyncTimes(ms): 28
> 11/05/20 21:40:18 INFO namenode.FSNamesystem:
> commitBlockSynchronization(blk_-5249191180607448701_1002) successful
> 11/05/20 21:41:41 INFO namenode.LeaseManager: Lease [Lease. Holder:
> DFSClient_-676883980, pendingcreates: 1] has expired hard limit
> 11/05/20 21:41:56 INFO namenode.FSNamesystem: Recovering lease=[Lease.
> Holder: DFSClient_-676883980, pendingcreates: 1], src=/synctest0
> 11/05/20 21:41:56 INFO hdfs.StateChange: BLOCK* blk_-5249191180607448701_1002
> recovery started, primary=10.18.52.76:50010
> 11/05/20 21:42:09 INFO namenode.FSNamesystem:
> commitBlockSynchronization(lastblock=blk_-5249191180607448701_1002,
> newgenerationstamp=1003, newlength=867, newtargets=[10.18.52.76:50010],
> closeFile=true, deleteBlock=false)
> 11/05/20 21:42:26 INFO hdfs.StateChange: BLOCK* NameSystem.addStoredBlock:
> addStoredBlock request received for blk_-5249191180607448701_1003 on
> 10.18.52.76:50010 size 867 and it belongs to a file under construction.
> 11/05/20 21:42:52 WARN namenode.FSNamesystem: Inconsistent size for block
> blk_-5249191180607448701_1003 reported from 10.18.52.76:50010 current size is
> 833 reported size is 867
> 11/05/20 21:43:11 WARN namenode.FSNamesystem: Block
> blk_-5249191180607448701_1003 reported from 10.18.52.76:50010 does not exist
> in blockMap. Surprise! Surprise!
> 11/05/20 21:43:13 INFO hdfs.StateChange: Removing lease on file /synctest0
> from client NN_Recovery
> 11/05/20 21:43:13 INFO hdfs.StateChange: BLOCK* ask 10.18.52.76:50010 to
> replicate blk_-5249191180607448701_1003 to datanode(s) 10.18.52.5:50010
> 11/05/20 21:43:13 INFO namenode.FSNamesystem: Number of transactions: 6 Total
> time for transactions(ms): 2Number of transactions batched in Syncs: 0 Number
> of syncs: 2 SyncTimes(ms): 30
> 11/05/20 21:43:13 INFO namenode.FSNamesystem:
> commitBlockSynchronization(newblock=blk_-5249191180607448701_1003,
> file=/synctest0, newgenerationstamp=1003, newlength=867,
> newtargets=[10.18.52.76:50010]) successful
> 11/05/20 21:43:15 INFO ipc.Server: IPC Server handler 6 on 9000, call
> blockReceived(DatanodeRegistration(10.18.52.76:50010,
> storageID=DS-1868335495-10.18.52.76-50010-1305907718956, infoPort=50075,
> ipcPort=50020), [Lorg.apache.hadoop.hdfs.protocol.Block;@9b774e,
> [Ljava.lang.String;@b5d05b) from 10.18.52.76:40343: error:
> java.io.IOException: java.lang.NullPointerException
> java.io.IOException: java.lang.NullPointerException
> at
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.addStoredBlock(FSNamesystem.java:3173)
> at
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.blockReceived(FSNamesystem.java:3592)
> at
> org.apache.hadoop.hdfs.server.namenode.NameNode.blockReceived(NameNode.java:756)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> at java.lang.reflect.Method.invoke(Method.java:597)
> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:508)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:961)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:957)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:396)
> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:955)
> 11/05/20 21:43:19 WARN hdfs.StateChange: BLOCK* NameSystem.addStoredBlock:
> Redundant addStoredBlock request received for blk_-5249191180607448701_1003
> on 10.18.52.76:50010 size 867
> 11/05/20 21:43:19 INFO hdfs.StateChange: BLOCK* ask 10.18.52.76:50010 to
> replicate blk_-5249191180607448701_1003 to datanode(s) 10.18.52.5:50010
> Listening for transport dt_socket at address: 8007
> 11/05/20 21:43:22 INFO hdfs.StateChange: BLOCK* ask 10.18.52.76:50010 to
> replicate blk_-5249191180607448701_1003 to datanode(s) 10.18.52.5:50010
>
--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira