[
https://issues.apache.org/jira/browse/HDFS-7884?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14352591#comment-14352591
]
Vinayakumar B edited comment on HDFS-7884 at 3/9/15 6:52 AM:
-------------------------------------------------------------
The earlier comment was a quick guess.
The actual reason is that the reader is trying to read using the old block ID and GS
(blk_1073741911_1102), but the block's generation stamp was changed after the append:
{noformat}2015-03-09 02:03:21,488 INFO impl.FsDatasetImpl
(FsDatasetImpl.java:append(1015)) - Appending to FinalizedReplica,
blk_1073741911_1102, FINALIZED{noformat}
{noformat}2015-03-09 02:03:21,501 INFO namenode.FSNamesystem
(FSNamesystem.java:updatePipeline(6199)) - updatePipeline(blk_1073741911_1102,
newGS=1110, newLength=638, newNodes=[127.0.0.1:52069, 127.0.0.1:52065,
127.0.0.1:52074], client=DFSClient_NONMAPREDUCE_-727094507_1){noformat}
While initializing the BlockSender, the GS is not checked at all when the replica is fetched:
{code}
synchronized(datanode.data) {
  replica = getReplica(block, datanode);
  replicaVisibleLength = replica.getVisibleLength();
}{code}
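For context, getReplica here looks the replica up by block ID alone. The sketch below is only an approximation of that helper (the exact shape may differ between branches), but it shows that the client-supplied GS plays no part in the lookup:
{code}
// Approximate sketch of BlockSender#getReplica (not verbatim): the lookup uses
// only the block pool id and block id; the client-supplied GS is ignored here.
private static Replica getReplica(ExtendedBlock block, DataNode datanode)
    throws ReplicaNotFoundException {
  Replica replica = datanode.data.getReplica(block.getBlockPoolId(),
      block.getBlockId());
  if (replica == null) {
    throw new ReplicaNotFoundException(block);
  }
  return replica;
}{code}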
The GS is compared against the client-passed GS in one direction only, i.e. an
exception is thrown only when the client's GS is newer than the DataNode's replica.
In the other case (replica GS newer than the client's, as here: 1110 vs 1102),
the code intends to allow the read:
{code}
if (replica.getGenerationStamp() < block.getGenerationStamp()) {
  throw new IOException("Replica gen stamp < block genstamp, block="
      + block + ", replica=" + replica);
}{code}
But while obtaining the volume reference, the GS is checked for exact equality
further down the line, in ReplicaMap#get:
{code}
ReplicaInfo get(String bpid, Block block) {
  checkBlockPool(bpid);
  checkBlock(block);
  ReplicaInfo replicaInfo = get(bpid, block.getBlockId());
  if (replicaInfo != null &&
      block.getGenerationStamp() == replicaInfo.getGenerationStamp()) {
    return replicaInfo;
  }
  return null;
}{code}
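To make the failure concrete: with the GS values from the logs above (the reader still holds 1102, the DataNode replica is at 1110 after the append), that equality check fails, the lookup returns null, and the chained obtainReference() call hits the NullPointerException. A tiny standalone illustration (names here are made up; only the comparison mirrors ReplicaMap#get):
{code}
// Standalone illustration (not HDFS code): why the lookup returns null for a stale GS.
public class StaleGsLookupDemo {
  public static void main(String[] args) {
    long clientGs  = 1102;  // GS the reader still holds (blk_1073741911_1102)
    long replicaGs = 1110;  // GS on the DataNode after append/updatePipeline

    // ReplicaMap#get returns the replica only on an exact GS match ...
    String replica = (clientGs == replicaGs) ? "FinalizedReplica" : null;

    // ... so getVolume(block) returns null, and the chained
    // datanode.data.getVolume(block).obtainReference() in BlockSender
    // dereferences that null -> the NullPointerException in this issue.
    System.out.println("lookup result: " + replica);  // prints: lookup result: null
  }
}{code}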
So I think, in this case, if the client read is expected to go through, then the
client-provided genstamp needs to be bumped up to the latest, like below.
{code}
if (replica.getGenerationStamp() < block.getGenerationStamp()) {
  throw new IOException("Replica gen stamp < block genstamp, block="
      + block + ", replica=" + replica);
} else if (replica.getGenerationStamp() > block.getGenerationStamp()) {
  DataNode.LOG.debug("Bumping up the client provided"
      + " block's genstamp to latest " + replica.getGenerationStamp()
      + " for block " + block);
  block.setGenerationStamp(replica.getGenerationStamp());
}{code}
Otherwise, an exception needs to be thrown from here itself (a rough sketch of that
alternative follows).
Any thoughts?
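A rough sketch of that alternative (reject the stale client GS instead of bumping it up; the message wording is mine):
{code}
if (replica.getGenerationStamp() < block.getGenerationStamp()) {
  throw new IOException("Replica gen stamp < block genstamp, block="
      + block + ", replica=" + replica);
} else if (replica.getGenerationStamp() > block.getGenerationStamp()) {
  // Alternative sketch: fail the read here instead of bumping up the client GS
  throw new IOException("Replica gen stamp > block genstamp, block="
      + block + ", replica=" + replica);
}{code}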
was (Author: vinayrpet):
I think it's just a race between the client read and the deletion of the block.
The safe option is to null-check and throw ReplicaNotFoundException:
{code}
// Obtain a reference before reading data
FsVolumeSpi volume = datanode.data.getVolume(block);
if (volume == null) {
  // This is race b/n delete and read
  throw new ReplicaNotFoundException(block);
}
this.volumeRef = volume.obtainReference();{code}
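(This null check would sit at BlockSender.java:264, replacing the chained
datanode.data.getVolume(block).obtainReference() call quoted in the issue description below.)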
> NullPointerException in BlockSender
> -----------------------------------
>
> Key: HDFS-7884
> URL: https://issues.apache.org/jira/browse/HDFS-7884
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: datanode
> Reporter: Tsz Wo Nicholas Sze
> Assignee: Brahma Reddy Battula
> Priority: Blocker
> Attachments:
> org.apache.hadoop.hdfs.TestAppendSnapshotTruncate-output.txt
>
>
> {noformat}
> java.lang.NullPointerException
> at
> org.apache.hadoop.hdfs.server.datanode.BlockSender.<init>(BlockSender.java:264)
> at
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.readBlock(DataXceiver.java:506)
> at
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opReadBlock(Receiver.java:116)
> at
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:71)
> at
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:249)
> at java.lang.Thread.run(Thread.java:745)
> {noformat}
> BlockSender.java:264 is shown below
> {code}
> this.volumeRef = datanode.data.getVolume(block).obtainReference();
> {code}
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)