[
https://issues.apache.org/jira/browse/HDFS-15191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17063995#comment-17063995
]
Steven Rand commented on HDFS-15191:
------------------------------------
[~vagarychen] I looked at this some more, and found that one difference after
HDFS-14611 is that in 3.2.1 we call this from
{{SaslDataTransferClient#doSaslHandshake}}, but we don't in 3.2.0:
{code}
BlockTokenIdentifier blockTokenIdentifier = accessToken.decodeIdentifier();
{code}
Maybe calling {{BlockTokenIdentifier.readFieldsLegacy}} with the legacy block
token would also have failed in 3.2.0, but in 3.2.0 we never reach that code
when we try to read a block.
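For context, here's a minimal sketch of that decode step in isolation: wrap
the token's identifier bytes in a {{DataInputStream}} and call {{readFields}},
which is roughly what {{Token#decodeIdentifier}} does (per the stack trace
below it ends up in {{readFields}} and then {{readFieldsLegacy}}). The class
and method names ({{DecodeLegacyToken}}, {{decode}}) are just made up for
illustration:
{code}
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;

import org.apache.hadoop.hdfs.security.token.block.BlockTokenIdentifier;

public class DecodeLegacyToken {
  // identifierBytes would come from Token#getIdentifier() on the block token
  static BlockTokenIdentifier decode(byte[] identifierBytes) throws IOException {
    BlockTokenIdentifier id = new BlockTokenIdentifier();
    try (DataInputStream in =
        new DataInputStream(new ByteArrayInputStream(identifierBytes))) {
      // Per the stack trace, readFields ends up in readFieldsLegacy for this
      // token, and that's where the EOFException is thrown.
      id.readFields(in);
    }
    return id;
  }
}
{code}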
Also, I used the debugger to look at the block token and check what position
we're at in the underlying {{DataInputStream}} during each call in
{{BlockTokenIdentifier.readFieldsLegacy}}. All the calls before {{length =
WritableUtils.readVInt(in);}} seem fine, but we've already run out of bytes by
the time we get there.
{code}
// The DataInputStream has 74 bytes in it.
expiryDate = WritableUtils.readVLong(in);   // pos = 0
keyId = WritableUtils.readVInt(in);         // pos = 7
userId = WritableUtils.readString(in);      // pos = 12
blockPoolId = WritableUtils.readString(in); // pos = 21
blockId = WritableUtils.readVLong(in);      // pos = 63
int length = WritableUtils.readVIntInRange(in, 0,
    AccessMode.class.getEnumConstants().length);  // pos = 68
for (int i = 0; i < length; i++) {
  modes.add(WritableUtils.readEnum(in, AccessMode.class));  // pos = 69
}
length = WritableUtils.readVInt(in);  // pos = 74, equal to the byte count,
                                      // so we're already at the end of the stream
// ... more code, but we don't get to it ...
{code}
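To make those positions reproducible, here's a rough harness that repeats the
same reads and prints the position after each one, computed as {{totalLength -
in.available()}}. The class name and the {{identifierBytes}} parameter are
made up for illustration; the bytes would come from the legacy token's
{{Token#getIdentifier()}}:
{code}
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.util.EnumSet;

import org.apache.hadoop.hdfs.security.token.block.BlockTokenIdentifier.AccessMode;
import org.apache.hadoop.io.WritableUtils;

public class LegacyTokenPositions {
  static void walk(byte[] identifierBytes) throws IOException {
    int total = identifierBytes.length;  // 74 bytes in the case above
    DataInputStream in =
        new DataInputStream(new ByteArrayInputStream(identifierBytes));
    WritableUtils.readVLong(in);                               // expiryDate
    System.out.println("after expiryDate:  " + (total - in.available()));
    WritableUtils.readVInt(in);                                // keyId
    System.out.println("after keyId:       " + (total - in.available()));
    WritableUtils.readString(in);                              // userId
    System.out.println("after userId:      " + (total - in.available()));
    WritableUtils.readString(in);                              // blockPoolId
    System.out.println("after blockPoolId: " + (total - in.available()));
    WritableUtils.readVLong(in);                               // blockId
    System.out.println("after blockId:     " + (total - in.available()));
    int length = WritableUtils.readVIntInRange(in, 0,
        AccessMode.class.getEnumConstants().length);
    EnumSet<AccessMode> modes = EnumSet.noneOf(AccessMode.class);
    for (int i = 0; i < length; i++) {
      modes.add(WritableUtils.readEnum(in, AccessMode.class));
    }
    System.out.println("after modes:       " + (total - in.available()));
    // The next WritableUtils.readVInt(in) is the call that hits the EOF,
    // since the position already equals the total length.
  }
}
{code}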
> EOF when reading legacy buffer in BlockTokenIdentifier
> ------------------------------------------------------
>
> Key: HDFS-15191
> URL: https://issues.apache.org/jira/browse/HDFS-15191
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: hdfs
> Affects Versions: 3.2.1
> Reporter: Steven Rand
> Priority: Major
>
> We have an HDFS client application which recently upgraded from 3.2.0 to
> 3.2.1. After this upgrade (but not before), we sometimes see these errors
> when this application is used with clusters still running Hadoop 2.x (more
> specifically CDH 5.12.1):
> {code}
> WARN [2020-02-24T00:54:32.856Z] org.apache.hadoop.hdfs.client.impl.BlockReaderFactory: I/O error constructing remote block reader. (_sampled: true)
> java.io.EOFException:
> at java.io.DataInputStream.readByte(DataInputStream.java:272)
> at org.apache.hadoop.io.WritableUtils.readVLong(WritableUtils.java:308)
> at org.apache.hadoop.io.WritableUtils.readVInt(WritableUtils.java:329)
> at org.apache.hadoop.hdfs.security.token.block.BlockTokenIdentifier.readFieldsLegacy(BlockTokenIdentifier.java:240)
> at org.apache.hadoop.hdfs.security.token.block.BlockTokenIdentifier.readFields(BlockTokenIdentifier.java:221)
> at org.apache.hadoop.security.token.Token.decodeIdentifier(Token.java:200)
> at org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferClient.doSaslHandshake(SaslDataTransferClient.java:530)
> at org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferClient.getEncryptedStreams(SaslDataTransferClient.java:342)
> at org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferClient.send(SaslDataTransferClient.java:276)
> at org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferClient.checkTrustAndSend(SaslDataTransferClient.java:245)
> at org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferClient.checkTrustAndSend(SaslDataTransferClient.java:227)
> at org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferClient.peerSend(SaslDataTransferClient.java:170)
> at org.apache.hadoop.hdfs.DFSUtilClient.peerFromSocketAndKey(DFSUtilClient.java:730)
> at org.apache.hadoop.hdfs.DFSClient.newConnectedPeer(DFSClient.java:2942)
> at org.apache.hadoop.hdfs.client.impl.BlockReaderFactory.nextTcpPeer(BlockReaderFactory.java:822)
> at org.apache.hadoop.hdfs.client.impl.BlockReaderFactory.getRemoteBlockReaderFromTcp(BlockReaderFactory.java:747)
> at org.apache.hadoop.hdfs.client.impl.BlockReaderFactory.build(BlockReaderFactory.java:380)
> at org.apache.hadoop.hdfs.DFSInputStream.getBlockReader(DFSInputStream.java:644)
> at org.apache.hadoop.hdfs.DFSInputStream.blockSeekTo(DFSInputStream.java:575)
> at org.apache.hadoop.hdfs.DFSInputStream.readWithStrategy(DFSInputStream.java:757)
> at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:829)
> at java.io.DataInputStream.read(DataInputStream.java:100)
> at org.apache.commons.io.IOUtils.copyLarge(IOUtils.java:2314)
> at org.apache.commons.io.IOUtils.copy(IOUtils.java:2270)
> at org.apache.commons.io.IOUtils.copyLarge(IOUtils.java:2291)
> at org.apache.commons.io.IOUtils.copy(IOUtils.java:2246)
> at org.apache.commons.io.IOUtils.toByteArray(IOUtils.java:765)
> {code}
> We get this warning for all DataNodes with a copy of the block, so the read
> fails.
> I haven't been able to figure out what changed between 3.2.0 and 3.2.1 to
> cause this, but HDFS-13617 and HDFS-14611 seem related, so tagging
> [~vagarychen] in case you have any ideas.