[
https://issues.apache.org/jira/browse/HDFS-15191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17063995#comment-17063995
]
Steven Rand commented on HDFS-15191:
------------------------------------
[~vagarychen] I looked at this some more, and found that one difference after
HDFS-14611 is that in 3.2.1 we call this from
{{SaslDataTransferClient#doSaslHandshake}}, but we don't in 3.2.0:
{code}
BlockTokenIdentifier blockTokenIdentifier = accessToken.decodeIdentifier();
{code}
Maybe calling {{BlockTokenIdentifier.readFieldsLegacy}} with the legacy block
token would also have failed in 3.2.0, but in 3.2.0 we never reach that code
when we try to read a block.
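For context, here's a minimal sketch of that decode step in isolation: wrap
the token's identifier bytes in a {{DataInputStream}} and call {{readFields}},
which is roughly what {{Token#decodeIdentifier}} does (per the stack trace
below it ends up in {{readFields}} and then {{readFieldsLegacy}}). The class
and method names ({{DecodeLegacyToken}}, {{decode}}) are just made up for
illustration:
{code}
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;

import org.apache.hadoop.hdfs.security.token.block.BlockTokenIdentifier;

public class DecodeLegacyToken {
  // identifierBytes would come from Token#getIdentifier() on the block token
  static BlockTokenIdentifier decode(byte[] identifierBytes) throws IOException {
    BlockTokenIdentifier id = new BlockTokenIdentifier();
    try (DataInputStream in =
        new DataInputStream(new ByteArrayInputStream(identifierBytes))) {
      // Per the stack trace, readFields ends up in readFieldsLegacy for this
      // token, and that's where the EOFException is thrown.
      id.readFields(in);
    }
    return id;
  }
}
{code}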
Also, I used the debugger to look at the block token and check what position
we're at in the underlying {{DataInputStream}} during each call in
{{BlockTokenIdentifier.readFieldsLegacy}}. All the calls before {{length =
WritableUtils.readVInt(in);}} seem fine, but we've already run out of bytes by
the time we get there.
{code}
// The DataInputStream has 74 bytes in it.
expiryDate = WritableUtils.readVLong(in);   // pos = 0
keyId = WritableUtils.readVInt(in);         // pos = 7
userId = WritableUtils.readString(in);      // pos = 12
blockPoolId = WritableUtils.readString(in); // pos = 21
blockId = WritableUtils.readVLong(in);      // pos = 63
int length = WritableUtils.readVIntInRange(in, 0,
    AccessMode.class.getEnumConstants().length);  // pos = 68
for (int i = 0; i < length; i++) {
  modes.add(WritableUtils.readEnum(in, AccessMode.class));  // pos = 69
}
length = WritableUtils.readVInt(in);  // pos = 74, equal to the byte count,
                                      // so we're already at the end of the stream
// ... more code, but we don't get to it ...
{code}
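To make those positions reproducible, here's a rough harness that repeats the
same reads and prints the position after each one, computed as {{totalLength -
in.available()}}. The class name and the {{identifierBytes}} parameter are
made up for illustration; the bytes would come from the legacy token's
{{Token#getIdentifier()}}:
{code}
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.util.EnumSet;

import org.apache.hadoop.hdfs.security.token.block.BlockTokenIdentifier.AccessMode;
import org.apache.hadoop.io.WritableUtils;

public class LegacyTokenPositions {
  static void walk(byte[] identifierBytes) throws IOException {
    int total = identifierBytes.length;  // 74 bytes in the case above
    DataInputStream in =
        new DataInputStream(new ByteArrayInputStream(identifierBytes));
    WritableUtils.readVLong(in);                               // expiryDate
    System.out.println("after expiryDate:  " + (total - in.available()));
    WritableUtils.readVInt(in);                                // keyId
    System.out.println("after keyId:       " + (total - in.available()));
    WritableUtils.readString(in);                              // userId
    System.out.println("after userId:      " + (total - in.available()));
    WritableUtils.readString(in);                              // blockPoolId
    System.out.println("after blockPoolId: " + (total - in.available()));
    WritableUtils.readVLong(in);                               // blockId
    System.out.println("after blockId:     " + (total - in.available()));
    int length = WritableUtils.readVIntInRange(in, 0,
        AccessMode.class.getEnumConstants().length);
    EnumSet<AccessMode> modes = EnumSet.noneOf(AccessMode.class);
    for (int i = 0; i < length; i++) {
      modes.add(WritableUtils.readEnum(in, AccessMode.class));
    }
    System.out.println("after modes:       " + (total - in.available()));
    // The next WritableUtils.readVInt(in) is the call that hits the EOF,
    // since the position already equals the total length.
  }
}
{code}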
> EOF when reading legacy buffer in BlockTokenIdentifier
> ------------------------------------------------------
>
> Key: HDFS-15191
> URL: https://issues.apache.org/jira/browse/HDFS-15191
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: hdfs
> Affects Versions: 3.2.1
> Reporter: Steven Rand
> Priority: Major
>
> We have an HDFS client application which recently upgraded from 3.2.0 to
> 3.2.1. After this upgrade (but not before), we sometimes see these errors
> when this application is used with clusters still running Hadoop 2.x (more
> specifically CDH 5.12.1):
> {code}
> WARN [2020-02-24T00:54:32.856Z] org.apache.hadoop.hdfs.client.impl.BlockReaderFactory: I/O error constructing remote block reader. (_sampled: true)
> java.io.EOFException:
> at java.io.DataInputStream.readByte(DataInputStream.java:272)
> at org.apache.hadoop.io.WritableUtils.readVLong(WritableUtils.java:308)
> at org.apache.hadoop.io.WritableUtils.readVInt(WritableUtils.java:329)
> at org.apache.hadoop.hdfs.security.token.block.BlockTokenIdentifier.readFieldsLegacy(BlockTokenIdentifier.java:240)
> at org.apache.hadoop.hdfs.security.token.block.BlockTokenIdentifier.readFields(BlockTokenIdentifier.java:221)
> at org.apache.hadoop.security.token.Token.decodeIdentifier(Token.java:200)
> at org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferClient.doSaslHandshake(SaslDataTransferClient.java:530)
> at org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferClient.getEncryptedStreams(SaslDataTransferClient.java:342)
> at org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferClient.send(SaslDataTransferClient.java:276)
> at org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferClient.checkTrustAndSend(SaslDataTransferClient.java:245)
> at org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferClient.checkTrustAndSend(SaslDataTransferClient.java:227)
> at org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferClient.peerSend(SaslDataTransferClient.java:170)
> at org.apache.hadoop.hdfs.DFSUtilClient.peerFromSocketAndKey(DFSUtilClient.java:730)
> at org.apache.hadoop.hdfs.DFSClient.newConnectedPeer(DFSClient.java:2942)
> at org.apache.hadoop.hdfs.client.impl.BlockReaderFactory.nextTcpPeer(BlockReaderFactory.java:822)
> at org.apache.hadoop.hdfs.client.impl.BlockReaderFactory.getRemoteBlockReaderFromTcp(BlockReaderFactory.java:747)
> at org.apache.hadoop.hdfs.client.impl.BlockReaderFactory.build(BlockReaderFactory.java:380)
> at org.apache.hadoop.hdfs.DFSInputStream.getBlockReader(DFSInputStream.java:644)
> at org.apache.hadoop.hdfs.DFSInputStream.blockSeekTo(DFSInputStream.java:575)
> at org.apache.hadoop.hdfs.DFSInputStream.readWithStrategy(DFSInputStream.java:757)
> at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:829)
> at java.io.DataInputStream.read(DataInputStream.java:100)
> at org.apache.commons.io.IOUtils.copyLarge(IOUtils.java:2314)
> at org.apache.commons.io.IOUtils.copy(IOUtils.java:2270)
> at org.apache.commons.io.IOUtils.copyLarge(IOUtils.java:2291)
> at org.apache.commons.io.IOUtils.copy(IOUtils.java:2246)
> at org.apache.commons.io.IOUtils.toByteArray(IOUtils.java:765)
> {code}
> We get this warning for all DataNodes with a copy of the block, so the read
> fails.
> I haven't been able to figure out what changed between 3.2.0 and 3.2.1 to
> cause this, but HDFS-13617 and HDFS-14611 seem related, so tagging
> [~vagarychen] in case you have any ideas.