[
https://issues.apache.org/jira/browse/HDFS-14514?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16850047#comment-16850047
]
Stephen O'Donnell edited comment on HDFS-14514 at 5/28/19 7:09 PM:
-------------------------------------------------------------------
I believe it only happens in snapshots, as that is the only place where the
file length recorded in the namenode matters. In the snapshot, the length
should be frozen, but outside of a snapshot, the reader will just read all
available data based on the visible length returned from the datanode.
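For example (hypothetical paths; assuming dfs.namenode.snapshot.capture.openfiles
is enabled), a listing should show the snapshot copy frozen while the live file
keeps growing:
{code:bash}
hdfs dfs -ls /dataenc/openfile.log                  # live file: grows with appends
hdfs dfs -ls /dataenc/.snapshot/snap1/openfile.log  # snapshot copy: frozen length
{code}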
I have not yet traced why this only impacts CryptoInputStream - the bug appears
to be in ByteBufferStrategy, which is a static class within DFSInputStream, so
it is probably related to the ReaderStrategy used by the calling class. I need
to look a bit further to see why non-encrypted data is not affected.
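One way to confirm that the non-encrypted path behaves correctly is to repeat
the same comparison against a plain (non-encrypted) snapshottable directory;
the /plain paths below are purely hypothetical:
{code:bash}
# /plain is a snapshottable directory outside any encryption zone,
# holding a file that was open for write when snap1 was taken
hdfs dfs -ls /plain/.snapshot/snap1/openfile.log
hdfs dfs -get /plain/.snapshot/snap1/openfile.log /tmp/plaincopy
ls -l /tmp/plaincopy   # expected: matches the -ls size above
{code}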
I agree that these test failures seem related. I will have a further look at
them too.
> Actual read size of open file in encryption zone still larger than listing
> size even after enabling HDFS-11402 in Hadoop 2
> --------------------------------------------------------------------------------------------------------------------------
>
> Key: HDFS-14514
> URL: https://issues.apache.org/jira/browse/HDFS-14514
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: encryption, hdfs, snapshots
> Affects Versions: 2.6.5, 2.9.2, 2.8.5, 2.7.7
> Reporter: Siyao Meng
> Assignee: Siyao Meng
> Priority: Major
> Attachments: HDFS-14514.branch-2.001.patch
>
>
> In Hadoop 2, when a file in an *encryption zone* is opened for write, a
> snapshot is taken, and the file is then appended to, the file size read back
> from the snapshot is larger than the listing size. This happens even when
> immutable snapshots (HDFS-11402) are enabled.
> Note: The HDFS-8905 refactor in Hadoop 3.0 fixed this bug silently (probably
> incidentally), so Hadoop 2.x releases still suffer from it.
> Thanks [~sodonnell] for locating the root cause in the codebase.
> Repro:
> 1. Set dfs.namenode.snapshot.capture.openfiles to true in hdfs-site.xml,
> then start the HDFS cluster
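> The setting can be double-checked once the cluster is up (a quick sanity
> check; hdfs getconf prints the effective client-side configuration value):
> {code:bash}
> hdfs getconf -confKey dfs.namenode.snapshot.capture.openfiles
> {code}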
> 2. Create an empty directory /dataenc, make it an encryption zone, and allow
> snapshots on it
> {code:bash}
> hadoop key create reprokey
> sudo -u hdfs hdfs dfs -mkdir /dataenc
> sudo -u hdfs hdfs crypto -createZone -keyName reprokey -path /dataenc
> sudo -u hdfs hdfs dfsadmin -allowSnapshot /dataenc
> {code}
> 3. Use a client that keeps a file open for write under /dataenc. For example,
> I'm using the Flume HDFS sink to tail a local file.
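> If a Flume agent is not handy, any client that holds the file open should
> work; one rough stand-in (with /dataenc/openfile.log as a hypothetical
> target) is to append from stdin, which keeps the stream open until stdin is
> closed:
> {code:bash}
> sudo -u hdfs hdfs dfs -appendToFile - /dataenc/openfile.log
> {code}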
> 4. Append to the file several times using the client, keeping the file open.
> 5. Create a snapshot
> {code:bash}
> sudo -u hdfs hdfs dfs -createSnapshot /dataenc snap1
> {code}
> 6. Append to the file one or more times, but don't let the file size exceed
> the block size. Wait several seconds for the appends to be flushed to the DN.
> 7. Do an -ls on the file inside the snapshot, then read it back with -get;
> the actual size read is larger than the listing size reported by -ls. For
> example:
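> (Using the same hypothetical file name as above.)
> {code:bash}
> sudo -u hdfs hdfs dfs -ls /dataenc/.snapshot/snap1/openfile.log
> sudo -u hdfs hdfs dfs -get /dataenc/.snapshot/snap1/openfile.log /tmp/snapcopy
> ls -l /tmp/snapcopy   # on affected versions: larger than the -ls size above
> {code}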
> The patch and an updated unit test will be uploaded later.
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)