[
https://issues.apache.org/jira/browse/HDFS-14514?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16851087#comment-16851087
]
Stephen O'Donnell commented on HDFS-14514:
------------------------------------------
When a ByteBuffer is passed in to read data into, it is not valid to modify the
limit of that ByteBuffer directly, as the client may still be relying on it. However,
the buffer limit is also what is used to enforce how much data should be read in.
In the short-circuit read tests that failed with the v002 patch, the client set
the buffer limit to a relatively high value (e.g. 16K), and then each read
from the file system returned less than that (e.g. 4K) into the buffer.
The change in patch v002 set the limit of the buffer to 4K, and that limit then
made it back to the client, which believed the buffer was full. This caused
it to stop reading and hence not read all the expected data.
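As a rough illustration (a self-contained sketch of the effect, not the actual
DFSClient code), a client that sizes a 16K buffer and checks remaining() to decide
whether to keep reading will stop early once the read path clamps its limit:
{code:java}
import java.nio.ByteBuffer;

// Illustrative sketch only: shows how shrinking the limit of the caller's own
// buffer makes the caller believe the buffer is already full.
public class LimitClampSketch {
    public static void main(String[] args) {
        ByteBuffer buf = ByteBuffer.allocate(16 * 1024); // client sizes the buffer for 16K
        // Hypothetical v002-style behaviour: 4K arrives and the limit of the
        // caller's own buffer is clamped down to the amount read.
        buf.position(4 * 1024);
        buf.limit(4 * 1024);
        // A typical read loop keeps going while the buffer has space left, so it
        // now sees a "full" buffer after only 4K:
        System.out.println("remaining = " + buf.remaining()); // prints 0
    }
}
{code}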
What you need to do instead is (a rough sketch follows the list):
1. Duplicate the ByteBuffer (ByteBuffer#duplicate()), which does not copy the data,
just the position and limit pointers.
2. Set the limit of the duplicated buffer as required in the read method.
3. Then perform the read into the duplicate; the number of bytes read will be returned.
4. Finally, update the position of the original (non-temporary) buffer to
original_position + read_bytes.
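Here is a minimal, self-contained Java sketch of those steps. The fillBuffer()
helper is a stand-in for the real short-circuit/remote read, not an actual HDFS
method, and the class name is illustrative only:
{code:java}
import java.nio.ByteBuffer;

public class DuplicateReadSketch {

    // Stand-in for the real underlying read: pretends only 4K of data was
    // available and fills the destination buffer with zeros.
    static int fillBuffer(ByteBuffer dst) {
        int n = Math.min(dst.remaining(), 4 * 1024);
        for (int i = 0; i < n; i++) {
            dst.put((byte) 0);
        }
        return n;
    }

    static int readAtMost(ByteBuffer buf, int maxLength) {
        // 1. Duplicate the buffer: the data is shared, only position/limit are copied.
        ByteBuffer tmp = buf.duplicate();
        // 2. Enforce the read size on the duplicate's limit, not on the caller's buffer.
        tmp.limit(Math.min(tmp.limit(), tmp.position() + maxLength));
        // 3. Perform the read; the number of bytes read is returned.
        int nRead = fillBuffer(tmp);
        // 4. Advance the caller's buffer to original_position + read_bytes,
        //    leaving its limit untouched.
        if (nRead > 0) {
            buf.position(buf.position() + nRead);
        }
        return nRead;
    }

    public static void main(String[] args) {
        ByteBuffer buf = ByteBuffer.allocate(16 * 1024);
        int n = readAtMost(buf, 8 * 1024);
        // The caller still sees its original 16K limit, with 4K consumed so far.
        System.out.println("read=" + n + ", remaining=" + buf.remaining());
    }
}
{code}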
This is largely how the code was changed to operate in the refactor of these
methods (HDFS-8905). I have uploaded a v003 patch with the change described
above, and with it the previously failing tests pass locally for me.
> Actual read size of open file in encryption zone still larger than listing
> size even after enabling HDFS-11402 in Hadoop 2
> --------------------------------------------------------------------------------------------------------------------------
>
> Key: HDFS-14514
> URL: https://issues.apache.org/jira/browse/HDFS-14514
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: encryption, hdfs, snapshots
> Affects Versions: 2.6.5, 2.9.2, 2.8.5, 2.7.7
> Reporter: Siyao Meng
> Assignee: Siyao Meng
> Priority: Major
> Attachments: HDFS-14514.branch-2.001.patch,
> HDFS-14514.branch-2.002.patch, HDFS-14514.branch-2.003.patch
>
>
> In Hadoop 2, when a file in an *encryption zone* is opened for write, a
> snapshot is taken, and the file is then appended, the file size read back from
> the snapshot is larger than the listing size. This happens even when immutable
> snapshots (HDFS-11402) are enabled.
> Note: The HDFS-8905 refactor happened in Hadoop 3.0 and silently fixed the bug
> there (probably incidentally). Hadoop 2.x releases still suffer from this
> issue.
> Thanks [~sodonnell] for locating the root cause in the codebase.
> Repro:
> 1. Set dfs.namenode.snapshot.capture.openfiles to true in hdfs-site.xml, then
> start the HDFS cluster
> 2. Create an empty directory /dataenc, make it an encryption zone, and allow
> snapshots on it
> {code:bash}
> hadoop key create reprokey
> sudo -u hdfs hdfs dfs -mkdir /dataenc
> sudo -u hdfs hdfs crypto -createZone -keyName reprokey -path /dataenc
> sudo -u hdfs hdfs dfsadmin -allowSnapshot /dataenc
> {code}
> 3. Use a client that keeps a file open for write under /dataenc. For example,
> I'm using the Flume HDFS sink to tail a local file.
> 4. Append to the file several times using the client, keeping the file open.
> 5. Create a snapshot
> {code:bash}
> sudo -u hdfs hdfs dfs -createSnapshot /dataenc snap1
> {code}
> 6. Append to the file one or more times, but don't let the file size exceed the
> block size. Wait several seconds for the appends to be flushed to the DN.
> 7. Do a -ls on the file inside the snapshot, then try to read the file using
> -get; you should see that the actual file size read is larger than the listing
> size from -ls (for example, as shown below).
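> For example (the file name below is a placeholder for whatever the client wrote
> under /dataenc):
> {code:bash}
> # Snapshot contents are visible under the .snapshot directory.
> # <file-name> is a placeholder; use whichever file the -ls shows.
> sudo -u hdfs hdfs dfs -ls /dataenc/.snapshot/snap1
> sudo -u hdfs hdfs dfs -get /dataenc/.snapshot/snap1/<file-name> /tmp/snap1-copy
> ls -l /tmp/snap1-copy   # larger than the size reported by -ls
> {code}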
> The patch and an updated unit test will be uploaded later.