Siyao Meng created HDFS-14514:
---------------------------------

             Summary: Actual read size of open file in encryption zone still 
larger than listing size even after enabling HDFS-11402 in Hadoop 2
                 Key: HDFS-14514
                 URL: https://issues.apache.org/jira/browse/HDFS-14514
             Project: Hadoop HDFS
          Issue Type: Bug
          Components: hdfs, snapshots
    Affects Versions: 2.7.7, 2.8.5, 2.9.2, 2.6.5
            Reporter: Siyao Meng
            Assignee: Siyao Meng


In Hadoop 2, when a file in an *encryption zone* is opened for write, a 
snapshot is taken, and the file is then appended to, the size of the file read 
from the snapshot is larger than its listing size. This happens even when 
immutable snapshots (HDFS-11402) are enabled.

Note: The refactor HDFS-8905, which is in Hadoop 3.0 and later, silently fixed 
the bug (probably incidentally). Hadoop 2.x still suffers from this issue.

Thanks [~sodonnell] for locating the root cause in the codebase.

Repro:
1. Set dfs.namenode.snapshot.capture.openfiles to true in hdfs-site.xml, then 
start the HDFS cluster.
2. Create an empty directory /dataenc, create an encryption zone on it, and 
allow snapshots on it:
{code:bash}
hadoop key create reprokey
sudo -u hdfs hdfs dfs -mkdir /dataenc
sudo -u hdfs hdfs crypto -createZone -keyName reprokey -path /dataenc
sudo -u hdfs hdfs dfsadmin -allowSnapshot /dataenc
{code}
3. Use a client that keeps a file open for write under /dataenc. For example, 
I'm using the Flume HDFS sink to tail a local file.
4. Append to the file several times using the client, keeping the file open.
5. Create a snapshot
{code:bash}
sudo -u hdfs hdfs dfs -createSnapshot /dataenc snap1
{code}
6. Append to the file one or more times, but don't let the file size exceed 
the block size. Wait several seconds for the append to be flushed to the DN.
7. Run -ls on the file inside the snapshot, then read the file using -get. 
The number of bytes actually read is larger than the size reported by -ls.
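
The comparison in step 7 can also be scripted. The following is only a rough sketch, not part of the planned patch: the helper name compare_sizes and the script's argument handling are my own assumptions, and the snapshot path you pass depends on the file name your client creates. It uses -stat %b for the listed size and counts the bytes returned by -cat for the actual read size. The reading user needs access to the encryption key.

```shell
#!/usr/bin/env bash
# Sketch: compare the size listed for a snapshot file with the number of
# bytes a client can actually read from it. Names here are illustrative.

# compare_sizes LISTED ACTUAL: print both sizes; return 1 if they differ.
compare_sizes() {
  local listed="$1" actual="$2"
  echo "listed=${listed} read=${actual}"
  if [ "${actual}" -ne "${listed}" ]; then
    echo "MISMATCH: read size differs from listing size by $((actual - listed)) bytes"
    return 1
  fi
  echo "OK: sizes match"
}

# Only talk to HDFS when a snapshot file path is passed as the first argument.
if [ "$#" -ge 1 ]; then
  snap_file="$1"
  # Size in bytes as recorded in the listing for the snapshot copy.
  listed=$(hdfs dfs -stat %b "${snap_file}")
  # Bytes actually returned when reading the snapshot copy.
  actual=$(hdfs dfs -cat "${snap_file}" | wc -c | tr -d '[:space:]')
  compare_sizes "${listed}" "${actual}"
fi
```

Run it with the path of the open file inside the snapshot, i.e. a path under /dataenc/.snapshot/snap1/. On an affected 2.x cluster I would expect it to report a mismatch after step 6.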

The patch and an updated unit test will be uploaded later.


