[ https://issues.apache.org/jira/browse/HDFS-6843?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14125380#comment-14125380 ]
Steve Loughran commented on HDFS-6843:
--------------------------------------

Regarding the specification, I think you've dropped back to English too early, which leaves ambiguities around: exactly the kind of thing we are trying to avoid.

Here's my proposal:
# we declare a new {{inEncryptionZone(FS, path)}} predicate saying "the FS implementation declares this to be encrypted"
# we add an invariant saying all files and dirs under a directory that is in an encryption zone are also in an encryption zone
# the value {{FileStatus.isEncrypted}} for a path is the value of {{inEncryptionZone(FS, path)}}.

In the specification, the extra bits (not placed into the source itself), are:

{code}
### `inEncryptionZone(FS, path)`

A predicate which returns true iff the encryption mechanism of the FS
implementation has marked the path as encrypted, and for any file for which
`inEncryptionZone(FS, path)` holds the contents of `data(FS, path)` is
encrypted such that only holders of the keys may read the unencrypted data.

### `isEncrypted(data)`

A predicate which returns true if the data itself is encrypted.

The nature of the encryption, and the mechanism for creating an encryption
zone are implementation details not covered in this specification.
No guarantees are therefore made as to the quality of the encryption.

### Invariant 1: all files and directories under a directory in an encryption zone are also in an encryption zone

    forall d in directories(FS): inEncryptionZone(FS, d) implies
      forall c in children(FS, d) where (isFile(FS, c) or isDir(FS, c)) :
        inEncryptionZone(FS, c)

### Invariant 2: for all files in an encrypted zone, the data is (somehow) encrypted

    forall f in files(FS) where inEncryptionZone(FS, f):
      isEncrypted(data(FS, f))

And for the FileStatus definition

    FileStatus.isEncrypted() = inEncryptionZone(FS, path)
{code}

This leaves open the question of whether a file can be marked as encrypted in a directory which is *not* in an encryption zone.
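To make the two invariants concrete, here is a minimal Python sketch that checks them against a toy in-memory model. It is an illustration only: {{ModelFS}}, its fields, and every function name below are hypothetical stand-ins for the specification's {{FS}}, {{directories(FS)}}, {{files(FS)}} and {{children(FS, d)}}, and none of them are Hadoop APIs. Whether data is "really" encrypted is modelled as plain set membership, since the specification leaves the mechanism open.

```python
from dataclasses import dataclass, field
from pathlib import PurePosixPath


@dataclass
class ModelFS:
    """Toy stand-in for the specification's FS (hypothetical, not a Hadoop class)."""
    directories: set = field(default_factory=set)  # directories(FS)
    files: set = field(default_factory=set)        # files(FS)
    zone: set = field(default_factory=set)         # paths the FS declares encrypted
    ciphertext: set = field(default_factory=set)   # files whose data is encrypted


def in_encryption_zone(fs, path):
    """inEncryptionZone(FS, path): the FS has marked the path as encrypted."""
    return path in fs.zone


def is_encrypted_data(fs, path):
    """Models isEncrypted(data(FS, path)) as set membership."""
    return path in fs.ciphertext


def children(fs, d):
    """children(FS, d): files and directories whose immediate parent is d."""
    return {p for p in fs.directories | fs.files
            if p != d and str(PurePosixPath(p).parent) == d}


def invariant_1(fs):
    """Everything directly under a directory in a zone is also in a zone."""
    return all(in_encryption_zone(fs, c)
               for d in fs.directories if in_encryption_zone(fs, d)
               for c in children(fs, d))


def invariant_2(fs):
    """Every file in a zone has encrypted data."""
    return all(is_encrypted_data(fs, f)
               for f in fs.files if in_encryption_zone(fs, f))


def file_status_is_encrypted(fs, path):
    """FileStatus.isEncrypted() = inEncryptionZone(FS, path)."""
    return in_encryption_zone(fs, path)


def example_fs():
    """A small model: /zone is an encryption zone, /plain is not."""
    return ModelFS(
        directories={"/", "/zone", "/plain"},
        files={"/zone/a", "/plain/b"},
        zone={"/zone", "/zone/a"},
        ciphertext={"/zone/a"},
    )
```

In this sketch, adding {{/plain/b}} to both {{zone}} and {{ciphertext}} leaves both invariants satisfied, even though {{/plain}} is not in an encryption zone. That is the case left open above: a single file flagged as encrypted under an unencrypted directory.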
If encrypted files must only exist inside encryption zones, that requirement can be added as a new invariant: for all files, {{inEncryptionZone(FS, path)}} implies that the predicate also holds for their parent (which must be a directory, obviously). I don't know if that invariant should be defined, as on NTFS (and perhaps other native filesystems) you can tag an individual file as encrypted, something which the specification above does implicitly permit.

It also assumes that the metadata of a file is not encrypted. Is this the case for HDFS? That is, can you list the dir and read any attached data on a file other than the contents of {{data(FS, path)}}? Invariant 2 covers this by declaring that if a file is in the zone, it is its data which is encrypted.

> Create FileStatus isEncrypted() method
> --------------------------------------
>
>                 Key: HDFS-6843
>                 URL: https://issues.apache.org/jira/browse/HDFS-6843
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: namenode, security
>    Affects Versions: 3.0.0
>            Reporter: Charles Lamb
>            Assignee: Charles Lamb
>         Attachments: HDFS-6843.001.patch, HDFS-6843.002.patch,
> HDFS-6843.003.patch, HDFS-6843.004.patch, HDFS-6843.005.patch,
> HDFS-6843.005.patch
>
>
> FileStatus should have a 'boolean isEncrypted()' method. (it was in the
> context of discussing with AndreW about FileStatus being a Writable).
> Having this method would allow MR JobSubmitter do the following:
> -----
> BOOLEAN intermediateEncryption = false
> IF jobconf.contains("mr.intermidate.encryption") THEN
>   intermediateEncryption = jobConf.getBoolean("mr.intermidate.encryption")
> ELSE
>   IF (I/O)Format INSTANCEOF File(I/O)Format THEN
>     intermediateEncryption = ANY File(I/O)Format HAS a Path with status
> isEncrypted()==TRUE
>   FI
>   jobConf.setBoolean("mr.intermidate.encryption", intermediateEncryption)
> FI

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)