[
https://issues.apache.org/jira/browse/HDFS-11197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15729387#comment-15729387
]
Wei-Chiu Chuang edited comment on HDFS-11197 at 12/7/16 6:01 PM:
-----------------------------------------------------------------
That fails because of this error:
{noformat}
2016-12-07 00:51:50,706 [IPC Server handler 6 on 59757] INFO ipc.Server (Server.java:logException(2697)) - IPC Server handler 6 on 59757, call Call#819 Retry#0 org.apache.hadoop.hdfs.protocol.ClientProtocol.create from 127.0.0.1:42394
org.apache.hadoop.hdfs.server.namenode.RetryStartFileException: Preconditions for creating a file failed because of a transient error, retry create later.
	at org.apache.hadoop.hdfs.server.namenode.FSDirEncryptionZoneOp.getFileEncryptionInfo(FSDirEncryptionZoneOp.java:330)
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFileInt(FSNamesystem.java:2249)
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFile(FSNamesystem.java:2175)
	at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.create(NameNodeRpcServer.java:742)
	at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.create(ClientNamenodeProtocolServerSideTranslatorPB.java:420)
	at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
	at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:522)
	at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:991)
	at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:867)
	at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:813)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:422)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1857)
	at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2653)
{noformat}
I have seen this exact test error in the past, so it is most likely not caused by
your patch. [~xiaochen] previously filed HDFS-11093 for it.
> Listing encryption zones fails when deleting a EZ that is on a snapshotted
> directory
> ------------------------------------------------------------------------------------
>
> Key: HDFS-11197
> URL: https://issues.apache.org/jira/browse/HDFS-11197
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: hdfs
> Affects Versions: 2.6.0
> Reporter: Wellington Chevreuil
> Assignee: Wellington Chevreuil
> Priority: Minor
> Attachments: HDFS-11197-1.patch, HDFS-11197-2.patch,
> HDFS-11197-3.patch, HDFS-11197-4.patch, HDFS-11197-5.patch,
> HDFS-11197-6.patch, HDFS-11197-7.patch
>
>
> If an EZ directory is under a snapshottable directory, and a snapshot has been
> taken, then permanently deleting this EZ causes the *hdfs crypto
> listZones* command to fail without showing any of the still available zones.
> This happens only after the EZ is removed from the Trash folder. For example,
> consider that the */test-snap* folder is snapshottable and a snapshot
> of it already exists:
> {noformat}
> $ hdfs crypto -listZones
> /user/systest my-key
> /test-snap/EZ-1 my-key
> $ hdfs dfs -rmr /test-snap/EZ-1
> INFO fs.TrashPolicyDefault: Moved: 'hdfs://ns1/test-snap/EZ-1' to trash at: hdfs://ns1/user/hdfs/.Trash/Current/test-snap/EZ-1
> $ hdfs crypto -listZones
> /user/systest my-key
> /user/hdfs/.Trash/Current/test-snap/EZ-1 my-key
> $ hdfs dfs -rmr /user/hdfs/.Trash/Current/test-snap/EZ-1
> Deleted /user/hdfs/.Trash/Current/test-snap/EZ-1
> $ hdfs crypto -listZones
> RemoteException: Absolute path required
> {noformat}
> Once this error happens, *hdfs crypto -listZones* works again only if we
> remove the snapshot:
> {noformat}
> $ hdfs dfs -deleteSnapshot /test-snap snap1
> $ hdfs crypto -listZones
> /user/systest my-key
> {noformat}
> If we instead delete the EZ using the *skipTrash* option, *hdfs crypto
> -listZones* does not break:
> {noformat}
> $ hdfs crypto -listZones
> /user/systest my-key
> /test-snap/EZ-2 my-key
> $ hdfs dfs -rmr -skipTrash /test-snap/EZ-2
> Deleted /test-snap/EZ-2
> $ hdfs crypto -listZones
> /user/systest my-key
> {noformat}
> The different behaviour seems to occur because, when the EZ is removed from the
> Trash folder, its related INode is left with no parent INode. This causes
> *EncryptionZoneManager.listEncryptionZones* to throw the error seen above when
> trying to resolve the inodes in the given path.
> I am proposing a patch that fixes this issue by simply performing an additional
> check in *EncryptionZoneManager.listEncryptionZones* for the case where an inode
> has no parent, so that it is skipped in the listing without trying to
> resolve it. Feedback on the proposal is appreciated.
>
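The null-parent check proposed above can be sketched with a minimal, hypothetical model. The `INode` and `listZones` shapes below are illustrative stand-ins, not the actual Hadoop classes: the point is only that a zone inode whose parent link is gone (deleted from Trash but still pinned by a snapshot) is skipped instead of being resolved to an absolute path, which is where the "Absolute path required" failure would otherwise arise.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical minimal model of the proposed guard; not the real
// org.apache.hadoop.hdfs.server.namenode classes.
public class ListZonesSketch {
    static class INode {
        final String name;
        final INode parent;
        INode(String name, INode parent) { this.name = name; this.parent = parent; }
        // Rebuild an absolute path by walking up to the root ("" name);
        // only valid while every ancestor link is intact.
        String fullPath() {
            if (parent == null || parent.name.isEmpty()) return "/" + name;
            return parent.fullPath() + "/" + name;
        }
    }

    // Sketch of listing encryption zones with the extra null-parent check:
    // an orphaned (deleted) zone inode is skipped rather than resolved.
    static List<String> listZones(List<INode> zoneInodes) {
        List<String> zones = new ArrayList<>();
        for (INode inode : zoneInodes) {
            if (inode.parent == null && !inode.name.isEmpty()) {
                continue; // zone was deleted; skip instead of failing
            }
            zones.add(inode.fullPath());
        }
        return zones;
    }

    public static void main(String[] args) {
        INode root = new INode("", null);
        INode user = new INode("user", root);
        INode live = new INode("systest", user);
        INode orphan = new INode("EZ-1", null); // deleted EZ, parent link gone
        System.out.println(listZones(Arrays.asList(live, orphan)));
    }
}
```

With this guard, the listing above prints only the surviving zone (`/user/systest`) instead of throwing when it reaches the orphaned inode.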
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)