[jira] [Comment Edited] (TEZ-4178) Create separate method for repeated logic of ensuring spill file permission

Jonathan Turner Eagles (Jira) Wed, 13 May 2020 07:53:14 -0700


    [ 
https://issues.apache.org/jira/browse/TEZ-4178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17106373#comment-17106373
 ]


Jonathan Turner Eagles edited comment on TEZ-4178 at 5/13/20, 2:52 PM:
-----------------------------------------------------------------------

Overall, I like the idea of this patch. I was looking at the original intent of 
TEZ-3894 to understand this change better. Later, it may be better to collect 
functions like this is a SpillRecordUtils or similar, but for now this is 
sufficient.
- I think it will be better to pass in FsPermissions instead of the conf
- I think it will be better to separate the logic of do the permissions need to 
be set which only need to be calculated once from the action of setting of file 
permissions which need to be done for each file that will be shuffled.

Couple of things unrelated to this patch which could be fixed in subsequent 
jiras.
- The file permissions are unnecessarily in certain scenarios. According to the 
original patch, the shuffle handler requires group read permission. However, 
the code checks for exact permissions instead of group read permissions. 
Perhaps we could be less strict

{code}
FsPermission umask = FsPermission.getUMask(conf);
correctSpillPermisions = 
SPILL_FILE_PERM.applyUMask(umask).getGroupAction().implies(FsAction.READ)
{code}

- We are setting the permissions of non-final spill files. Only output files 
(and corresponding index files) need to have their permissions set.

- Also noted is how slow setting of file permissions in mac is compared to 
linux. Mac is shelling out to bash to set permissions, which is not needed as 
Mac is a posix compliant file system and java has some optimization for that 
use case. A jira should be filed against Hadoop if that is still the case.


was (Author: jeagles):
Overall, I like the idea of this patch as the logic I was looking at the 
original intent of TEZ-3894 to understand this change better. Later, it may be 
better to collect functions like this is a SpillRecordUtils or similar, but for 
now this is sufficient.
- I think it will be better to pass in FsPermissions instead of the conf
- I think it will be better to separate the logic of do the permissions need to 
be set which only need to be calculated once from the action of setting of file 
permissions which need to be done for each file that will be shuffled.

Couple of things unrelated to this patch which could be fixed in subsequent 
jiras.
- The file permissions are unnecessarily in certain scenarios. According to the 
original patch, the shuffle handler requires group read permission. However, 
the code checks for exact permissions instead of group read permissions. 
Perhaps we could be less strict

{code}
FsPermission umask = FsPermission.getUMask(conf);
correctSpillPermisions = 
SPILL_FILE_PERM.applyUMask(umask).getGroupAction().implies(FsAction.READ)
{code}

- We are setting the permissions of non-final spill files. Only output files 
(and corresponding index files) need to have their permissions set.

- Also noted is how slow setting of file permissions in mac is compared to 
linux. Mac is shelling out to bash to set permissions, which is not needed as 
Mac is a posix compliant file system and java has some optimization for that 
use case. A jira should be filed against Hadoop if that is still the case.

> Create separate method for repeated logic of ensuring spill file permission
> ---------------------------------------------------------------------------
>
>                 Key: TEZ-4178
>                 URL: https://issues.apache.org/jira/browse/TEZ-4178
>             Project: Apache Tez
>          Issue Type: Improvement
>            Reporter: László Bodor
>            Assignee: László Bodor
>            Priority: Major
>         Attachments: TEZ-4178.01.patch, TEZ-4178.02.patch, TEZ-4178.03.patch
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

[jira] [Comment Edited] (TEZ-4178) Create separate method for repeated logic of ensuring spill file permission

Reply via email to