[
https://issues.apache.org/jira/browse/TEZ-4178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17106373#comment-17106373
]
Jonathan Turner Eagles edited comment on TEZ-4178 at 5/13/20, 2:52 PM:
-----------------------------------------------------------------------
Overall, I like the idea of this patch. I was looking at the original intent of
TEZ-3894 to understand this change better. Later, it may be better to collect
functions like this is a SpillRecordUtils or similar, but for now this is
sufficient.
- I think it will be better to pass in FsPermissions instead of the conf
- I think it will be better to separate the logic of do the permissions need to
be set which only need to be calculated once from the action of setting of file
permissions which need to be done for each file that will be shuffled.
Couple of things unrelated to this patch which could be fixed in subsequent
jiras.
- The file permissions are unnecessarily in certain scenarios. According to the
original patch, the shuffle handler requires group read permission. However,
the code checks for exact permissions instead of group read permissions.
Perhaps we could be less strict
{code}
FsPermission umask = FsPermission.getUMask(conf);
correctSpillPermisions =
SPILL_FILE_PERM.applyUMask(umask).getGroupAction().implies(FsAction.READ)
{code}
- We are setting the permissions of non-final spill files. Only output files
(and corresponding index files) need to have their permissions set.
- Also noted is how slow setting of file permissions in mac is compared to
linux. Mac is shelling out to bash to set permissions, which is not needed as
Mac is a posix compliant file system and java has some optimization for that
use case. A jira should be filed against Hadoop if that is still the case.
was (Author: jeagles):
Overall, I like the idea of this patch as the logic I was looking at the
original intent of TEZ-3894 to understand this change better. Later, it may be
better to collect functions like this is a SpillRecordUtils or similar, but for
now this is sufficient.
- I think it will be better to pass in FsPermissions instead of the conf
- I think it will be better to separate the logic of do the permissions need to
be set which only need to be calculated once from the action of setting of file
permissions which need to be done for each file that will be shuffled.
Couple of things unrelated to this patch which could be fixed in subsequent
jiras.
- The file permissions are unnecessarily in certain scenarios. According to the
original patch, the shuffle handler requires group read permission. However,
the code checks for exact permissions instead of group read permissions.
Perhaps we could be less strict
{code}
FsPermission umask = FsPermission.getUMask(conf);
correctSpillPermisions =
SPILL_FILE_PERM.applyUMask(umask).getGroupAction().implies(FsAction.READ)
{code}
- We are setting the permissions of non-final spill files. Only output files
(and corresponding index files) need to have their permissions set.
- Also noted is how slow setting of file permissions in mac is compared to
linux. Mac is shelling out to bash to set permissions, which is not needed as
Mac is a posix compliant file system and java has some optimization for that
use case. A jira should be filed against Hadoop if that is still the case.
> Create separate method for repeated logic of ensuring spill file permission
> ---------------------------------------------------------------------------
>
> Key: TEZ-4178
> URL: https://issues.apache.org/jira/browse/TEZ-4178
> Project: Apache Tez
> Issue Type: Improvement
> Reporter: László Bodor
> Assignee: László Bodor
> Priority: Major
> Attachments: TEZ-4178.01.patch, TEZ-4178.02.patch, TEZ-4178.03.patch
>
>
--
This message was sent by Atlassian Jira
(v8.3.4#803005)