[
https://issues.apache.org/jira/browse/PIG-2672?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13779236#comment-13779236
]
Jason Lowe commented on PIG-2672:
---------------------------------
bq. I'm deliberately avoiding permission checks in this code path. In terms
of security, I feel that this is no worse than what we have right now.
A shared cache where anyone can write is indeed worse. Today jars are
uploaded to HDFS into a private staging directory where no other normal user
can interfere. If the staging directory became publicly writeable, it would be
trivial to compromise every user running the same pig jar, using a scheme like
the one [~knoguchi] pointed out. I don't see how one can accomplish the
same level of havoc today. Even if there's a window in the local filesystem
where one can hijack a jar, exploiting it requires access to the same node
where the user is launching the job. In the publicly-writeable shared cache
scheme, an attacker only needs access to HDFS from any node, and clients on all
nodes using the shared cache can be compromised.
Besides malicious users, the shared cache can also be accidentally made
ineffective by clients. For example, a user with a restrictive umask (e.g.
077) uploads a jar to the shared cache, and all the directories and files are
created such that others can't read them. Because the permissions are
incorrect, no other user can share the file, and any other user's file that
happens to have the same initial digit(s) in its hash can't be uploaded to the
shared cache. And then there's the client that deletes files in use by other
clients, breaking their jobs.
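
To make the umask scenario concrete, here is a minimal sketch in Python. It assumes a hypothetical cache layout in which entries are grouped into directories named after the leading hex digit(s) of the jar's hash; the function names and the SHA-1 choice are illustrative only and are not taken from the patch.

```python
import hashlib


def effective_mode(requested: int, umask: int) -> int:
    """Permission bits that remain after the umask is applied."""
    return requested & ~umask & 0o777


def others_can_use_dir(mode: int) -> bool:
    """Other users need both read and execute bits to list and traverse a directory."""
    return (mode & 0o005) == 0o005


def bucket_for(jar_bytes: bytes, prefix_len: int = 1) -> str:
    """Hypothetical layout: bucket directory named after the hash prefix."""
    return hashlib.sha1(jar_bytes).hexdigest()[:prefix_len]


# A bucket directory created by a client running with umask 077.
restrictive = effective_mode(0o777, 0o077)
print(oct(restrictive))                 # 0o700
print(others_can_use_dir(restrictive))  # False

# The same directory created with the more common umask 022.
permissive = effective_mode(0o777, 0o022)
print(oct(permissive))                  # 0o755
print(others_can_use_dir(permissive))   # True
```

With a one-digit prefix there are only 16 possible buckets, so a single bucket directory created as 0700 would block every other user whose jar hashes into that bucket, not just users of the same jar.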
In short, shared caches that are publicly writeable are going to be
problematic, especially in secure setups. As such, I think there should at
least be some documentation describing the risks of enabling it and how it
could be used securely in a read-only manner, i.e. the shared cache is
publicly readable but only writeable by admins who manually maintain its
entries.
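
As a rough illustration of the read-only arrangement, the following Python sketch checks whether a set of permission bits matches "readable by everyone, writeable only by the owner". The function name is hypothetical, and it assumes the admin account owns the cache entries; it only inspects mode bits and does not talk to HDFS.

```python
import stat


def is_read_only_shared(mode: int, is_dir: bool = False) -> bool:
    """True if everyone can read the entry but only its owner can write it."""
    world_readable = bool(mode & stat.S_IROTH)
    # Directories also need the execute bit so other users can traverse them.
    world_traversable = bool(mode & stat.S_IXOTH) if is_dir else True
    writable_by_non_owner = bool(mode & (stat.S_IWGRP | stat.S_IWOTH))
    return world_readable and world_traversable and not writable_by_non_owner


print(is_read_only_shared(0o755, is_dir=True))  # True: admin-maintained directory
print(is_read_only_shared(0o644))               # True: admin-maintained jar
print(is_read_only_shared(0o777, is_dir=True))  # False: publicly writeable
print(is_read_only_shared(0o700, is_dir=True))  # False: not shareable
```

A check along these lines could let a client decline to use a shared cache whose permissions are either too open or too restrictive, falling back to the private staging directory instead.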
> Optimize the use of DistributedCache
> ------------------------------------
>
> Key: PIG-2672
> URL: https://issues.apache.org/jira/browse/PIG-2672
> Project: Pig
> Issue Type: Improvement
> Reporter: Rohini Palaniswamy
> Assignee: Aniket Mokashi
> Fix For: 0.12.0
>
> Attachments: PIG-2672.patch
>
>
> Pig currently copies jar files to a temporary location in HDFS and then adds
> them to DistributedCache for each job launched. This is inefficient in terms
> of:
> * Space - The jars are distributed to task trackers for every job, taking
> up a lot of local temporary space on the task trackers.
> * Performance - The jar distribution impacts the job launch time.