[ https://issues.apache.org/jira/browse/PIG-2672?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13263974#comment-13263974 ]

Dmitriy V. Ryaboy commented on PIG-2672:
----------------------------------------

This would be a great addition.
A couple of proposed refinements to the design:

1) The same behavior should apply on the local client when users register 
jars from HDFS: there is no need to copy a jar if one with the same name and 
checksum is already cached locally.
2) The cache directory should be .pig/jarcache/ or similar.
3) We should document this behavior very explicitly and provide management 
tools for the cache, so that people aren't surprised as it keeps growing.
4) It could be helpful to have a configurable cluster-level cache, instead of 
or in addition to the user-level cache, for cases when many users use the 
same jar. There may be security concerns with that, though.
                
> Optimize the use of DistributedCache
> ------------------------------------
>
>                 Key: PIG-2672
>                 URL: https://issues.apache.org/jira/browse/PIG-2672
>             Project: Pig
>          Issue Type: Improvement
>            Reporter: Rohini Palaniswamy
>
> Pig currently copies jar files to a temporary location in HDFS and then adds 
> them to the DistributedCache for each job launched. This is inefficient in 
> terms of:
>    * Space - The jars are distributed to the task trackers for every job, 
> taking up a lot of local temporary space on the task trackers.
>    * Performance - The jar distribution adds to the job launch time.

--
This message is automatically generated by JIRA.

        
