[jira] [Commented] (PIG-2318) Push extra jars to distributed cache and use the classloader extension mechanism in PigContext to load them on the backend

Dmitriy V. Ryaboy (Commented) (JIRA) Thu, 13 Oct 2011 12:09:37 -0700

    [ 
https://issues.apache.org/jira/browse/PIG-2318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13126817#comment-13126817
 ]


Dmitriy V. Ryaboy commented on PIG-2318:
----------------------------------------

Good start, Julien. 

public void setExtraJarsInDistributedCache -- seems like we'll need an 
additional addJarToDistributedCache method, so that callers don't have to 
rebuild the whole array themselves every time they add a jar.
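
A rough sketch of what I mean, not the patch's actual code: the holder class ExtraJars, the String[] signature, the getter, and the duplicate check are all assumptions for illustration. Only the two method names come from this thread.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical holder class; in the patch these methods would live wherever
// setExtraJarsInDistributedCache is currently defined.
class ExtraJars {
    private final List<String> extraJars = new ArrayList<>();

    // Bulk setter: replaces the whole list (signature assumed).
    public void setExtraJarsInDistributedCache(String[] jars) {
        extraJars.clear();
        extraJars.addAll(Arrays.asList(jars));
    }

    // Incremental variant: append one jar without rebuilding the array.
    // Skipping duplicates is an assumption, not something the patch specifies.
    public void addJarToDistributedCache(String jar) {
        if (!extraJars.contains(jar)) {
            extraJars.add(jar);
        }
    }

    // Hypothetical getter, added so the result can be inspected.
    public String[] getExtraJarsInDistributedCache() {
        return extraJars.toArray(new String[0]);
    }
}
```

With something like this, a function that needs one more jar calls addJarToDistributedCache once instead of reading the array, copying it, and setting it back.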

log.info("Adding jar to DistributedCache: " + jar); -- this should be at debug 
level.

You special-case the file and hdfs protocols. What happens to other 
protocols? I believe the way we deal with hdfs jars is to copy them over to the 
local fs so they can be dropped onto the local classpath anyway. Presumably 
that works the same way for s3n://, for example. We could at least treat 
those as local jars and pick up the local copy to ship to the cluster?
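
To make the protocol question concrete, here is a minimal sketch of branching on the URI scheme with a fallback for everything that is not file or hdfs. The JarLocation class, the Kind enum, and classify are hypothetical names, and the fallback behaviour is the suggestion above, not what the patch does today.

```java
import java.net.URI;

// Hypothetical helper; none of these names exist in the patch.
class JarLocation {
    enum Kind { LOCAL_FILE, HDFS, OTHER_REMOTE }

    // Classify a jar location by its URI scheme.
    // Note: URI.create throws IllegalArgumentException for strings that are
    // not valid URIs (for example, paths containing spaces).
    static Kind classify(String jar) {
        String scheme = URI.create(jar).getScheme();
        if (scheme == null || scheme.equals("file")) {
            return Kind.LOCAL_FILE;   // plain path or file:// URI
        }
        if (scheme.equals("hdfs")) {
            return Kind.HDFS;
        }
        // s3n:// and any other scheme: assumed to have a local copy already,
        // which would be the one shipped to the cluster.
        return Kind.OTHER_REMOTE;
    }
}
```

The point of the third branch is that an unknown scheme gets a defined path instead of silently falling through.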

Not related to your change, but any ideas why skipJars is a Vector? Seems like 
that's not necessary...

Some of your comments, such as the one about the PigContext constructor, should 
probably be in javadoc format so they make it out to the world. Knowing what 
does and does not get serialized would be handy.

Did you notice an appreciable improvement in startup time when using this on 
our cluster?

                
> Push extra jars to distributed cache and use the classloader extension 
> mechanism in PigContext to load them on the backend
> ---------------------------------------------------------------------------------------------------------------------------
>
>                 Key: PIG-2318
>                 URL: https://issues.apache.org/jira/browse/PIG-2318
>             Project: Pig
>          Issue Type: Improvement
>          Components: impl
>            Reporter: Julien Le Dem
>            Assignee: Julien Le Dem
>         Attachments: PIG-2318.patch
>
>
> This is related to PIG-2010 with a slightly different approach
> https://issues.apache.org/jira/browse/PIG-2010
> Currently Pig bundles up all dependencies in a single jar which is a lot of 
> overhead when there are a lot of dependencies and short lived jobs. This 
> patch instead pushes the dependencies to distributed cache and uses the 
> PigContext classloading mechanism to make the UDFs available.
> Possible improvements: push jars to HDFS/distributed cache only once per 
> script; have a cache on HDFS to avoid repeatedly pushing jars to HDFS.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
