On Wed, Jun 17, 2015 at 4:58 PM, Bikas Saha <[email protected]> wrote:
> If I understand this right, there is a jar with user code in it. The jar
> needs to be available during split creation, but it is not available.
>
> Is split creation happening on the client or on the AM? If it's happening
> on the AM, and the AM is not getting the jars, then how are you specifying
> the jars to be sent to the AM? There are different ways to do it.

In our case the AM is doing the split calculation. We are sending the jar
over as LocalResources passed to the TezClient#create method.

> 1) Set tez.aux.uris in tez-site.xml to an HDFS location and copy the
> user jars there.
>
> 2) Upload the user jar to HDFS and create a YARN local resource for
> it. Then use either of the following to add the local resource to the
> AM/DAG that needs it:
>
>    a. TezClient#addAppMasterLocalFiles(…)
>
>    b. DAG#addTaskLocalFiles(…)
>
> Not sure what is meant by classic Hadoop style jars?

Hadoop style jars are jar files where you have the user code plus all
required libs in a sub-directory (lib/) within the jar, the layout that
RunJar has understood since forever. The problem is that we can't find a
way to put the jars in the job jar's lib folder on the classpath of the AM.

- André

> Bikas
>
> *From:* Chris K Wensel [mailto:[email protected]]
> *Sent:* Wednesday, June 17, 2015 4:41 PM
> *To:* [email protected]
> *Cc:* [email protected]
> *Subject:* Re: ClassNotFoundException with custom InputFormat.
>
> Cross-posting down to dev… we should continue the discussion there, I believe.
>
> As I understand it, all Cascading users familiar with packaging a Hadoop
> job jar with a lib folder, in which the packaged custom InputFormat is
> placed (pulled from Maven, etc.), will have this issue.
>
> This also extends to projects on top of Cascading, including Scalding and
> Cascalog.
>
> Oddly, org.apache.tez.client.AMConfiguration has a
>
>     private Map<String, String> env;
>
> field, but it is unused.
>
> On Jun 17, 2015, at 4:32 PM, Andre Kelpe <[email protected]> wrote:
>
> > Hi,
> >
> > we are currently running into a problem when a user of Cascading uses a
> > custom InputFormat with Tez. The ApplicationMaster is running into a
> > ClassNotFoundException when calculating the splits, since we are unable to
> > control the environment/classpath visible to the ApplicationMaster. We
> > have a work-around where the users have to supply a fat jar to make it
> > work, but we need to be able to support other ways as well.
> >
> > When interacting with the DAG, we are able to pass along a custom
> > environment/classpath, but that API is missing on the TezClient, causing
> > the AppMaster to fail when the user is using classic Hadoop style jars
> > (embedded lib directory).
> >
> > In order to get Lingual, our SQL layer on top of Cascading, to work
> > correctly, we need a way to supply the environment in a more dynamic way
> > than one fat jar, so it would be great if the API could be extended to do
> > that.
> >
> > I have opened https://issues.apache.org/jira/browse/TEZ-2563
> >
> > Thanks!
> >
> > - André
> >
> > --
> > André Kelpe
> > [email protected]
> > http://concurrentinc.com
>
> --
> Chris K Wensel
> [email protected]

--
André Kelpe
[email protected]
http://concurrentinc.com
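For reference, the RunJar-style layout discussed in this thread (user classes at the jar root, dependency jars under lib/) could in principle be bridged to Bikas's first suggestion, pointing tez.aux.uris at an HDFS directory that holds the user jars, by staging the embedded jars on the client before submission. Below is a minimal JDK-only sketch of that staging step, assuming only jars sitting directly under lib/ matter. The class LibJarExtractor and its method are hypothetical and are not part of Cascading, Tez, or Hadoop. The staged directory would still have to be uploaded to HDFS (for example with `hdfs dfs -put`) and referenced from tez.aux.uris. This is a possible workaround, not the TezClient API change requested in TEZ-2563.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.List;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;

/**
 * Hypothetical helper (not a Tez, Hadoop, or Cascading API): copies the jars
 * embedded directly under lib/ in a Hadoop-style job jar into a flat local
 * staging directory, e.g. as a step before uploading them for tez.aux.uris.
 */
public final class LibJarExtractor {

    private static final String LIB_PREFIX = "lib/";

    private LibJarExtractor() {}

    /** Returns the staged files in the order their entries appear in the job jar. */
    public static List<Path> extractLibJars(Path jobJar, Path stagingDir) throws IOException {
        Files.createDirectories(stagingDir);
        List<Path> staged = new ArrayList<>();
        try (JarFile jar = new JarFile(jobJar.toFile())) {
            Enumeration<JarEntry> entries = jar.entries();
            while (entries.hasMoreElements()) {
                JarEntry entry = entries.nextElement();
                String name = entry.getName();
                // Keep only files like lib/foo.jar; skip directories, user classes,
                // and nested paths such as lib/sub/bar.jar.
                boolean topLevelLibJar = !entry.isDirectory()
                        && name.startsWith(LIB_PREFIX)
                        && name.endsWith(".jar")
                        && name.indexOf('/', LIB_PREFIX.length()) < 0;
                if (!topLevelLibJar) {
                    continue;
                }
                Path target = stagingDir.resolve(name.substring(LIB_PREFIX.length()));
                try (InputStream in = jar.getInputStream(entry)) {
                    // Flattening means a later entry with the same file name wins.
                    Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
                }
                staged.add(target);
            }
        }
        return staged;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: LibJarExtractor <job.jar> <staging-dir>");
            System.exit(1);
        }
        for (Path p : extractLibJars(Paths.get(args[0]), Paths.get(args[1]))) {
            System.out.println(p);
        }
    }
}
```

Usage would be something like `java LibJarExtractor job.jar /tmp/staged-libs`, which prints the path of each staged jar. Flattening into one directory keeps a single location to list in tez.aux.uris, at the cost of overwriting any two embedded jars that happen to share a file name.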
