+1. Sending the jar as an archive will cause it to be unjarred, and then you can specify the classpath modifications by referring to the unjarred files.
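Once a jar is localized as an ARCHIVE and unpacked under its resource key, the classpath can point at the unpacked contents. Below is a minimal sketch in plain Java, with no Tez dependency, that builds a prefix string in the exact format of Hitesh's example in the quoted message. The class `ArchiveClasspathPrefix` and its `build` method are hypothetical. The sketch assumes each jar was registered as a local resource under the given key (`foo`, `bar`) and unpacked into a directory of that name. The resulting string would then presumably be set as TEZ_CLUSTER_ADDITIONAL_CLASSPATH_PREFIX in the configuration handed to TezClient. That wiring is not shown here, so check it against the Tez version in use.

```java
import java.util.List;

/**
 * Hypothetical helper (not part of Tez): builds a classpath prefix in the
 * format of Hitesh's example for jars localized as ARCHIVE resources.
 */
public final class ArchiveClasspathPrefix {

    private ArchiveClasspathPrefix() {}

    /**
     * For each local resource key k (assumed to be the directory the archive
     * is unpacked into), appends "$PWD/k/*:" and "$PWD/k/lib/*:", in order.
     */
    public static String build(List<String> resourceKeys) {
        StringBuilder prefix = new StringBuilder();
        for (String key : resourceKeys) {
            // "$PWD" stays literal here; it is meant to be expanded by the shell
            // when the container's launch script runs, not by this code.
            prefix.append("$PWD/").append(key).append("/*:");
            prefix.append("$PWD/").append(key).append("/lib/*:");
        }
        return prefix.toString();
    }

    public static void main(String[] args) {
        // Keys "foo" and "bar" mirror foo.jar and bar.jar from the quoted example.
        System.out.println(build(List.of("foo", "bar")));
        // prints $PWD/foo/*:$PWD/foo/lib/*:$PWD/bar/*:$PWD/bar/lib/*:
    }
}
```

One caveat is worth checking on a real cluster. The JVM expands a trailing `/*` only to the `.jar`/`.JAR` files in that directory. Classes sitting at the root of the unpacked archive would therefore presumably also need the bare directory (for example `$PWD/foo`) on the classpath.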
At this point, perhaps in Tez, we should consider creating two dirs, tez and user, and localizing files into them appropriately. This would keep the jars separate and help debug cases where a jar is duplicated in both, because the copies won't overwrite each other.

-----Original Message-----
From: Hitesh Shah
Sent: Thursday, June 18, 2015 9:57 AM
Subject: Re: ClassNotFoundException with custom InputFormat.

Hi Andre,

Are you using Local Resource type ARCHIVE? Using FILE may not help in your scenario. If you are using ARCHIVE, you can then use the classpath config (TEZ_CLUSTER_ADDITIONAL_CLASSPATH_PREFIX) to modify the classpath.

For example, assume foo.jar and bar.jar (in the structure that you called out) are added to the map of local resources using the keys foo and bar:
- the classpath prefix would be "$PWD/foo/*:$PWD/foo/lib/*:$PWD/bar/*:$PWD/bar/lib/*:"

As mentioned on the JIRA, the launch_container.sh from your cluster would help. Also, if you upload an example jar to the JIRA, I can help provide a working example.

thanks
- Hitesh

On Jun 18, 2015, at 9:40 AM, Andre Kelpe wrote:

> On Wed, Jun 17, 2015 at 4:58 PM, Bikas Saha wrote:
>
>> If I understand this right, there is a jar with user code in it. The jar needs to be available during split creation, but it is not available.
>>
>> Is split creation happening on the client or on the AM? If it's happening on the AM, and the AM is not getting the jars, then how are you specifying the jars to be sent to the AM? There are different ways to do it.
>
> In our case the AM is doing the split calculation. We are sending the jar over as LocalResources given in the TezClient#create method.
>
>> 1) Set tez.aux.uris in tez-site.xml to an HDFS location and copy the user jars there.
>>
>> 2) Upload the user jar to HDFS and create a YARN local resource for it.
>> Then use either of the following to add the local resource to the AM/DAG that needs it:
>>
>> a. TezClient#addAppMasterLocalFiles(.)
>>
>> b. DAG#addTaskLocalFiles(.)
>>
>> Not sure what is meant by classic Hadoop style jars?
>
> Hadoop style jars are jar files where you have the user code plus all required libs in a sub-directory within the jar. It is the layout RunJar has always understood.
>
> The thing is that we can't find a way to put the jars in the lib folder of the job jar on the classpath of the AM.
>
> - André
>
>> Bikas
>>
>> *From:* Chris K Wensel
>> *Sent:* Wednesday, June 17, 2015 4:41 PM
>> *Subject:* Re: ClassNotFoundException with custom InputFormat.
>>
>> Cross-posting down to dev. We should continue the discussion there, I believe.
>>
>> As I understand it, all Cascading users familiar with packaging a Hadoop job jar with a lib folder, in which the packaged custom InputFormat is placed (pulled from Maven, etc.), will have this issue.
>>
>> This also extends to projects on top of Cascading, including Scalding and Cascalog.
>>
>> Oddly, org.apache.tez.client.AMConfiguration has a
>>
>> private Map<String, String> env;
>>
>> but it is unused.
>>
>> On Jun 17, 2015, at 4:32 PM, Andre Kelpe wrote:
>>
>> Hi,
>>
>> we are currently running into a problem when a user of Cascading uses a custom InputFormat with Tez. The ApplicationMaster runs into a ClassNotFoundException when calculating the splits, since we are unable to control the environment/classpath visible to the ApplicationMaster. We have a workaround where users have to supply a fat jar to make it work, but we need to be able to support other ways as well.
>>
>> When interacting with the DAG, we are able to pass along a custom environment/classpath, but that API is missing on the TezClient. This causes the AppMaster to fail when the user is using classic Hadoop style jars (embedded lib directory).
>>
>> In order to get Lingual, our SQL layer on top of Cascading, to work correctly, we need a way to supply the environment more dynamically than through one fat jar, so it would be great if the API could be extended to do that.
>>
>> I have opened https://issues.apache.org/jira/browse/TEZ-2563
>>
>> Thanks!
>>
>> - André
>>
>> --
>> André Kelpe
>> http://concurrentinc.com
>>
>> -
>> Chris K Wensel
>
> --
> André Kelpe
> http://concurrentinc.com
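The job-jar layout Andre describes in the quoted thread puts user code at the root and dependency jars under `lib/`. A small, self-contained Java sketch using only `java.util.jar` can illustrate it. The sketch writes a toy job jar with hypothetical entry names (`com/example/MyInputFormat.class`, `lib/dependency.jar`) and lists the nested `lib/` jars, which are the entries that, per the thread, need to reach the AM classpath. The standard JDK class loaders do not load classes from a jar nested inside another jar. The job jar would therefore have to be unpacked first, for instance through the ARCHIVE local resource type Hitesh mentions.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.List;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;
import java.util.jar.JarOutputStream;

public final class JobJarLayout {

    private JobJarLayout() {}

    /** Returns the names of dependency jars nested under lib/ in the given job jar. */
    public static List<String> nestedLibJars(Path jobJar) throws IOException {
        List<String> libs = new ArrayList<>();
        try (JarFile jar = new JarFile(jobJar.toFile())) {
            Enumeration<JarEntry> entries = jar.entries();
            while (entries.hasMoreElements()) {
                String name = entries.nextElement().getName();
                // Only jars directly or indirectly under lib/ count as nested dependencies.
                if (name.startsWith("lib/") && name.endsWith(".jar")) {
                    libs.add(name);
                }
            }
        }
        return libs;
    }

    /** Writes a toy job jar: one user "class" at the root and one dependency under lib/. */
    public static Path writeToyJobJar(Path dir) throws IOException {
        Path jobJar = dir.resolve("job.jar");
        try (JarOutputStream out = new JarOutputStream(Files.newOutputStream(jobJar))) {
            // Hypothetical entry names; the bytes are placeholders, not a real class or jar.
            out.putNextEntry(new JarEntry("com/example/MyInputFormat.class"));
            out.write(new byte[] {0});
            out.closeEntry();
            out.putNextEntry(new JarEntry("lib/dependency.jar"));
            out.write(new byte[] {0});
            out.closeEntry();
        }
        return jobJar;
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempDirectory("jobjar");
        System.out.println(nestedLibJars(writeToyJobJar(tmp)));  // prints [lib/dependency.jar]
    }
}
```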
