Tasks can set up local resources and change the environment (specifically the classpath in this case). That's missing for AMs, where only LocalResources can be specified. An API to add a file to the classpath (including localization) that works for both the AM and tasks would be useful. There's a jira for this, but it hasn't been worked on yet.
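
Until such an API exists, the ARCHIVE workaround quoted below needs a classpath prefix built from the local resource keys. Here is a rough sketch of how that string could be assembled. It is plain Java with no Tez calls; the class and method names (ClasspathPrefix, build) are made up for illustration, and it assumes each key names an ARCHIVE resource with the jar-plus-lib layout described in the thread. The result would then be set as the value of TEZ_CLUSTER_ADDITIONAL_CLASSPATH_PREFIX.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper, not part of Tez: builds the classpath prefix string
// in the form given in Hitesh's example further down the thread.
final class ClasspathPrefix {

    // Assumption: every key is the name under which an ARCHIVE local resource
    // was registered, so it is unpacked to $PWD/<key> in the container.
    static String build(List<String> resourceKeys) {
        if (resourceKeys.isEmpty()) {
            return "";
        }
        return resourceKeys.stream()
            // top-level entries of the unpacked jar, then its embedded lib directory
            .map(key -> "$PWD/" + key + "/*:$PWD/" + key + "/lib/*")
            // trailing ':' so the prefix can sit in front of the existing classpath
            .collect(Collectors.joining(":", "", ":"));
    }

    public static void main(String[] args) {
        // prints $PWD/foo/*:$PWD/foo/lib/*:$PWD/bar/*:$PWD/bar/lib/*:
        System.out.println(build(Arrays.asList("foo", "bar")));
    }
}
```

This only covers the AM side; as noted above, tasks still get their classpath through the environment set on the DAG.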

On Thu, Jun 18, 2015 at 1:06 PM, Andre Kelpe <[email protected]> wrote:

> Hi,
>
> so I have tried ARCHIVE and added it to
> TEZ_CLUSTER_ADDITIONAL_CLASSPATH_PREFIX as you suggested. That seems to get
> me further. The problem is now, that the same jar should be used in the
> containers for the Dags, but that seems to work in a completely different
> way.
>
> We were using PATTERN for those before + a custom environment:
>
> https://github.com/Cascading/cascading/blob/3.0/cascading-hadoop2-tez/src/main/java/cascading/flow/tez/util/TezUtil.java#L276-L311
>
> This works, however I don't want to add the same jar twice, once as an
> archive and once as a PATTERN.
>
> I am a bit lost why there are two different ways of doing this for the
> various JVMs at various stages.
>
> - André
>
>
> On Thu, Jun 18, 2015 at 9:57 AM, Hitesh Shah <[email protected]> wrote:
>
> > Hi Andre
> >
> > Are you using Local Resource type ARCHIVE? Using FILE may not help in your
> > scenario.
> >
> > If you are using ARCHIVE, you can then use the classpath config (
> > TEZ_CLUSTER_ADDITIONAL_CLASSPATH_PREFIX ) to modify the classpath.
> >
> > For example, assume foo.jar and bar.jar ( in the structure that you
> > called out ) are added to the map of local resources using keys foo and bar:
> > - classpath prefix would be
> > “$PWD/foo/*:$PWD/foo/lib/*:$PWD/bar/*:$PWD/bar/lib/*:”
> >
> > As mentioned on the jira, the launch_container.sh from your cluster would
> > help. Also, if you upload an example jar to the jira, I can help provide a
> > working example.
> >
> > thanks
> > — Hitesh
> >
> >
> > On Jun 18, 2015, at 9:40 AM, Andre Kelpe <[email protected]> wrote:
> >
> > > On Wed, Jun 17, 2015 at 4:58 PM, Bikas Saha <[email protected]> wrote:
> > >
> > >> If I understand this right, there is a jar with user code in it. The jar
> > >> needs to be available during split creation but it is not available.
> > >>
> > >> Is split creation happening on the client or on the AM. If its happening
> > >> on the AM, and the AM is not getting the jars then how are you specifying
> > >> the jars to be sent to the AM. There are different ways to do it.
> > >>
> > >
> > > In our case the AM is doing the split calculation. We are sending the jar
> > > over as LocalResources given in the TezClient#create method
> > >
> > >
> > >> 1) Set tez.aux.uris in tez-site.xml to an HDFS location and copy
> > >> user jars there
> > >>
> > >> 2) Upload the user jar to HDFS and create a YARN local resource for
> > >> it. Then use either of the following to add the local resource to the
> > >> AM/DAG that needs it.
> > >>
> > >>    a. TezClient#addAppMasterLocalFiles(…)
> > >>
> > >>    b. DAG#addTaskLocalFiles(…)
> > >>
> > >> Not sure what is meant by classic Hadoop style jars?
> > >>
> > >
> > > Hadoop style jars are jar files, where you have the user code + all
> > > required libs in a sub-directory within the jar. The layout that RunJar
> > > understands since forever.
> > >
> > > The thing is that we can't find a way to put the jars in the lib folder in
> > > the job-jar on the classpath of the AM.
> > >
> > > - André
> > >
> > >
> > >> Bikas
> > >>
> > >> *From:* Chris K Wensel [mailto:[email protected]]
> > >> *Sent:* Wednesday, June 17, 2015 4:41 PM
> > >> *To:* [email protected]
> > >> *Cc:* [email protected]
> > >> *Subject:* Re: ClassNotFoundException with custom InputFormat.
> > >>
> > >> cross posting down to dev… should continue the discussion there I believe.
> > >>
> > >> as I understand it, all Cascading users familiar with packaging a Hadoop
> > >> job jar with a lib folder, in which the packaged custom InputFormat is
> > >> placed — pulled from maven etc, will have this issue.
> > >>
> > >> this also expands to projects on top of Cascading including Scalding and
> > >> Cascalog.
> > >>
> > >> oddly the org.apache.tez.client.AMConfiguration has a
> > >>
> > >> private Map<String, String> env;
> > >>
> > >> but is unused.
> > >>
> > >> On Jun 17, 2015, at 4:32 PM, Andre Kelpe <[email protected]> wrote:
> > >>
> > >> Hi,
> > >>
> > >> we are currently running into a problem when a user of Cascading uses a
> > >> custom InputFormat with Tez. The ApplicationMaster is running into a
> > >> ClassNotFoundException when calculating the splits, since we are unable to
> > >> control the environment/classpath visibile to the ApplicationMaster. We
> > >> have a work-around, where the users have to supply a fat-jar to make it
> > >> work, but we need to be able to support other ways as well.
> > >>
> > >> When interacting with the DAG, we are able to pass along a custom
> > >> environment/classpath, but that API is missing on the TezClient, causing
> > >> the AppMaster to fail, when the user is using classic hadoop style jars
> > >> (embedded lib directory).
> > >>
> > >> In order to get lingual, our SQL layer on top of Cascading to work
> > >> correctly, we need a way to supply the environment in a more dynamic way
> > >> then one fatjar, so it would be great if the API could be extendend to do
> > >> that.
> > >>
> > >> I have opened https://issues.apache.org/jira/browse/TEZ-2563
> > >>
> > >> Thanks!
> > >>
> > >> - André
> > >>
> > >> --
> > >>
> > >> André Kelpe
> > >> [email protected]
> > >> http://concurrentinc.com
> > >>
> > >> —
> > >>
> > >> Chris K Wensel
> > >>
> > >> [email protected]
> > >>
> > >
> > > --
> > > André Kelpe
> > > [email protected]
> > > http://concurrentinc.com
> >
>
> --
> André Kelpe
> [email protected]
> http://concurrentinc.com
