I wonder whether we can use the service registry aspect of zookeeper to
help in this at all.


On Thu, Sep 5, 2013 at 8:43 AM, Kevin Minder
<[email protected]>wrote:

> This has always been the case.  It just surfaced again for me during
> testing because Sandbox changed the port on which either job-tracker or
> name-node are on (can't remember which) and required that the host name be
> fully qualified.  No idea why localhost stopped working like it was before.
>
>
> On 9/5/13 7:59 AM, larry mccay wrote:
>
>> I can't think of any way to justify doing it in Knox.
>> The added complexity and just plain strangeness of requiring users to knox
>> and configure make it nonstarter in my mind.
>> I was even thinking that maybe we could query the cluster for it somehow
>> but that would still violate our no internals leaking rule.
>>
>> Oozie has to be changed.
>> Did something change recently to introduce this?
>>
>>
>> On Thu, Sep 5, 2013 at 12:21 AM, Kevin Minder
>> <[email protected]>**wrote:
>>
>>  Hi Everyone,
>>>
>>> There is an unresolved issue with Knox Gateway fronting Oozie.  I just
>>> wanted to raise it again for everyone new.  Take a look at the Knox DSL
>>> sample for submitting a workflow gateway-release/home/samples/***
>>> *ExampleSubmitWorkflow.groovy.
>>>
>>>   I included it below for convenience.  Note the jobTracker and nameNode
>>> variables defined.  They are used to populate <workflow-app> and
>>> <configuration> templates that are eventually written to HDFS as files.
>>>
>>> Currently we do not rewrite these values so that client needs to know the
>>> internal structure of the cluster to submit an Oozie workflow via Knox
>>> Gateway.  This goes against one of our fundamental selling points for
>>> Knox.
>>>
>>> There are two reasons for this:
>>>
>>> 1. We don't really want to be parsing every XML files that goes into
>>>     HDFS to look for and change things.  The gateway does support this
>>>     if it weren't for #2.
>>> 2. Currently Knox Gateway doesn't know anything about the RPC ports for
>>>     Hadoop services which is what these values specify.
>>>
>>> The question is should we ask Ozzie to do something about this or add
>>> more
>>> complexity to Knox Gateway to solve it.  My personal vote is to have
>>> Oozie
>>> have defaults for the host:port for job-tracker and name-node in their
>>> config and use relative values for oozie.wf.application.path like is done
>>> with the <arg/>s.
>>>
>>> Also note that the jira that tracks this issue is
>>> KNOX-50: Ensure that all cluster topology details are rewritten for Oozie
>>> REST APIs
>>>
>>> What does everyone think?
>>>
>>> Kevin.
>>>
>>>
>>> import com.jayway.jsonpath.JsonPath
>>> import org.apache.hadoop.gateway.****shell.Hadoop
>>> import org.apache.hadoop.gateway.****shell.hdfs.Hdfs
>>> import org.apache.hadoop.gateway.****shell.workflow.Workflow
>>>
>>> import static java.util.concurrent.TimeUnit.****SECONDS
>>>
>>> gateway = 
>>> "https://localhost:8443/****gateway/sample<https://localhost:8443/**gateway/sample>
>>> <https://**localhost:8443/gateway/sample<https://localhost:8443/gateway/sample>
>>> >
>>>
>>> "
>>> jobTracker = "sandbox.hortonworks.com:8050"
>>> nameNode = "sandbox.hortonworks.com:8020"
>>> username = "hue"
>>> password = "hue-password"
>>> inputFile = "LICENSE"
>>> jarFile = "samples/hadoop-examples.jar"
>>>
>>> definition = """\
>>> <workflow-app xmlns="uri:oozie:workflow:0.2" name="wordcount-workflow">
>>>      <start to="root-node"/>
>>>      <action name="root-node">
>>>          <java>
>>>              <job-tracker>$jobTracker</job-****tracker>
>>>              <name-node>hdfs://$nameNode</****name-node>
>>> <main-class>org.apache.hadoop.****examples.WordCount</main-****class>
>>>
>>>              <arg>/tmp/test/input</arg>
>>>              <arg>/tmp/test/output</arg>
>>>          </java>
>>>          <ok to="end"/>
>>>          <error to="fail"/>
>>>      </action>
>>>      <kill name="fail">
>>>          <message>Java failed, error message[\${wf:errorMessage(wf:****
>>>
>>> lastErrorNode())}]</message>
>>>      </kill>
>>>      <end name="end"/>
>>> </workflow-app>
>>> """
>>>
>>> configuration = """\
>>> <configuration>
>>>      <property>
>>>          <name>user.name</name>
>>>          <value>$username</value>
>>>      </property>
>>>      <property>
>>>          <name>oozie.wf.application.****path</name>
>>>          <value>hdfs://$nameNode/tmp/****test</value>
>>>
>>>      </property>
>>> </configuration>
>>> """
>>>
>>> session = Hadoop.login( gateway, username, password )
>>>
>>> println "Delete /tmp/test " + Hdfs.rm( session ).file( "/tmp/test"
>>> ).recursive().now().statusCode
>>> println "Mkdir /tmp/test " + Hdfs.mkdir( session ).dir( "/tmp/test"
>>> ).now().statusCode
>>>
>>> putWorkflow = Hdfs.put(session).text( definition ).to(
>>> "/tmp/test/workflow.xml" ).later() {
>>>    println "Put /tmp/test/workflow.xml " + it.statusCode }
>>>
>>> putData = Hdfs.put(session).file( inputFile ).to( "/tmp/test/input/FILE"
>>> ).later() {
>>>    println "Put /tmp/test/input/FILE " + it.statusCode }
>>>
>>> putJar = Hdfs.put(session).file( jarFile ).to( "/tmp/test/lib/hadoop-***
>>> *examples.jar"
>>> ).later() {
>>>    println "Put /tmp/test/lib/hadoop-examples.****jar " + it.statusCode
>>> }
>>>
>>>
>>> session.waitFor( putWorkflow, putData, putJar )
>>>
>>> jobId = Workflow.submit(session).text( configuration ).now().jobId
>>> println "Submitted job " + jobId
>>>
>>> println "Polling for completion..."
>>> status = "UNKNOWN";
>>> count = 0;
>>> while( status != "SUCCEEDED" && count++ < 60 ) {
>>>    sleep( 1000 )
>>>    json = Workflow.status(session).****jobId( jobId ).now().string
>>>
>>>    status = JsonPath.read( json, "\$.status" )
>>> }
>>> println "Job status " + status;
>>>
>>> println "Delete /tmp/test " + Hdfs.rm( session ).file( "/tmp/test"
>>> ).recursive().now().statusCode
>>>
>>> println "Shutdown " + session.shutdown( 10, SECONDS )
>>>
>>> --
>>> CONFIDENTIALITY NOTICE
>>> NOTICE: This message is intended for the use of the individual or entity
>>> to which it is addressed and may contain information that is
>>> confidential,
>>> privileged and exempt from disclosure under applicable law. If the reader
>>> of this message is not the intended recipient, you are hereby notified
>>> that
>>> any printing, copying, dissemination, distribution, disclosure or
>>> forwarding of this communication is strictly prohibited. If you have
>>> received this communication in error, please contact the sender
>>> immediately
>>> and delete it from your system. Thank You.
>>>
>>>
>
> --
> CONFIDENTIALITY NOTICE
> NOTICE: This message is intended for the use of the individual or entity
> to which it is addressed and may contain information that is confidential,
> privileged and exempt from disclosure under applicable law. If the reader
> of this message is not the intended recipient, you are hereby notified that
> any printing, copying, dissemination, distribution, disclosure or
> forwarding of this communication is strictly prohibited. If you have
> received this communication in error, please contact the sender immediately
> and delete it from your system. Thank You.
>

Reply via email to