Kevin,

Confirming: jobTracker, nameNode and oozie.wf.application.path are
picked up from the workflow-configuration.xml file presented to Oozie via
Knox when the job is submitted through Knox.
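
For reference, the rewrite discussed further down this thread could look
roughly like the Python sketch below. It is only an illustration, not Knox
code: the function name rewrite_oozie_configuration and the example.com
endpoints are made up, and it assumes the gateway already knows the
JobTracker and NameNode addresses of the target cluster (for example from
the topology file). It replaces jobTracker and nameNode outright and keeps
only the path portion of oozie.wf.application.path, re-rooted on the
cluster NameNode.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlsplit

# Hypothetical internal endpoints the gateway would look up for the target
# cluster; these values are assumptions invented for this sketch.
CLUSTER_JOB_TRACKER = "internal-jt.example.com:8050"
CLUSTER_NAME_NODE = "hdfs://internal-nn.example.com:8020"


def rewrite_oozie_configuration(xml_text,
                                job_tracker=CLUSTER_JOB_TRACKER,
                                name_node=CLUSTER_NAME_NODE):
    """Return the <configuration> XML with cluster-specific values replaced."""
    root = ET.fromstring(xml_text)
    for prop in root.findall("property"):
        name = (prop.findtext("name") or "").strip()
        value = prop.find("value")
        if value is None:
            continue
        if name == "jobTracker":
            value.text = job_tracker
        elif name == "nameNode":
            value.text = name_node
        elif name == "oozie.wf.application.path":
            # Keep only the path after host:port and re-root it on the
            # NameNode the gateway knows about.
            path = urlsplit((value.text or "").strip()).path
            value.text = name_node + path
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    submitted = """<configuration>
      <property>
        <name>oozie.wf.application.path</name>
        <value>hdfs://dev01.hortonworks.com:8020/user/bob/tmp/test</value>
      </property>
    </configuration>"""
    print(rewrite_oozie_configuration(submitted))
```

Properties that are not one of the three names, such as user.name, pass
through unchanged, so the client could submit any host:port it likes for
the three rewritten values.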

Thanks
Dilli


On Thu, Sep 5, 2013 at 7:57 AM, Dilli Arumugam <[email protected]> wrote:

> Yes, Kevin, I have tried these and they did work, speaking from memory.
> Let me verify one more time and confirm.
> Thanks
> Dilli
>
>
> On Thu, Sep 5, 2013 at 7:42 AM, Kevin Minder <[email protected]> wrote:
>
>> Dilli,
>> I'm not sure that the job-tracker and name-node can be set via job
>> <configuration> properties rather than <workflow-app> elements.  This is
>> something that we should verify, although I'm pretty sure I tried that
>> already.  If these values can be specified as config properties then I
>> agree we should rewrite the <configuration> payload, as this is sent to an
>> Oozie-specific URL.  Have you tried this?
>> Kevin.
>>
>>
>> On 9/5/13 10:36 AM, Dilli Arumugam wrote:
>>
>>> Hi Kevin, Larry,
>>>
>>> I see 3 properties in the workflow-configuration.xml that we submit via
>>> curl that refer to RPC endpoints.
>>>
>>>    <property>
>>>        <name>jobTracker</name>
>>>        <value>dev01.hortonworks.com:8050</value>
>>>        <!-- Example: <value>sandbox:50300</value> -->
>>>        <!-- Default port in hdp 2.0 is u -->
>>>        <!-- Example: <value>sandbox:8032</value> -->
>>>    </property>
>>>    <property>
>>>        <name>nameNode</name>
>>>        <value>hdfs://dev01.hortonworks.com:8020</value>
>>>        <!-- Example: <value>hdfs://sandbox:8020</value> -->
>>>    </property>
>>>    <property>
>>>        <name>oozie.wf.application.path</name>
>>>        <value>hdfs://dev01.hortonworks.com:8020/user/bob/tmp/test</value>
>>>        <!-- Example: <value>hdfs://sandbox:8020/tmp/test</value> -->
>>>    </property>
>>>
>>> My thought on addressing this:
>>>
>>> The workflow-configuration file passes through Knox when an Oozie job is
>>> submitted.
>>>
>>> When an Oozie job is submitted through Knox, we should replace the values
>>> of the jobTracker URL and nameNode URL with the right values for the
>>> cluster.  We know the cluster that the Oozie request is targeted at.
>>>
>>> As for oozie.wf.application.path, we should take the path info after the
>>> port, that is /tmp/test in the example above, append it to the cluster
>>> NameNode URL and pass it down to Oozie.
>>>
>>> This means Knox should store or be able to discover the NameNode URL and
>>> JobTracker URL of the cluster.  I do not see a big problem in storing them
>>> in the cluster topology file on the Knox side.
>>>
>>> We should not attempt to scan files submitted to HDFS via Knox.
>>>
>>> But the Oozie use case is different.
>>> At the time the file passes through Knox, Knox knows that the file is a
>>> config file for Oozie.  This contextual information would allow Knox to do
>>> some processing of the file.
>>>
>>> Thanks
>>> Dilli
>>>
>>>
>>> On Thu, Sep 5, 2013 at 7:04 AM, Kevin Minder <[email protected]> wrote:
>>>
>>>> There are really two issues.
>>>>
>>>> 1) Gateway knowing the RPC endpoints for Hadoop services (masters at
>>>>    least)
>>>> 2) Scanning every single XML document stored via WebHDFS for
>>>>
>>>> For #1 something like ZooKeeper might be an option, but we could
>>>> certainly simulate this for master services at least by including the
>>>> information in the topology file.
>>>>
>>>> For #2 I don't know what the right answer is.  It would be better if
>>>> there were an Oozie API that stored the <workflow-app> definition.  That
>>>> way we could limit the scanning to a single URL instead of everything
>>>> that goes in via WebHDFS.
>>>>
>>>>
>>>> On 9/5/13 8:54 AM, larry mccay wrote:
>>>>
>>>>> I wonder whether we can use the service registry aspect of ZooKeeper to
>>>>> help with this at all.
>>>>>
>>>>>
>>>>> On Thu, Sep 5, 2013 at 8:43 AM, Kevin Minder <[email protected]> wrote:
>>>>>
>>>>>> This has always been the case.  It just surfaced again for me during
>>>>>> testing because Sandbox changed the port on which either the
>>>>>> job-tracker or the name-node listens (can't remember which) and
>>>>>> required that the host name be fully qualified.  No idea why localhost
>>>>>> stopped working like it did before.
>>>>>>
>>>>>>
>>>>>> On 9/5/13 7:59 AM, larry mccay wrote:
>>>>>>
>>>>>>> I can't think of any way to justify doing it in Knox.
>>>>>>>
>>>>>>> The added complexity and just plain strangeness of requiring users to
>>>>>>> know and configure these values make it a nonstarter in my mind.
>>>>>>> I was even thinking that maybe we could query the cluster for it
>>>>>>> somehow, but that would still violate our no-internals-leaking rule.
>>>>>>>
>>>>>>> Oozie has to be changed.
>>>>>>> Did something change recently to introduce this?
>>>>>>>
>>>>>>>
>>>>>>> On Thu, Sep 5, 2013 at 12:21 AM, Kevin Minder <[email protected]> wrote:
>>>>>>>
>>>>>>>> Hi Everyone,
>>>>>>>>
>>>>>>>> There is an unresolved issue with Knox Gateway fronting Oozie.  I just
>>>>>>>> wanted to raise it again for everyone new.  Take a look at the Knox DSL
>>>>>>>> sample for submitting a workflow,
>>>>>>>> gateway-release/home/samples/ExampleSubmitWorkflow.groovy.
>>>>>>>>
>>>>>>>> I included it below for convenience.  Note the jobTracker and nameNode
>>>>>>>> variables defined.  They are used to populate <workflow-app> and
>>>>>>>> <configuration> templates that are eventually written to HDFS as files.
>>>>>>>>
>>>>>>>> Currently we do not rewrite these values, so the client needs to know
>>>>>>>> the internal structure of the cluster to submit an Oozie workflow via
>>>>>>>> Knox Gateway.  This goes against one of our fundamental selling points
>>>>>>>> for Knox.
>>>>>>>>
>>>>>>>> There are two reasons for this:
>>>>>>>>
>>>>>>>> 1. We don't really want to be parsing every XML file that goes into
>>>>>>>>    HDFS to look for and change things.  The gateway would support this
>>>>>>>>    if it weren't for #2.
>>>>>>>> 2. Currently Knox Gateway doesn't know anything about the RPC ports for
>>>>>>>>    Hadoop services, which is what these values specify.
>>>>>>>>
>>>>>>>> The question is whether we should ask Oozie to do something about this
>>>>>>>> or add more complexity to Knox Gateway to solve it.  My personal vote is
>>>>>>>> to have Oozie provide defaults for the host:port of the job-tracker and
>>>>>>>> name-node in its config and use relative values for
>>>>>>>> oozie.wf.application.path, as is done with the <arg/>s.
>>>>>>>>
>>>>>>>> Also note that the jira that tracks this issue is
>>>>>>>> KNOX-50: Ensure that all cluster topology details are rewritten for
>>>>>>>> Oozie REST APIs
>>>>>>>>
>>>>>>>> What does everyone think?
>>>>>>>>
>>>>>>>> Kevin.
>>>>>>>>
>>>>>>>>
>>>>>>>> import com.jayway.jsonpath.JsonPath
>>>>>>>> import org.apache.hadoop.gateway.shell.Hadoop
>>>>>>>> import org.apache.hadoop.gateway.shell.hdfs.Hdfs
>>>>>>>> import org.apache.hadoop.gateway.shell.workflow.Workflow
>>>>>>>>
>>>>>>>> import static java.util.concurrent.TimeUnit.SECONDS
>>>>>>>>
>>>>>>>> gateway = "https://localhost:8443/gateway/sample"
>>>>>>>> jobTracker = "sandbox.hortonworks.com:8050"
>>>>>>>> nameNode = "sandbox.hortonworks.com:8020"
>>>>>>>> username = "hue"
>>>>>>>> password = "hue-password"
>>>>>>>> inputFile = "LICENSE"
>>>>>>>> jarFile = "samples/hadoop-examples.jar"
>>>>>>>>
>>>>>>>> definition = """\
>>>>>>>> <workflow-app xmlns="uri:oozie:workflow:0.2" name="wordcount-workflow">
>>>>>>>>     <start to="root-node"/>
>>>>>>>>     <action name="root-node">
>>>>>>>>         <java>
>>>>>>>>             <job-tracker>$jobTracker</job-tracker>
>>>>>>>>             <name-node>hdfs://$nameNode</name-node>
>>>>>>>>             <main-class>org.apache.hadoop.examples.WordCount</main-class>
>>>>>>>>             <arg>/tmp/test/input</arg>
>>>>>>>>             <arg>/tmp/test/output</arg>
>>>>>>>>         </java>
>>>>>>>>         <ok to="end"/>
>>>>>>>>         <error to="fail"/>
>>>>>>>>     </action>
>>>>>>>>     <kill name="fail">
>>>>>>>>         <message>Java failed, error message[\${wf:errorMessage(wf:lastErrorNode())}]</message>
>>>>>>>>     </kill>
>>>>>>>>     <end name="end"/>
>>>>>>>> </workflow-app>
>>>>>>>> """
>>>>>>>>
>>>>>>>> configuration = """\
>>>>>>>> <configuration>
>>>>>>>>     <property>
>>>>>>>>         <name>user.name</name>
>>>>>>>>         <value>$username</value>
>>>>>>>>     </property>
>>>>>>>>     <property>
>>>>>>>>         <name>oozie.wf.application.path</name>
>>>>>>>>         <value>hdfs://$nameNode/tmp/test</value>
>>>>>>>>     </property>
>>>>>>>> </configuration>
>>>>>>>> """
>>>>>>>>
>>>>>>>> session = Hadoop.login( gateway, username, password )
>>>>>>>>
>>>>>>>> println "Delete /tmp/test " + Hdfs.rm( session ).file( "/tmp/test" ).recursive().now().statusCode
>>>>>>>> println "Mkdir /tmp/test " + Hdfs.mkdir( session ).dir( "/tmp/test" ).now().statusCode
>>>>>>>>
>>>>>>>> putWorkflow = Hdfs.put(session).text( definition ).to( "/tmp/test/workflow.xml" ).later() {
>>>>>>>>     println "Put /tmp/test/workflow.xml " + it.statusCode }
>>>>>>>>
>>>>>>>> putData = Hdfs.put(session).file( inputFile ).to( "/tmp/test/input/FILE" ).later() {
>>>>>>>>     println "Put /tmp/test/input/FILE " + it.statusCode }
>>>>>>>>
>>>>>>>> putJar = Hdfs.put(session).file( jarFile ).to( "/tmp/test/lib/hadoop-examples.jar" ).later() {
>>>>>>>>     println "Put /tmp/test/lib/hadoop-examples.jar " + it.statusCode }
>>>>>>>>
>>>>>>>> session.waitFor( putWorkflow, putData, putJar )
>>>>>>>>
>>>>>>>> jobId = Workflow.submit(session).text( configuration ).now().jobId
>>>>>>>> println "Submitted job " + jobId
>>>>>>>>
>>>>>>>> println "Polling for completion..."
>>>>>>>> status = "UNKNOWN";
>>>>>>>> count = 0;
>>>>>>>> while( status != "SUCCEEDED" && count++ < 60 ) {
>>>>>>>>     sleep( 1000 )
>>>>>>>>     json = Workflow.status(session).jobId( jobId ).now().string
>>>>>>>>     status = JsonPath.read( json, "\$.status" )
>>>>>>>> }
>>>>>>>> println "Job status " + status;
>>>>>>>>
>>>>>>>> println "Delete /tmp/test " + Hdfs.rm( session ).file( "/tmp/test" ).recursive().now().statusCode
>>>>>>>>
>>>>>>>> println "Shutdown " + session.shutdown( 10, SECONDS )
>>>>>>>>
>>>>>>>> --
>>>>>>>> CONFIDENTIALITY NOTICE
>>>>>>>> NOTICE: This message is intended for the use of the individual or entity
>>>>>>>> to which it is addressed and may contain information that is
>>>>>>>> confidential, privileged and exempt from disclosure under applicable
>>>>>>>> law. If the reader of this message is not the intended recipient, you
>>>>>>>> are hereby notified that any printing, copying, dissemination,
>>>>>>>> distribution, disclosure or forwarding of this communication is strictly
>>>>>>>> prohibited. If you have received this communication in error, please
>>>>>>>> contact the sender immediately and delete it from your system. Thank You.
