There are really two issues.
1) Gateway knowing the RPC endpoints for Hadoop services (masters at least)
2) Scanning every single XML document stored via WebHDFS for these
values so they can be rewritten
For #1 something like zookeeper might be an option but we could
certainly simulate this for master services at least by including the
information in the topology file.
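To make that concrete, a minimal sketch of what carrying the master RPC
endpoints in the topology file might look like. The NAMENODE/JOBTRACKER
roles and the rpc:// URL form below are illustrative assumptions, not an
existing Knox topology schema:

```xml
<topology>
  <!-- Existing REST service entries (WEBHDFS, OOZIE, etc.) stay as-is. -->
  <service>
    <role>WEBHDFS</role>
    <url>http://sandbox.hortonworks.com:50070/webhdfs</url>
  </service>
  <!-- Hypothetical entries declaring the master RPC endpoints so the
       gateway would know what job-tracker/name-node values map to. -->
  <service>
    <role>NAMENODE</role>
    <url>hdfs://sandbox.hortonworks.com:8020</url>
  </service>
  <service>
    <role>JOBTRACKER</role>
    <url>rpc://sandbox.hortonworks.com:8050</url>
  </service>
</topology>
```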
For #2 I don't know what the right answer is. It would be better if
there were an Oozie API that stored the <workflow-app> definition. That
way we could limit the scanning to a single URL instead of everything
that goes in via WebHDFS.
On 9/5/13 8:54 AM, larry mccay wrote:
I wonder whether we can use the service registry aspect of zookeeper to
help in this at all.
On Thu, Sep 5, 2013 at 8:43 AM, Kevin Minder
<[email protected]>wrote:
This has always been the case. It just surfaced again for me during
testing because Sandbox changed the port that either the job-tracker or
the name-node is on (can't remember which) and required that the host
name be fully qualified. No idea why localhost stopped working like it
did before.
On 9/5/13 7:59 AM, larry mccay wrote:
I can't think of any way to justify doing it in Knox.
The added complexity, and just the plain strangeness of requiring users
to know and configure cluster internals, make it a nonstarter in my mind.
I was even thinking that maybe we could query the cluster for it somehow
but that would still violate our no internals leaking rule.
Oozie has to be changed.
Did something change recently to introduce this?
On Thu, Sep 5, 2013 at 12:21 AM, Kevin Minder
<[email protected]>**wrote:
Hi Everyone,
There is an unresolved issue with Knox Gateway fronting Oozie. I just
wanted to raise it again for everyone new. Take a look at the Knox DSL
sample for submitting a workflow:
gateway-release/home/samples/ExampleSubmitWorkflow.groovy.
I included it below for convenience. Note the jobTracker and nameNode
variables defined. They are used to populate <workflow-app> and
<configuration> templates that are eventually written to HDFS as files.
Currently we do not rewrite these values, so the client needs to know the
internal structure of the cluster to submit an Oozie workflow via Knox
Gateway. This goes against one of our fundamental selling points for
Knox.
There are two reasons for this:
1. We don't really want to be parsing every XML file that goes into
HDFS to look for and change things. The gateway could support this
were it not for #2.
2. Currently Knox Gateway doesn't know anything about the RPC ports for
Hadoop services which is what these values specify.
The question is: should we ask Oozie to do something about this, or add
more complexity to Knox Gateway to solve it? My personal vote is to have
Oozie provide defaults for the host:port of job-tracker and name-node in
its config, and to use relative values for oozie.wf.application.path like
is done with the <arg/>s.
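To sketch what that Oozie-side change might look like: cluster-wide
defaults in oozie-site.xml paired with a relative application path in the
submitted job configuration. The property names below are made up for
illustration; no such defaults exist in Oozie's config today:

```xml
<!-- oozie-site.xml: hypothetical cluster-wide defaults -->
<configuration>
  <property>
    <name>oozie.default.job-tracker</name>
    <value>sandbox.hortonworks.com:8050</value>
  </property>
  <property>
    <name>oozie.default.name-node</name>
    <value>hdfs://sandbox.hortonworks.com:8020</value>
  </property>
</configuration>
```

With defaults like these, the job configuration could set
oozie.wf.application.path to a relative value such as /tmp/test, and no
internal host:port would ever need to cross the gateway.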
Also note that the jira that tracks this issue is
KNOX-50: Ensure that all cluster topology details are rewritten for Oozie
REST APIs
What does everyone think?
Kevin.
import com.jayway.jsonpath.JsonPath
import org.apache.hadoop.gateway.shell.Hadoop
import org.apache.hadoop.gateway.shell.hdfs.Hdfs
import org.apache.hadoop.gateway.shell.workflow.Workflow
import static java.util.concurrent.TimeUnit.SECONDS

gateway = "https://localhost:8443/gateway/sample"
jobTracker = "sandbox.hortonworks.com:8050"
nameNode = "sandbox.hortonworks.com:8020"
username = "hue"
password = "hue-password"
inputFile = "LICENSE"
jarFile = "samples/hadoop-examples.jar"

definition = """\
<workflow-app xmlns="uri:oozie:workflow:0.2" name="wordcount-workflow">
  <start to="root-node"/>
  <action name="root-node">
    <java>
      <job-tracker>$jobTracker</job-tracker>
      <name-node>hdfs://$nameNode</name-node>
      <main-class>org.apache.hadoop.examples.WordCount</main-class>
      <arg>/tmp/test/input</arg>
      <arg>/tmp/test/output</arg>
    </java>
    <ok to="end"/>
    <error to="fail"/>
  </action>
  <kill name="fail">
    <message>Java failed, error message[\${wf:errorMessage(wf:lastErrorNode())}]</message>
  </kill>
  <end name="end"/>
</workflow-app>
"""

configuration = """\
<configuration>
  <property>
    <name>user.name</name>
    <value>$username</value>
  </property>
  <property>
    <name>oozie.wf.application.path</name>
    <value>hdfs://$nameNode/tmp/test</value>
  </property>
</configuration>
"""

session = Hadoop.login( gateway, username, password )

println "Delete /tmp/test " + Hdfs.rm( session ).file( "/tmp/test" ).recursive().now().statusCode
println "Mkdir /tmp/test " + Hdfs.mkdir( session ).dir( "/tmp/test" ).now().statusCode

putWorkflow = Hdfs.put( session ).text( definition ).to( "/tmp/test/workflow.xml" ).later() {
  println "Put /tmp/test/workflow.xml " + it.statusCode }
putData = Hdfs.put( session ).file( inputFile ).to( "/tmp/test/input/FILE" ).later() {
  println "Put /tmp/test/input/FILE " + it.statusCode }
putJar = Hdfs.put( session ).file( jarFile ).to( "/tmp/test/lib/hadoop-examples.jar" ).later() {
  println "Put /tmp/test/lib/hadoop-examples.jar " + it.statusCode }
session.waitFor( putWorkflow, putData, putJar )

jobId = Workflow.submit( session ).text( configuration ).now().jobId
println "Submitted job " + jobId

println "Polling for completion..."
status = "UNKNOWN"
count = 0
while( status != "SUCCEEDED" && count++ < 60 ) {
  sleep( 1000 )
  json = Workflow.status( session ).jobId( jobId ).now().string
  status = JsonPath.read( json, "\$.status" )
}
println "Job status " + status

println "Delete /tmp/test " + Hdfs.rm( session ).file( "/tmp/test" ).recursive().now().statusCode
println "Shutdown " + session.shutdown( 10, SECONDS )
--
CONFIDENTIALITY NOTICE
NOTICE: This message is intended for the use of the individual or entity
to which it is addressed and may contain information that is confidential,
privileged and exempt from disclosure under applicable law. If the reader
of this message is not the intended recipient, you are hereby notified that
any printing, copying, dissemination, distribution, disclosure or
forwarding of this communication is strictly prohibited. If you have
received this communication in error, please contact the sender immediately
and delete it from your system. Thank You.