[
https://issues.apache.org/jira/browse/HADOOP-5303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12675861#action_12675861
]
Alejandro Abdelnur commented on HADOOP-5303:
--------------------------------------------
Cascading and HWS are different beasts.
Cascading is a different way of doing what Pig does. Programming in Cascading
means programming against a higher-level abstraction that resolves into a series
of Map/Reduce jobs.
HWS is a (server) workflow system specialized in running Hadoop/Pig jobs wired
together via a PDL descriptor.
Following are a few quick highlights of how Cascading and HWS differ:
h4. Cascading uses a topological search model to resolve the execution path.
HWS uses a 'DAG of processes' workflow model that allows parallelism and
alternate execution paths (decisions) to be expressed explicitly.
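For illustration, the sketch below shows how explicit parallelism (fork/join) and an alternate path (decision) could look in such a DAG; the element and attribute names are hypothetical and are not taken from the attached specification.
{code:xml}
<!-- Hypothetical workflow sketch; element and attribute names are illustrative
     only, they are not taken from the attached HWS specification. -->
<workflow-app name="sample-wf">
  <start to="parallel-load"/>
  <!-- fork: the two Hadoop jobs below run in parallel -->
  <fork name="parallel-load">
    <path start="load-logs"/>
    <path start="load-dimensions"/>
  </fork>
  <action name="load-logs" type="hadoop"> ... <ok to="sync"/> </action>
  <action name="load-dimensions" type="hadoop"> ... <ok to="sync"/> </action>
  <join name="sync" to="check-input"/>
  <!-- decision: alternate execution paths -->
  <decision name="check-input">
    <case to="aggregate">${inputAvailable}</case>
    <default to="end"/>
  </decision>
  <action name="aggregate" type="pig"> ... <ok to="end"/> </action>
  <end name="end"/>
</workflow-app>
{code}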
h4. Cascading runs as a client from the command line.
HWS is a server system (like the Hadoop JobTracker) to which you submit workflow
jobs and later check their status.
In HWS no resources are held on the client once the workflow job has been
submitted; the workflow job runs in the server.
This allows several thousand workflow jobs to run concurrently against a single
HWS instance, which supports system failover.
In HWS, monitoring and status tracking of jobs is done via CLIs and a web
console that gathers data from HWS (as you do with Hadoop).
h4. Cascading's primary programming model is similar to Pig's, but with a Java API.
In Cascading you can still use your existing Hadoop jobs in a flow, as a way to
integrate with existing Map/Reduce apps, but the real benefit of Cascading comes
from using its API programming model.
HWS's primary programming model is Hadoop/Pig jobs connected via a PDL-like XML
workflow definition file.
h4. In Cascading you need to write Java code to wire your Hadoop jobs.
In HWS you don't wire your Hadoop/Pig jobs in Java; you wire them in a workflow
XML file, in a more declarative way.
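As a rough sketch of that declarative style (again with hypothetical element names, not the ones defined in the attached spec), wiring a Map/Reduce job into a Pig job could look like:
{code:xml}
<!-- Hypothetical wiring of a Hadoop job followed by a Pig job; names are
     illustrative only. -->
<action name="clean-data" type="hadoop">
  <job-jar>clean-data.jar</job-jar>
  <ok to="summarize"/>
  <error to="fail"/>
</action>
<action name="summarize" type="pig">
  <script>summarize.pig</script>
  <ok to="end"/>
  <error to="fail"/>
</action>
{code}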
> Hadoop Workflow System (HWS)
> ----------------------------
>
> Key: HADOOP-5303
> URL: https://issues.apache.org/jira/browse/HADOOP-5303
> Project: Hadoop Core
> Issue Type: New Feature
> Reporter: Alejandro Abdelnur
> Assignee: Alejandro Abdelnur
> Attachments: hws-preso-v1_0_2009FEB22.pdf, hws-v1_0_2009FEB22.pdf
>
>
> This is a proposal for a system specialized in running Hadoop/Pig jobs in a
> control dependency DAG (Directed Acyclic Graph), a Hadoop workflow application.
> Attached there is a complete specification and a high level overview
> presentation.
> ----
> *Highlights*
> A Workflow application is a DAG that coordinates the following types of
> actions: Hadoop, Pig, Ssh, Http, Email and sub-workflows.
> Flow control operations within the workflow applications can be done using
> decision, fork and join nodes. Cycles in workflows are not supported.
> Actions and decisions can be parameterized with job properties, action
> output (e.g. Hadoop counters, Ssh key/value pair output) and file
> information (file exists, file size, etc). Formal parameters are expressed in
> the workflow definition as {{${VAR}}} variables.
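> For example (with purely illustrative element names; only the {{${VAR}}}
> convention is from this proposal), an action could be parameterized with
> variables and a decision could test the action's output:
> {code:xml}
> <action name="import" type="hadoop">
>   <input>${inputDir}</input>
>   <output>${outputDir}/${runDate}</output>
>   <ok to="check-volume"/>
>   <error to="fail"/>
> </action>
> <decision name="check-volume">
>   <!-- predicate over the action's output, e.g. a Hadoop counter -->
>   <case to="aggregate">${import.counters.recordsWritten gt 0}</case>
>   <default to="end"/>
> </decision>
> {code}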
> A Workflow application is a ZIP file that contains the workflow definition
> (an XML file) and all the files necessary to run the actions: JAR files for
> Map/Reduce jobs, shells for streaming Map/Reduce jobs, native libraries, Pig
> scripts, and other resource files.
> Before running a workflow job, the corresponding workflow application must be
> deployed in HWS.
> Deploying workflow applications and running workflow jobs can be done via
> command line tools, a WS API and a Java API.
> Monitoring the system and workflow jobs can be done via a web console,
> command line tools, a WS API and a Java API.
> When submitting a workflow job, a set of properties resolving all the formal
> parameters in the workflow definition must be provided. This set of
> properties is a Hadoop configuration.
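> For instance, the properties resolving the {{${inputDir}}} and {{${runDate}}}
> variables used in the sketch above could be supplied in the standard Hadoop
> configuration format (the property names are illustrative):
> {code:xml}
> <?xml version="1.0"?>
> <configuration>
>   <property>
>     <name>inputDir</name>
>     <value>hdfs://namenode:9000/data/raw/logs</value>
>   </property>
>   <property>
>     <name>runDate</name>
>     <value>2009-02-22</value>
>   </property>
> </configuration>
> {code}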
> Possible states for a workflow job are: {{CREATED}}, {{RUNNING}},
> {{SUSPENDED}}, {{SUCCEEDED}}, {{KILLED}} and {{FAILED}}.
> In the case of an action failure in a workflow job, depending on the type of
> failure, HWS will attempt automatic retries, request a manual retry, or fail
> the workflow job.
> HWS can make HTTP callback notifications on action start/end/failure events
> and workflow end/failure events.
> In the case of a workflow job failure, the workflow job can be resubmitted,
> skipping the previously completed actions. Before doing a resubmission, the
> workflow application can be updated with a patch to fix a problem in the
> workflow application code.
> ----