What do you think about an "implicit" workflow for DataCapture in the roadmap?

Let me take an example:

1/ I submit a cluster:
bin/falcon entity -submit -type cluster -file local.xml
where local.xml contains:

<cluster colo="local" description="Local cluster" name="local" xmlns="uri:falcon:cluster:0.1">
    <interfaces>
<interface type="readonly" endpoint="hftp://localhost:50010"; version="1.1.2"/> <interface type="write" endpoint="hdfs://localhost:8020" version="1.1.2"/> <interface type="execute" endpoint="localhost:8021" version="1.1.2"/> <interface type="workflow" endpoint="http://localhost:11000/oozie/"; version="4.0.0"/> <interface type="messaging" endpoint="tcp://localhost:61616" version="5.7.0"/>
    </interfaces>
    <locations>
        <location name="staging" path="/falcon/staging"/>
        <location name="temp" path="/falcon/tmp"/>
        <location name="working" path="/falcon/working"/>
    </locations>
    <properties>
    </properties>
</cluster>

2/ I submit a feed:
bin/falcon entity -submit -type feed -file feed.xml
where feed.xml contains:

<feed description="" name="output" xmlns="uri:falcon:feed:0.1">
    <groups>output</groups>

    <frequency>minutes(1)</frequency>
    <timezone>UTC</timezone>

    <clusters>
        <cluster name="local">
            <validity start="2013-01-01T00:00Z" end="2030-01-01T00:00Z"/>
            <retention limit="minutes(2)" action="delete"/>
            <capture interval="minutes(5)"/>
        </cluster>
    </clusters>

    <locations>
        <location type="data" path="/data/output"/>
    </locations>

    <ACL owner="jbonofre" group="group" permission="0x644"/>
    <schema location="/falcon/schema/none" provider="none"/>
</feed>

Note the <capture/> element. With this element, the feed is not really generated at the frequency interval, but checked at the capture interval. By "checked", I mean that it compares the current state with the previous one (using the retention) and sends a message to the JMS broker. The message contains the "delta" (so basically the change) between the previous check and the latest one, if there is any. It means that we have to create a coordinator job in Oozie that runs at the capture interval and executes a "special" job to manage the delta/diff.
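
To make it more concrete, here is a rough sketch of what that "special" delta/diff job launched by the capture coordinator could look like. This is not an existing Falcon class: the snapshot file under the staging directory, the topic name and the message payload are assumptions for illustration only, and the broker URL simply reuses the messaging endpoint of the cluster definition above.

// Hypothetical sketch of the delta/diff job run at the capture interval.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.activemq.ActiveMQConnectionFactory;

import javax.jms.Connection;
import javax.jms.MessageProducer;
import javax.jms.Session;
import javax.jms.TextMessage;
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.util.HashSet;
import java.util.Set;
import java.util.TreeSet;

public class CaptureDeltaJob {

    public static void main(String[] args) throws Exception {
        // Feed data location and (hypothetical) snapshot file kept between capture runs.
        Path feedPath = new Path("/data/output");
        Path snapshotPath = new Path("/falcon/staging/output.capture.snapshot");

        FileSystem fs = FileSystem.get(new Configuration());

        // Current state: files currently present under the feed location.
        Set<String> current = new TreeSet<String>();
        if (fs.exists(feedPath)) {
            for (FileStatus status : fs.listStatus(feedPath)) {
                current.add(status.getPath().toUri().getPath());
            }
        }

        // Previous state: the listing written by the last capture run, if any.
        Set<String> previous = new HashSet<String>();
        if (fs.exists(snapshotPath)) {
            BufferedReader reader = new BufferedReader(new InputStreamReader(fs.open(snapshotPath)));
            String line;
            while ((line = reader.readLine()) != null) {
                previous.add(line);
            }
            reader.close();
        }

        // Delta = what appeared since the previous check.
        Set<String> delta = new TreeSet<String>(current);
        delta.removeAll(previous);

        // Persist the current listing so the next run can diff against it.
        OutputStream out = fs.create(snapshotPath, true);
        for (String path : current) {
            out.write((path + "\n").getBytes("UTF-8"));
        }
        out.close();
        fs.close();

        // Publish the delta on the cluster's messaging endpoint, only if something changed.
        if (!delta.isEmpty()) {
            ActiveMQConnectionFactory factory = new ActiveMQConnectionFactory("tcp://localhost:61616");
            Connection connection = factory.createConnection();
            connection.start();
            Session session = connection.createSession(false, Session.AUTO_ACKNOWLEDGE);
            MessageProducer producer = session.createProducer(session.createTopic("FALCON.output.CAPTURE"));
            TextMessage message = session.createTextMessage(delta.toString());
            producer.send(message);
            connection.close();
        }
    }
}

The listing-based diff could of course be replaced by HDFS snapshots or checksums; the point is only that the coordinator runs it at the capture interval and publishes the delta, if any, to JMS.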

WDYT?

Regards
JB

On 01/09/2014 01:10 PM, Sharad Agarwal wrote:
Looks Great.
I would also like to propose adding 4) Support stream processing over Tez.
Once we have the streaming abstractions, it should be easy to build for Tez
too. This would allow running faster pipelines in Hadoop itself.


On Thu, Jan 9, 2014 at 4:01 PM, Srikanth Sundarrajan <[email protected]> wrote:

Hi Jean,

If there is enough interest in this area and it is in line with the
larger objectives of the project, we should certainly be able to add it to
the Roadmap. I am keen to learn more about this and your thinking on the
topic.

Regards
Srikanth Sundarrajan

Date: Thu, 9 Jan 2014 11:20:54 +0100
From: [email protected]
To: [email protected]
Subject: Re: [DISCUSS] Falcon Roadmap

Hi Srikanth,

The roadmap looks good to me.

Do we have any plans regarding data anonymity?

On my side, I'm working on an example of CDC with Falcon and Camel, and
on support for Falcon commands directly in Karaf (an update to ActiveMQ
5.9.0 is in progress too).
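
For illustration only, a minimal Camel route consuming such capture messages could look like the sketch below; the broker URL and topic name are assumptions (mirroring the earlier sketch), not part of any existing Falcon contract.

import org.apache.activemq.camel.component.ActiveMQComponent;
import org.apache.camel.builder.RouteBuilder;
import org.apache.camel.impl.DefaultCamelContext;

public class FalconCaptureRoute {

    public static void main(String[] args) throws Exception {
        DefaultCamelContext context = new DefaultCamelContext();
        // Broker URL taken from the cluster's messaging interface (assumption).
        context.addComponent("activemq", ActiveMQComponent.activeMQComponent("tcp://localhost:61616"));
        context.addRoutes(new RouteBuilder() {
            public void configure() {
                // Log every delta message published by the hypothetical capture job.
                from("activemq:topic:FALCON.output.CAPTURE")
                    .log("Feed delta received: ${body}");
            }
        });
        context.start();
        // Keep the route running for a minute in this standalone sketch.
        Thread.sleep(60000);
        context.stop();
    }
}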

Regards
JB

On 01/09/2014 11:16 AM, Srikanth Sundarrajan wrote:



Hi Everyone,

We have made good progress on the Falcon project since its incubation. We
have had an initial release (0.3-incubating), following which we have added
support for Hadoop 2.0, HCatalog integration, and the Hive execution engine.
Venkatesh Seetharam has recently called for a vote for the release of
0.4-incubating with these features. We are now actively adding security
features besides a number of operability improvements, all of which should
go out in the 0.5-incubating release in the near future. At this juncture,
I wanted to get the thoughts of the community on the following feature
additions to Falcon over the next few releases:

1. Falcon Administration and Monitoring Dashboard (Umbrella JIRA: FALCON-37)
2. Support for Falcon Process Designer (Umbrella JIRA: FALCON-253)
3. Support stream abstractions and allow for stream processing through Falcon over Apache Storm

Regards
Srikanth Sundarrajan



--
Jean-Baptiste Onofré
[email protected]
http://blog.nanthrax.net
Talend - http://www.talend.com




--
Jean-Baptiste Onofré
[email protected]
http://blog.nanthrax.net
Talend - http://www.talend.com
