Repository: falcon

Updated Branches:
  refs/heads/master bd147972b -> 093ad2797
FALCON-1770 Update README file.

1. Renamed file to README.md.
2. Revisited the contents for some sections and reformatted some others.

Author: Ajay Yadava <[email protected]>

Reviewers: Srikanth Sundarrajan <[email protected]>, Sandeep Samudrala <[email protected]>

Closes #13 from ajayyadava/master


Project: http://git-wip-us.apache.org/repos/asf/falcon/repo
Commit: http://git-wip-us.apache.org/repos/asf/falcon/commit/093ad279
Tree: http://git-wip-us.apache.org/repos/asf/falcon/tree/093ad279
Diff: http://git-wip-us.apache.org/repos/asf/falcon/diff/093ad279

Branch: refs/heads/master
Commit: 093ad2797bcd3fb1caec53ab4696f30ca0db1657
Parents: bd14797
Author: Ajay Yadav <[email protected]>
Authored: Mon Jan 25 17:29:54 2016 +0530
Committer: Ajay Yadava <[email protected]>
Committed: Mon Jan 25 17:34:02 2016 +0530

----------------------------------------------------------------------
 README    | 147 ---------------------------------------------------
 README.md |  47 ++++++++++++++++++
 2 files changed, 47 insertions(+), 147 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/falcon/blob/093ad279/README
----------------------------------------------------------------------
diff --git a/README b/README
deleted file mode 100644
index 94a1335..0000000
--- a/README
+++ /dev/null
@@ -1,147 +0,0 @@
-# Licensed to the Apache Software Foundation (ASF) under one
-# or more contributor license agreements.  See the NOTICE file
-# distributed with this work for additional information
-# regarding copyright ownership.  The ASF licenses this file
-# to you under the Apache License, Version 2.0 (the
-# "License"); you may not use this file except in compliance
-# with the License.  You may obtain a copy of the License at
-#
-#     http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-
-Falcon Overview
-
-Falcon is a feed processing and feed management system aimed at making it
-easier for end consumers to onboard their feed processing and feed
-management on hadoop clusters.
-
-Why?
-
-* Dependencies across various data processing pipelines are not easy to
-  establish. Gaps here typically leads to either incorrect/partial
-  processing or expensive reprocessing. Repeated duplicate definition of
-  a single feed multiple times can lead to inconsistencies / issues.
-
-* Input data may not arrive always on time and it is required to kick off
-  the processing without waiting for all data to arrive and accommodate
-  late data separately
-
-* Feed management services such as feed retention, replications across
-  clusters, archival etc are tasks that are burdensome on individual
-  pipeline owners and better offered as a service for all customers.
-
-* It should be easy to onboard new workflows/pipelines
-
-* Smoother integration with metastore/catalog
-
-* Provide notification to end customer based on availability of feed
-  groups (logical group of related feeds, which are likely to be used
-  together)
-
-Usage
-
-a. Setup cluster definition
-   $FALCON_HOME/bin/falcon entity -submit -type cluster -file /cluster/definition.xml -url http://falcon-server:falcon-port
-
-b. Setup feed definition
-   $FALCON_HOME/bin/falcon entity -submit -type feed -file /feed1/definition.xml -url http://falcon-server:falcon-port
-   $FALCON_HOME/bin/falcon entity -submit -type feed -file /feed2/definition.xml -url http://falcon-server:falcon-port
-
-c. Setup process definition
-   $FALCON_HOME/bin/falcon entity -submit -type process -file /process/definition.xml -url http://falcon-server:falcon-port
-
-d. Once submitted, entity definition, status and dependency can be queried.
-   $FALCON_HOME/bin/falcon entity -type [cluster|feed|process] -name <<name>> [-definition|-status|-dependency] -url http://falcon-server:falcon-port
-
-   or entities for a particular type can be listed through
-   $FALCON_HOME/bin/falcon entity -type [cluster|feed|process] -list
-
-e. Schedule process
-   $FALCON_HOME/bin/falcon entity -type process -name process -schedule -url http://falcon-server:falcon-port
-
-f. Once scheduled entities can be suspended, resumed or deleted (post submit)
-   $FALCON_HOME/bin/falcon entity -type [cluster|feed|process] -name <<name>> [-suspend|-delete|-resume] -url http://falcon-server:falcon-port
-
-g. Once scheduled process instances can be managed through irovy CLI
-   $FALCON_HOME/bin/falcon instance -processName <<name>> [-kill|-suspend|-resume|-re-run] -start "yyyy-MM-dd'T'HH:mm'Z'" -url http://falcon-server:falcon-port
-
-Example configurations
-
-Cluster:
-<?xml version="1.0" encoding="UTF-8"?>
-<cluster colo="local" description="" name="local" xmlns="uri:falcon:cluster:0.1">
-    <interfaces>
-        <interface type="readonly" endpoint="hftp://localhost:41110"
-                   version="0.20.2"/>
-        <interface type="write" endpoint="hdfs://localhost:41020"
-                   version="0.20.2"/>
-        <interface type="execute" endpoint="localhost:41021" version="0.20.2"/>
-        <interface type="workflow" endpoint="http://localhost:41000/oozie/"
-                   version="4.0"/>
-        <interface type="messaging" endpoint="tcp://localhost:61616?daemon=true"
-                   version="5.1.6"/>
-        <interface type="registry" endpoint="Hcat" version="1"/>
-    </interfaces>
-    <locations>
-        <location name="staging" path="/projects/falcon/staging"/>
-        <location name="temp" path="/tmp"/>
-        <location name="working" path="/project/falcon/working"/>
-    </locations>
-</cluster>
-
-Feed:
-<?xml version="1.0" encoding="UTF-8"?>
-<feed description="in" name="in" xmlns="uri:ivory:feed:0.1">
-    <partitions>
-        <partition name="type"/>
-    </partitions>
-    <groups>in</groups>
-
-    <frequency>hours(1)</frequency>
-    <timezone>UTC</timezone>
-    <late-arrival cut-off="hours(6)"/>
-
-    <clusters>
-        <cluster name="local">
-            <validity start="2013-01-01T00:00Z" end="2020-01-01T12:00Z"/>
-            <retention limit="hours(24)" action="delete"/>
-        </cluster>
-    </clusters>
-
-    <locations>
-        <location type="data" path="/data/in/${YEAR}/${MONTH}/${DAY}/${HOUR}"/>
-    </locations>
-
-    <ACL owner="testuser" group="group" permission="0x644"/>
-    <schema location="/schema/in/in.format.csv" provider="csv"/>
-</feed>
-
-Process:
-<?xml version="1.0" encoding="UTF-8"?>
-<process name="wf-process" xmlns="uri:falcon:process:0.1">
-    <clusters>
-        <cluster name="local">
-            <validity start="2013-01-01T00:00Z" end="2013-01-01T02:00Z"/>
-        </cluster>
-    </clusters>
-
-    <parallel>1</parallel>
-    <order>LIFO</order>
-    <frequency>hours(1)</frequency>
-    <timezone>UTC</timezone>
-
-    <inputs>
-        <input name="input" feed="in" start="now(0,0)" end="now(0,0)"/>
-    </inputs>
-
-    <outputs>
-        <output name="output" feed="out" instance="today(0,0)"/>
-    </outputs>
-
-    <workflow engine="oozie" path="/app/mapred-wf.xml"/>
-</process>

http://git-wip-us.apache.org/repos/asf/falcon/blob/093ad279/README.md
----------------------------------------------------------------------
diff --git a/README.md b/README.md
new file mode 100644
index 0000000..7c754e7
--- /dev/null
+++ b/README.md
@@ -0,0 +1,47 @@
+# Apache Falcon
+
+Falcon is a feed processing and feed management system aimed at making it
+easier for end consumers to onboard their feed processing and feed
+management on hadoop clusters.
+
+## Why Apache Falcon?
+
+* Dependencies across various data processing pipelines are not easy to
+  establish. Gaps here typically leads to either incorrect/partial
+  processing or expensive reprocessing. Repeated duplicate definition of
+  a single feed multiple times can lead to inconsistencies / issues.
+
+* Input data may not arrive always on time and it is required to kick off
+  the processing without waiting for all data to arrive and accommodate
+  late data separately
+
+* Feed management services such as feed retention, replications across
+  clusters, archival etc are tasks that are burdensome on individual
+  pipeline owners and better offered as a service for all customers.
+
+* It should be easy to onboard new workflows/pipelines
+
+* Smoother integration with metastore/catalog
+
+* Provide notification to end customer based on availability of feed
+  groups (logical group of related feeds, which are likely to be used
+  together)
+
+## Online Documentation
+
+You can find the documentation on [Apache Falcon website](http://falcon.apache.org/).
+
+## How to Contribute
+
+Before opening a pull request, please go through the [Contributing to Apache Falcon wiki](https://cwiki.apache.org/confluence/display/FALCON/How+To+Contribute). It lists steps that are required before creating a PR and the conventions that we follow. If you are looking for issues to pick up then you can look at [starter tasks](https://issues.apache.org/jira/issues/?jql=project%20%3D%20FALCON%20AND%20resolution%20%3D%20Unresolved%20AND%20labels%20%3D%20newbie%20AND%20assignee%20in%20(EMPTY)%20ORDER%20BY%20priority%20DESC) or [open tasks](https://issues.apache.org/jira/issues/?jql=project%20%3D%20FALCON%20AND%20resolution%20%3D%20Unresolved%20AND%20assignee%20in%20(EMPTY)%20ORDER%20BY%20priority%20DESC)
+
+
+## Release Notes
+You can download release notes of previous releases from the following links.
+
+[0.8](https://cwiki.apache.org/confluence/download/attachments/61318307/RelaseNotes-ApacheFalcon-0.7.pdf?api=v2)
+
+[0.7](https://cwiki.apache.org/confluence/download/attachments/61318307/RelaseNotes-ApacheFalcon-0.7.pdf?api=v2)
+
+
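[Editor's sketch] The entity lifecycle that the old README walks through (submit cluster, feed and process definitions, then schedule and query) can be strung together as one shell script. This is a hedged sketch built only from the commands quoted in the diff above: `FALCON_HOME`, `FALCON_URL`, the definition file paths, and the process name `wf-process` are placeholder assumptions, and the script only prints each CLI invocation rather than executing it, since the real commands need a running Falcon server.

```shell
#!/bin/sh
# Sketch of the entity lifecycle from the old README, in order:
# submit cluster -> submit feeds -> submit process -> schedule -> query.
# FALCON_HOME, FALCON_URL, and all file paths are placeholder assumptions.
FALCON_HOME=${FALCON_HOME:-/opt/falcon}
FALCON_URL=${FALCON_URL:-http://falcon-server:falcon-port}

# Dry run: print each falcon CLI invocation instead of executing it,
# because executing for real requires a live Falcon server.
run() {
    echo "$FALCON_HOME/bin/falcon $*"
}

run entity -submit -type cluster -file /cluster/definition.xml -url "$FALCON_URL"
run entity -submit -type feed -file /feed1/definition.xml -url "$FALCON_URL"
run entity -submit -type feed -file /feed2/definition.xml -url "$FALCON_URL"
run entity -submit -type process -file /process/definition.xml -url "$FALCON_URL"
run entity -type process -name wf-process -schedule -url "$FALCON_URL"
run entity -type process -name wf-process -status -url "$FALCON_URL"
run entity -type process -list
```

The same pattern extends to the `-suspend`/`-resume`/`-delete` operations and to instance-level management via `falcon instance`, as listed in steps f and g of the old README.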
