Author: pallavi
Date: Mon Feb 15 05:48:00 2016
New Revision: 1730449

URL: http://svn.apache.org/viewvc?rev=1730449&view=rev
Log:
Updating docs under trunk for 0.9 release

Added:
    falcon/trunk/general/src/site/twiki/Configuration.twiki
    falcon/trunk/general/src/site/twiki/Distributed-mode.twiki
    falcon/trunk/general/src/site/twiki/Embedded-mode.twiki
    falcon/trunk/general/src/site/twiki/FalconEmailNotification.twiki
    falcon/trunk/general/src/site/twiki/FalconNativeScheduler.twiki
    falcon/trunk/general/src/site/twiki/HDFSDR.twiki
    falcon/trunk/general/src/site/twiki/HiveDR.twiki
    falcon/trunk/general/src/site/twiki/ImportExport.twiki
    falcon/trunk/general/src/site/twiki/falconcli/
    falcon/trunk/general/src/site/twiki/falconcli/CommonCLI.twiki
    falcon/trunk/general/src/site/twiki/falconcli/ContinueInstance.twiki
    falcon/trunk/general/src/site/twiki/falconcli/Definition.twiki
    falcon/trunk/general/src/site/twiki/falconcli/DeleteEntity.twiki
    falcon/trunk/general/src/site/twiki/falconcli/DependencyEntity.twiki
    falcon/trunk/general/src/site/twiki/falconcli/DependencyInstance.twiki
    falcon/trunk/general/src/site/twiki/falconcli/EdgeMetadata.twiki
    falcon/trunk/general/src/site/twiki/falconcli/FalconCLI.twiki
    falcon/trunk/general/src/site/twiki/falconcli/FeedInstanceListing.twiki
    falcon/trunk/general/src/site/twiki/falconcli/HelpAdmin.twiki
    falcon/trunk/general/src/site/twiki/falconcli/KillInstance.twiki
    falcon/trunk/general/src/site/twiki/falconcli/LifeCycleInstance.twiki
    falcon/trunk/general/src/site/twiki/falconcli/LineageMetadata.twiki
    falcon/trunk/general/src/site/twiki/falconcli/ListEntity.twiki
    falcon/trunk/general/src/site/twiki/falconcli/ListInstance.twiki
    falcon/trunk/general/src/site/twiki/falconcli/ListMetadata.twiki
    falcon/trunk/general/src/site/twiki/falconcli/LogsInstance.twiki
    falcon/trunk/general/src/site/twiki/falconcli/Lookup.twiki
    falcon/trunk/general/src/site/twiki/falconcli/ParamsInstance.twiki
    falcon/trunk/general/src/site/twiki/falconcli/RelationMetadata.twiki
    falcon/trunk/general/src/site/twiki/falconcli/RerunInstance.twiki
    falcon/trunk/general/src/site/twiki/falconcli/ResumeEntity.twiki
    falcon/trunk/general/src/site/twiki/falconcli/ResumeInstance.twiki
    falcon/trunk/general/src/site/twiki/falconcli/RunningInstance.twiki
    falcon/trunk/general/src/site/twiki/falconcli/SLAAlert.twiki
    falcon/trunk/general/src/site/twiki/falconcli/Schedule.twiki
    falcon/trunk/general/src/site/twiki/falconcli/StatusAdmin.twiki
    falcon/trunk/general/src/site/twiki/falconcli/StatusEntity.twiki
    falcon/trunk/general/src/site/twiki/falconcli/StatusInstance.twiki
    falcon/trunk/general/src/site/twiki/falconcli/Submit.twiki
    falcon/trunk/general/src/site/twiki/falconcli/SubmitRecipe.twiki
    falcon/trunk/general/src/site/twiki/falconcli/SummaryEntity.twiki
    falcon/trunk/general/src/site/twiki/falconcli/SummaryInstance.twiki
    falcon/trunk/general/src/site/twiki/falconcli/SuspendEntity.twiki
    falcon/trunk/general/src/site/twiki/falconcli/SuspendInstance.twiki
    falcon/trunk/general/src/site/twiki/falconcli/Touch.twiki
    falcon/trunk/general/src/site/twiki/falconcli/TriageInstance.twiki
    falcon/trunk/general/src/site/twiki/falconcli/UpdateEntity.twiki
    falcon/trunk/general/src/site/twiki/falconcli/VersionAdmin.twiki
    falcon/trunk/general/src/site/twiki/falconcli/VertexEdgesMetadata.twiki
    falcon/trunk/general/src/site/twiki/falconcli/VertexMetadata.twiki
    falcon/trunk/general/src/site/twiki/falconcli/VerticesMetadata.twiki
    falcon/trunk/general/src/site/twiki/restapi/FeedSLA.twiki
    falcon/trunk/general/src/site/twiki/restapi/InstanceDependencies.twiki
    falcon/trunk/general/src/site/twiki/restapi/Triage.twiki
Modified:
    falcon/trunk/general/pom.xml
    falcon/trunk/general/src/site/site.xml
    falcon/trunk/general/src/site/twiki/EntitySpecification.twiki
    falcon/trunk/general/src/site/twiki/FalconCLI.twiki
    falcon/trunk/general/src/site/twiki/FalconDocumentation.twiki
    falcon/trunk/general/src/site/twiki/InstallationSteps.twiki
    falcon/trunk/general/src/site/twiki/OnBoarding.twiki
    falcon/trunk/general/src/site/twiki/Operability.twiki
    falcon/trunk/general/src/site/twiki/Recipes.twiki
    falcon/trunk/general/src/site/twiki/Security.twiki
    falcon/trunk/general/src/site/twiki/index.twiki
    falcon/trunk/general/src/site/twiki/restapi/AdjacentVertices.twiki
    falcon/trunk/general/src/site/twiki/restapi/AdminStack.twiki
    falcon/trunk/general/src/site/twiki/restapi/AdminVersion.twiki
    falcon/trunk/general/src/site/twiki/restapi/AllEdges.twiki
    falcon/trunk/general/src/site/twiki/restapi/AllVertices.twiki
    falcon/trunk/general/src/site/twiki/restapi/Edge.twiki
    falcon/trunk/general/src/site/twiki/restapi/EntityDefinition.twiki
    falcon/trunk/general/src/site/twiki/restapi/EntityDelete.twiki
    falcon/trunk/general/src/site/twiki/restapi/EntityDependencies.twiki
    falcon/trunk/general/src/site/twiki/restapi/EntityLineage.twiki
    falcon/trunk/general/src/site/twiki/restapi/EntityList.twiki
    falcon/trunk/general/src/site/twiki/restapi/EntityResume.twiki
    falcon/trunk/general/src/site/twiki/restapi/EntitySchedule.twiki
    falcon/trunk/general/src/site/twiki/restapi/EntityStatus.twiki
    falcon/trunk/general/src/site/twiki/restapi/EntitySubmit.twiki
    falcon/trunk/general/src/site/twiki/restapi/EntitySubmitAndSchedule.twiki
    falcon/trunk/general/src/site/twiki/restapi/EntitySummary.twiki
    falcon/trunk/general/src/site/twiki/restapi/EntitySuspend.twiki
    falcon/trunk/general/src/site/twiki/restapi/EntityTouch.twiki
    falcon/trunk/general/src/site/twiki/restapi/EntityUpdate.twiki
    falcon/trunk/general/src/site/twiki/restapi/EntityValidate.twiki
    falcon/trunk/general/src/site/twiki/restapi/FeedInstanceListing.twiki
    falcon/trunk/general/src/site/twiki/restapi/FeedLookup.twiki
    falcon/trunk/general/src/site/twiki/restapi/Graph.twiki
    falcon/trunk/general/src/site/twiki/restapi/InstanceKill.twiki
    falcon/trunk/general/src/site/twiki/restapi/InstanceList.twiki
    falcon/trunk/general/src/site/twiki/restapi/InstanceLogs.twiki
    falcon/trunk/general/src/site/twiki/restapi/InstanceParams.twiki
    falcon/trunk/general/src/site/twiki/restapi/InstanceRerun.twiki
    falcon/trunk/general/src/site/twiki/restapi/InstanceResume.twiki
    falcon/trunk/general/src/site/twiki/restapi/InstanceRunning.twiki
    falcon/trunk/general/src/site/twiki/restapi/InstanceStatus.twiki
    falcon/trunk/general/src/site/twiki/restapi/InstanceSummary.twiki
    falcon/trunk/general/src/site/twiki/restapi/InstanceSuspend.twiki
    falcon/trunk/general/src/site/twiki/restapi/MetadataList.twiki
    falcon/trunk/general/src/site/twiki/restapi/MetadataRelations.twiki
    falcon/trunk/general/src/site/twiki/restapi/ResourceList.twiki
    falcon/trunk/general/src/site/twiki/restapi/Vertex.twiki
    falcon/trunk/general/src/site/twiki/restapi/VertexProperties.twiki
    falcon/trunk/general/src/site/twiki/restapi/Vertices.twiki
    falcon/trunk/pom.xml
    falcon/trunk/releases/pom.xml

Modified: falcon/trunk/general/pom.xml
URL: 
http://svn.apache.org/viewvc/falcon/trunk/general/pom.xml?rev=1730449&r1=1730448&r2=1730449&view=diff
==============================================================================
--- falcon/trunk/general/pom.xml (original)
+++ falcon/trunk/general/pom.xml Mon Feb 15 05:48:00 2016
@@ -22,10 +22,10 @@
     <parent>
         <groupId>org.apache.falcon</groupId>
         <artifactId>falcon-website</artifactId>
-        <version>0.8-SNAPSHOT</version>
+        <version>0.9-SNAPSHOT</version>
     </parent>
     <artifactId>falcon-website-general</artifactId>
-    <version>0.8-SNAPSHOT</version>
+    <version>0.9-SNAPSHOT</version>
     <packaging>war</packaging>
 
     <name>Apache Falcon - General</name>

Modified: falcon/trunk/general/src/site/site.xml
URL: 
http://svn.apache.org/viewvc/falcon/trunk/general/src/site/site.xml?rev=1730449&r1=1730448&r2=1730449&view=diff
==============================================================================
--- falcon/trunk/general/src/site/site.xml (original)
+++ falcon/trunk/general/src/site/site.xml Mon Feb 15 05:48:00 2016
@@ -148,6 +148,7 @@
 
         <menu name="Documentation">
             <!-- current points to latest release -->
+            <item name="0.9 (Current)" href="./0.9/index.html"/>
             <item name="0.8" href="./0.8/index.html"/>
             <item name="0.7" href="./0.7/index.html"/>
             <item name="0.6.1" href="./0.6.1/index.html"/>

Added: falcon/trunk/general/src/site/twiki/Configuration.twiki
URL: 
http://svn.apache.org/viewvc/falcon/trunk/general/src/site/twiki/Configuration.twiki?rev=1730449&view=auto
==============================================================================
--- falcon/trunk/general/src/site/twiki/Configuration.twiki (added)
+++ falcon/trunk/general/src/site/twiki/Configuration.twiki Mon Feb 15 05:48:00 
2016
@@ -0,0 +1,122 @@
+---+Configuring Falcon
+
+By default, the config directory used by Falcon is {package dir}/conf. To override 
this (to use the same conf with multiple
+falcon upgrades), set environment variable FALCON_CONF to the path of the conf 
dir.
+
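+For example, a minimal sketch (the path shown is illustrative):
+<verbatim>
+export FALCON_CONF=/etc/falcon/conf
+</verbatim>
+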
+falcon-env.sh has been added to the falcon conf. This file can be used to set 
various environment variables that you
+need for your services.
+In addition you can set any other environment variables you might need. This 
file will be sourced by falcon scripts
+before any commands are executed. The following environment variables are 
available to set.
+
+<verbatim>
+# The java implementation to use. If JAVA_HOME is not found we expect java and 
jar to be in path
+#export JAVA_HOME=
+
+# any additional java opts you want to set. This will apply to both client and 
server operations
+#export FALCON_OPTS=
+
+# any additional java opts that you want to set for client only
+#export FALCON_CLIENT_OPTS=
+
+# java heap size we want to set for the client. Default is 1024MB
+#export FALCON_CLIENT_HEAP=
+
+# any additional opts you want to set for prism service.
+#export FALCON_PRISM_OPTS=
+
+# java heap size we want to set for the prism service. Default is 1024MB
+#export FALCON_PRISM_HEAP=
+
+# any additional opts you want to set for falcon service.
+#export FALCON_SERVER_OPTS=
+
+# java heap size we want to set for the falcon server. Default is 1024MB
+#export FALCON_SERVER_HEAP=
+
+# What is considered as falcon home dir. Default is the base location of 
the installed software
+#export FALCON_HOME_DIR=
+
+# Where log files are stored. Default is logs directory under the base install 
location
+#export FALCON_LOG_DIR=
+
+# Where pid files are stored. Default is logs directory under the base install 
location
+#export FALCON_PID_DIR=
+
+# where the falcon active mq data is stored. Default is logs/data directory 
under the base install location
+#export FALCON_DATA_DIR=
+
+# Where do you want to expand the war file. By default it is in /server/webapp 
dir under the base install dir.
+#export FALCON_EXPANDED_WEBAPP_DIR=
+</verbatim>
+
+---++Advanced Configurations
+
+---+++Configuring Monitoring plugin to register catalog partitions
+Falcon comes with a monitoring plugin that registers catalog partitions. This 
comes in handy during migration from
+filesystem-based feeds to hcatalog-based feeds.
+This plugin enables the user to de-couple the partition registration and 
assume that all partitions are already on
+hcatalog even before the migration, simplifying the hcatalog migration.
+
+By default this plugin is disabled.
+To enable this plugin and leverage the feature, there are 3 pre-requisites:
+<verbatim>
+In {package dir}/conf/startup.properties, add
+*.workflow.execution.listeners=org.apache.falcon.catalog.CatalogPartitionHandler
+
+In the cluster definition, ensure registry endpoint is defined.
+Ex:
+<interface type="registry" endpoint="thrift://localhost:1109" 
version="0.13.3"/>
+
+In the feed definition, ensure the corresponding catalog table is mentioned in 
feed-properties
+Ex:
+<properties>
+    <property name="catalog.table" 
value="catalog:default:in_table#year={YEAR};month={MONTH};day={DAY};hour={HOUR};
+    minute={MINUTE}"/>
+</properties>
+</verbatim>
+
+*NOTE : for Mac OS users*
+<verbatim>
+If you are using a Mac OS, you will need to configure the FALCON_SERVER_OPTS 
(explained above).
+
+In  {package dir}/conf/falcon-env.sh uncomment the following line
+#export FALCON_SERVER_OPTS=
+
+and change it to look as below
+export FALCON_SERVER_OPTS="-Djava.awt.headless=true 
-Djava.security.krb5.realm= -Djava.security.krb5.kdc="
+</verbatim>
+
+---+++Activemq
+
+* The falcon server starts an embedded active mq. To control this behaviour, set 
the following system properties using the -D
+option in environment variable FALCON_OPTS (see the example after this list):
+   * falcon.embeddedmq=<true/false> - Should server start embedded active mq, 
default true
+   * falcon.embeddedmq.port=<port> - Port for embedded active mq, default 61616
+   * falcon.embeddedmq.data=<path> - Data path for embedded active mq, default 
{package dir}/logs/data
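+
+For example, to run the embedded broker on a non-default port (the value is illustrative):
+<verbatim>
+export FALCON_OPTS="-Dfalcon.embeddedmq.port=61617"
+</verbatim>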
+
+---+++Falcon System Notifications
+Some Falcon features, such as late data handling, retries and the metadata 
service, depend on JMS notifications sent when the Oozie workflow completes. These 
system notifications are sent as part of the Falcon post-processing action. Given 
that the post-processing action is itself a job, it is prone to failures, and when 
it fails, Falcon is blind to the status of the workflow. To alleviate 
this problem and make the notifications more reliable, you can enable Oozie's 
JMS notification feature and disable Falcon post-processing notification by 
making the following changes (see the example after the list):
+   * In Falcon runtime.properties, set *.falcon.jms.notification.enabled to 
false. This will turn off JMS notification in post-processing.
+   * Copy notification related properties in oozie/conf/oozie-site.xml to 
oozie-site.xml of the Oozie installation.  Restart Oozie so changes get 
reflected.  
+
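+For example, the first change is a one-line edit in runtime.properties:
+<verbatim>
+*.falcon.jms.notification.enabled=false
+</verbatim>
+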
+*NOTE : If you disable Falcon post-processing JMS notification and do not enable 
Oozie JMS notification, features such as failure retry, late data handling and 
metadata service will be disabled for all entities on the server.*
+
+---+++Enabling Falcon Native Scheduler
+You can either choose to schedule entities using Oozie's coordinator or using 
Falcon's native scheduler. To be able to schedule entities natively on Falcon, 
you will need to add some additional properties to 
<verbatim>$FALCON_HOME/conf/startup.properties</verbatim> before starting the 
Falcon Server. For details on the same, refer to 
[[FalconNativeScheduler][Falcon Native Scheduler]]
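+
+As a sketch of the kind of change involved (the service class name shown is an 
assumption; treat the property list on the Falcon Native Scheduler page as 
authoritative):
+<verbatim>
+## append the native scheduler's execution service to startup.properties
+*.application.services=<existing services>,\
+        org.apache.falcon.execution.FalconExecutionService
+</verbatim>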
+
+---+++Adding Extension Libraries
+
+Library extensions allow users to add custom libraries to entity lifecycles 
such as feed retention, feed replication
+and process execution. This is useful for use cases such as adding filesystem 
extensions. To enable this, add the
+following configs to startup.properties (an example follows the list):
+*.libext.paths=<paths to be added to all entity lifecycles>
+
+*.libext.feed.paths=<paths to be added to all feed lifecycles>
+
+*.libext.feed.retentions.paths=<paths to be added to feed retention workflow>
+
+*.libext.feed.replication.paths=<paths to be added to feed replication 
workflow>
+
+*.libext.process.paths=<paths to be added to process workflow>
+
+The configured jars are added to falcon classpath and the corresponding 
workflows.
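+
+For example, a sketch in startup.properties (the HDFS paths are illustrative):
+<verbatim>
+*.libext.paths=/projects/falcon/libext
+*.libext.feed.replication.paths=/projects/falcon/libext/replication
+</verbatim>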

Added: falcon/trunk/general/src/site/twiki/Distributed-mode.twiki
URL: 
http://svn.apache.org/viewvc/falcon/trunk/general/src/site/twiki/Distributed-mode.twiki?rev=1730449&view=auto
==============================================================================
--- falcon/trunk/general/src/site/twiki/Distributed-mode.twiki (added)
+++ falcon/trunk/general/src/site/twiki/Distributed-mode.twiki Mon Feb 15 
05:48:00 2016
@@ -0,0 +1,198 @@
+---+Distributed Mode
+
+
+Following are the steps needed to package and deploy Falcon in Distributed Mode. 
You need to complete Steps 1-3 mentioned
+ [[InstallationSteps][here]] before proceeding further.
+
+---++Package Falcon
+Ensure that you are in the base directory (where you cloned Falcon). Let’s 
call it {project dir}
+
+<verbatim>
+$mvn clean assembly:assembly -DskipTests -DskipCheck=true 
-Pdistributed,hadoop-2
+</verbatim>
+
+
+<verbatim>
+$ls {project dir}/target/
+</verbatim>
+
+It should give an output like below:
+<verbatim>
+apache-falcon-distributed-${project.version}-server.tar.gz
+apache-falcon-distributed-${project.version}-sources.tar.gz
+archive-tmp
+maven-shared-archive-resources
+</verbatim>
+
+   * apache-falcon-distributed-${project.version}-sources.tar.gz contains 
source files of Falcon repo.
+
+   * apache-falcon-distributed-${project.version}-server.tar.gz package 
contains project artifacts along with its
+dependencies, configuration files and scripts required to deploy Falcon.
+
+
+The tar can be found in {project 
dir}/target/apache-falcon-distributed-${project.version}-server.tar.gz. This 
is the tar
+used for installing Falcon. Let's call it {falcon package}.
+
+Tar is structured as follows.
+
+<verbatim>
+
+|- bin
+   |- falcon
+   |- falcon-start
+   |- falcon-stop
+   |- falcon-status
+   |- falcon-config.sh
+   |- service-start.sh
+   |- service-stop.sh
+   |- service-status.sh
+   |- prism-stop
+   |- prism-start
+   |- prism-status
+|- conf
+   |- startup.properties
+   |- runtime.properties
+   |- client.properties
+   |- prism.keystore
+   |- log4j.xml
+   |- falcon-env.sh
+|- docs
+|- client
+   |- lib (client support libs)
+|- server
+   |- webapp
+      |- falcon.war
+      |- prism.war
+|- oozie
+   |- conf
+   |- libext
+|- hadooplibs
+|- README
+|- NOTICE.txt
+|- LICENSE.txt
+|- DISCLAIMER.txt
+|- CHANGES.txt
+</verbatim>
+
+
+---++Installing & running Falcon
+
+---+++Installing Falcon
+
+Running Falcon in distributed mode requires bringing up both prism and 
server. As the name suggests, Falcon prism splits
+the requests it gets across the Falcon servers. It is a good practice to start 
prism and server with their corresponding
+configurations separately. Create separate directories for prism and server. 
Let's call them {falcon-prism-dir} and
+{falcon-server-dir} respectively.
+
+*For prism*
+<verbatim>
+$mkdir {falcon-prism-dir}
+$tar -xzvf {falcon package}
+</verbatim>
+
+*For server*
+<verbatim>
+$mkdir {falcon-server-dir}
+$tar -xzvf {falcon package}
+</verbatim>
+
+
+---+++Starting Prism
+
+<verbatim>
+cd {falcon-prism-dir}/falcon-distributed-${project.version}
+bin/prism-start [-port <port>]
+</verbatim>
+
+By default,
+* prism server starts at port 16443. To change the port, use -port option
+
+* falcon.enableTLS can be set to true or false explicitly to enable SSL. If 
not set, a port that ends with 443 will
+automatically put prism on https://
+
+* prism starts with conf from 
{falcon-prism-dir}/falcon-distributed-${project.version}/conf. To override this 
(to use
+the same conf with multiple prism upgrades), set environment variable 
FALCON_CONF to the path of conf dir. You can find
+the instructions for configuring Falcon [[Configuration][here]].
+
+*Enabling prism-client*
+If prism is not started using the default port 16443 then edit the following 
property in
+{falcon-prism-dir}/falcon-distributed-${project.version}/conf/client.properties
+falcon.url=http://{machine-ip}:{prism-port}/
+
+
+---+++Starting Falcon Server
+
+<verbatim>
+$cd {falcon-server-dir}/falcon-distributed-${project.version}
+$bin/falcon-start [-port <port>]
+</verbatim>
+
+By default,
+* If falcon.enableTLS is set to true explicitly or not set at all, Falcon 
starts at port 15443 on https:// by default.
+
+* If falcon.enableTLS is set to false explicitly, Falcon starts at port 15000 
on http://.
+
+* To change the port, use -port option.
+
+* If falcon.enableTLS is not set explicitly, a port that ends with 443 will 
automatically put Falcon on https://. Any
+other port will put Falcon on http://.
+
+* server starts with conf from 
{falcon-server-dir}/falcon-distributed-${project.version}/conf. To override 
this (to use
+the same conf with multiple server upgrades), set environment variable 
FALCON_CONF to the path of conf dir. You can find
+ the instructions for configuring Falcon [[Configuration][here]].
+
+*Enabling server-client*
+If the server is not started using the default port 15443 then edit the 
following property in
+{falcon-server-dir}/falcon-distributed-${project.version}/conf/client.properties.
 You can find the instructions for
+configuring Falcon here.
+falcon.url=http://{machine-ip}:{server-port}/
+
+*NOTE* : https is the secure version of HTTP, the protocol over which data is 
sent between your browser and the website
+that you are connected to. By default Falcon runs in https mode, but users can 
configure it to use http.
+
+
+---+++Using Falcon
+
+<verbatim>
+$cd {falcon-prism-dir}/falcon-distributed-${project.version}
+$bin/falcon admin -version
+Falcon server build version: 
{Version:"${project.version}-SNAPSHOT-rd7e2be9afa2a5dc96acd1ec9e325f39c6b2f17f7",
+Mode:"embedded"}
+
+$bin/falcon help
+(for more details about Falcon cli usage)
+</verbatim>
+
+
+---+++Dashboard
+
+Once Falcon / prism is started, you can view the status of Falcon entities 
using the Web-based dashboard. You can open
+your browser at the corresponding port to use the web UI.
+
+The Falcon dashboard makes the REST API calls as user "falcon-dashboard". If this 
user does not exist on your Falcon and
+Oozie servers, please create the user.
+
+<verbatim>
+## create user.
+[root@falconhost ~] useradd -U -m falcon-dashboard -G users
+
+## verify user is created with membership in correct groups.
+[root@falconhost ~] groups falcon-dashboard
+falcon-dashboard : falcon-dashboard users
+[root@falconhost ~]
+</verbatim>
+
+
+---+++Stopping Falcon Server
+
+<verbatim>
+$cd {falcon-server-dir}/falcon-distributed-${project.version}
+$bin/falcon-stop
+</verbatim>
+
+---+++Stopping Falcon Prism
+
+<verbatim>
+$cd {falcon-prism-dir}/falcon-distributed-${project.version}
+$bin/prism-stop
+</verbatim>

Added: falcon/trunk/general/src/site/twiki/Embedded-mode.twiki
URL: 
http://svn.apache.org/viewvc/falcon/trunk/general/src/site/twiki/Embedded-mode.twiki?rev=1730449&view=auto
==============================================================================
--- falcon/trunk/general/src/site/twiki/Embedded-mode.twiki (added)
+++ falcon/trunk/general/src/site/twiki/Embedded-mode.twiki Mon Feb 15 05:48:00 
2016
@@ -0,0 +1,198 @@
+---+Embedded Mode
+
+Following are the steps needed to package and deploy Falcon in Embedded Mode. 
You need to complete Steps 1-3 mentioned
+ [[InstallationSteps][here]] before proceeding further.
+
+---++Package Falcon
+Ensure that you are in the base directory (where you cloned Falcon). Let’s 
call it {project dir}
+
+<verbatim>
+$mvn clean assembly:assembly -DskipTests -DskipCheck=true
+</verbatim>
+
+<verbatim>
+$ls {project dir}/target/
+</verbatim>
+It should give an output like below:
+<verbatim>
+apache-falcon-${project.version}-bin.tar.gz
+apache-falcon-${project.version}-sources.tar.gz
+archive-tmp
+maven-shared-archive-resources
+</verbatim>
+
+* apache-falcon-${project.version}-sources.tar.gz contains source files of 
Falcon repo.
+
+* apache-falcon-${project.version}-bin.tar.gz package contains project 
artifacts along with its dependencies,
+configuration files and scripts required to deploy Falcon.
+
+Tar can be found in {project 
dir}/target/apache-falcon-${project.version}-bin.tar.gz
+
+Tar is structured as follows:
+
+<verbatim>
+
+|- bin
+   |- falcon
+   |- falcon-start
+   |- falcon-stop
+   |- falcon-status
+   |- falcon-config.sh
+   |- service-start.sh
+   |- service-stop.sh
+   |- service-status.sh
+|- conf
+   |- startup.properties
+   |- runtime.properties
+   |- prism.keystore
+   |- client.properties
+   |- log4j.xml
+   |- falcon-env.sh
+|- docs
+|- client
+   |- lib (client support libs)
+|- server
+   |- webapp
+      |- falcon.war
+|- data
+   |- falcon-store
+   |- graphdb
+   |- localhost
+|- examples
+   |- app
+      |- hive
+      |- oozie-mr
+      |- pig
+   |- data
+   |- entity
+      |- filesystem
+      |- hcat
+|- oozie
+   |- conf
+   |- libext
+|- logs
+|- hadooplibs
+|- README
+|- NOTICE.txt
+|- LICENSE.txt
+|- DISCLAIMER.txt
+|- CHANGES.txt
+</verbatim>
+
+
+---++Installing & running Falcon
+
+Running Falcon in embedded mode requires bringing up the server.
+
+<verbatim>
+$tar -xzvf {falcon package}
+$cd falcon-${project.version}
+</verbatim>
+
+
+---+++Starting Falcon Server
+<verbatim>
+$cd falcon-${project.version}
+$bin/falcon-start [-port <port>]
+</verbatim>
+
+By default,
+* If falcon.enableTLS is set to true explicitly or not set at all, Falcon 
starts at port 15443 on https:// by default.
+
+* If falcon.enableTLS is set to false explicitly, Falcon starts at port 15000 
on http://.
+
+* To change the port, use -port option.
+
+* If falcon.enableTLS is not set explicitly, a port that ends with 443 will 
automatically put Falcon on https://. Any
+other port will put Falcon on http://.
+
+* Server starts with conf from 
{falcon-server-dir}/falcon-${project.version}/conf. To override 
this (to use
+the same conf with multiple server upgrades), set environment variable 
FALCON_CONF to the path of conf dir. You can find
+ the instructions for configuring Falcon [[Configuration][here]].
+
+
+---+++Enabling server-client
+If server is not started using default-port 15443 then edit the following 
property in
+{falcon-server-dir}/falcon-${project.version}/conf/client.properties
+
+falcon.url=http://{machine-ip}:{server-port}/
+
+
+---+++Using Falcon
+<verbatim>
+$cd falcon-${project.version}
+$bin/falcon admin -version
+Falcon server build version: 
{Version:"${project.version}-SNAPSHOT-rd7e2be9afa2a5dc96acd1ec9e325f39c6b2f17f7",Mode:
+"embedded",Hadoop:"${hadoop.version}"}
+
+$bin/falcon help
+(for more details about Falcon cli usage)
+</verbatim>
+
+*Note* : https is the secure version of HTTP, the protocol over which data is 
sent between your browser and the website
+that you are connected to. By default Falcon runs in https mode, but users can 
configure it to use http.
+
+
+---+++Dashboard
+
+Once Falcon server is started, you can view the status of Falcon entities 
using the Web-based dashboard. You can open
+your browser at the corresponding port to use the web UI.
+
+The Falcon dashboard makes the REST API calls as user "falcon-dashboard". If this 
user does not exist on your Falcon and
+Oozie servers, please create the user.
+
+<verbatim>
+## create user.
+[root@falconhost ~] useradd -U -m falcon-dashboard -G users
+
+## verify user is created with membership in correct groups.
+[root@falconhost ~] groups falcon-dashboard
+falcon-dashboard : falcon-dashboard users
+[root@falconhost ~]
+</verbatim>
+
+
+---++Running Examples using embedded package
+<verbatim>
+$cd falcon-${project.version}
+$bin/falcon-start
+</verbatim>
+Make sure the Hadoop and Oozie endpoints are according to your setup in
+examples/entity/filesystem/standalone-cluster.xml
+The cluster locations, staging and working dirs, MUST be created prior to 
submitting a cluster entity to Falcon.
+*staging* must have 777 permissions and the parent dirs must have execute 
permissions.
+*working* must have 755 permissions and the parent dirs must have execute 
permissions.
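+For example, a sketch of creating these locations (the paths are illustrative):
+<verbatim>
+$hadoop fs -mkdir -p /apps/falcon/staging /apps/falcon/working
+$hadoop fs -chmod 777 /apps/falcon/staging
+$hadoop fs -chmod 755 /apps/falcon/working
+</verbatim>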
+<verbatim>
+$bin/falcon entity -submit -type cluster -file 
examples/entity/filesystem/standalone-cluster.xml
+</verbatim>
+Submit input and output feeds:
+<verbatim>
+$bin/falcon entity -submit -type feed -file 
examples/entity/filesystem/in-feed.xml
+$bin/falcon entity -submit -type feed -file 
examples/entity/filesystem/out-feed.xml
+</verbatim>
+Set-up workflow for the process:
+<verbatim>
+$hadoop fs -put examples/app /
+</verbatim>
+Submit and schedule the process:
+<verbatim>
+$bin/falcon entity -submitAndSchedule -type process -file 
examples/entity/filesystem/oozie-mr-process.xml
+$bin/falcon entity -submitAndSchedule -type process -file 
examples/entity/filesystem/pig-process.xml
+</verbatim>
+Generate input data:
+<verbatim>
+$examples/data/generate.sh <<hdfs endpoint>>
+</verbatim>
+Get status of instances:
+<verbatim>
+$bin/falcon instance -status -type process -name oozie-mr-process -start 
2013-11-15T00:05Z -end 2013-11-15T01:00Z
+</verbatim>
+
+HCat based example entities are in examples/entity/hcat.
+
+
+---+++Stopping Falcon Server
+<verbatim>
+$cd falcon-${project.version}
+$bin/falcon-stop
+</verbatim>

Modified: falcon/trunk/general/src/site/twiki/EntitySpecification.twiki
URL: 
http://svn.apache.org/viewvc/falcon/trunk/general/src/site/twiki/EntitySpecification.twiki?rev=1730449&r1=1730448&r2=1730449&view=diff
==============================================================================
--- falcon/trunk/general/src/site/twiki/EntitySpecification.twiki (original)
+++ falcon/trunk/general/src/site/twiki/EntitySpecification.twiki Mon Feb 15 
05:48:00 2016
@@ -70,7 +70,7 @@ Path is the hdfs path for each location.
 Falcon would use the location to do intermediate processing of entities in 
hdfs and hence Falcon
 should have read/write/execute permission on these locations.
 These locations MUST be created prior to submitting a cluster entity to Falcon.
-*staging* should have atleast 755 permissions and is a mandatory location .The 
parent dirs must have execute permissions so multiple
+*staging* should have 777 permissions and is a mandatory location. The parent 
dirs must have execute permissions so multiple
 users can write to this location. *working* must have 755 permissions and is a 
optional location.
 If *working* is not specified, falcon creates a sub directory in the *staging* 
location with 755 perms.
 The parent dir for *working* must have execute permissions so multiple
@@ -98,6 +98,61 @@ A key-value pair, which are propagated t
 Ideally JMS impl class name of messaging engine (brokerImplClass) 
 should be defined here.
 
+---++ Datasource Specification
+
+The datasource entity contains connection information required to connect to a 
data source like a MySQL database.
+The datasource XSD specification is available here:
+A datasource contains read and write interfaces which are used by Falcon to 
import or export data from or to
+datasources respectively. A datasource is referenced by its name in the feeds 
that are on-boarded to Falcon.
+
+Following are the tags defined in a datasource.xml:
+
+<verbatim>
+<datasource colo="west-coast" description="Customer database on west coast" 
type="mysql"
+ name="test-hsql-db" xmlns="uri:falcon:datasource:0.1" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance";>
+</verbatim>
+
+The colo specifies the colo to which the datasource belongs, and name is the 
name of the datasource, which has to
+be unique.
+
+---+++ Interfaces
+
+A datasource has two interfaces as described below:
+<verbatim>
+    <interface type="readonly" endpoint="jdbc:hsqldb:localhost/db"/>
+</verbatim>
+
+A readonly interface specifies the endpoint and protocol to connect to a 
datasource.
+This would be used in the context of import from datasource into HDFS.
+
+<verbatim>
+<interface type="write" endpoint="jdbc:hsqldb:localhost/db1">
+</verbatim>
+
+A write interface specifies the endpoint and protocol to write to the 
datasource.
+Falcon uses this interface to export data from hdfs to the datasource.
+
+<verbatim>
+<credential type="password-text">
+    <userName>SA</userName>
+    <passwordText></passwordText>
+</credential>
+</verbatim>
+
+
+A credential is associated with an interface (read or write) providing user 
name and password to authenticate
+to the datasource.
+
+<verbatim>
+<credential type="password-text">
+     <userName>SA</userName>
+     <passwordFile>hdfs-file-path</passwordText>
+</credential>
+</verbatim>
+
+The credential can be specified via a password file present in the HDFS. This 
file should only be accessible by
+the user.
+
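+Putting these together, a minimal end-to-end datasource definition might look 
like this (endpoints and credentials are illustrative):
+<verbatim>
+<datasource colo="west-coast" description="Customer database on west coast" type="mysql"
+            name="test-hsql-db" xmlns="uri:falcon:datasource:0.1">
+    <interfaces>
+        <interface type="readonly" endpoint="jdbc:hsqldb:localhost/db">
+            <credential type="password-text">
+                <userName>SA</userName>
+                <passwordText></passwordText>
+            </credential>
+        </interface>
+        <interface type="write" endpoint="jdbc:hsqldb:localhost/db1">
+            <credential type="password-text">
+                <userName>SA</userName>
+                <passwordText></passwordText>
+            </credential>
+        </interface>
+    </interfaces>
+</datasource>
+</verbatim>
+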
 ---++ Feed Specification
 The Feed XSD specification is available here.
 A Feed defines various attributes of feed like feed location, frequency, 
late-arrival handling and retention policies.
@@ -244,6 +299,35 @@ expressions like frequency. slaLow is in
 availability SLAs. slaHigh is intended to serve for reporting the feeds which 
missed their SLAs. SLAs are relative to
 feed instance time.
 
+---+++ Import
+
+<verbatim>
+<import>
+    <source name="test-hsql-db" tableName="customer">
+        <extract type="full">
+            <mergepolicy>snapshot</mergepolicy>
+         </extract>
+         <fields>
+            <includes>
+                <field>id</field>
+                <field>name</field>
+            </includes>
+         </fields>
+    </source>
+    <arguments>
+        <argument name="--split-by" value="id"/>
+        <argument name="--num-mappers" value="2"/>
+    </arguments>
+</import>
+</verbatim>
+
+A feed can have an import policy associated with it. The source name specifies 
the reference to the
+datasource entity from which the data will be imported to HDFS. The tableName 
specifies the table or topic to be
+imported from the datasource. The extract type specifies the pull mechanism 
(full or
+incremental extract). The full extract method extracts all the data from the 
datasource; the incremental extraction
+method is still being implemented. The mergepolicy determines how the data is 
to be laid out on HDFS.
+The snapshot layout creates a snapshot of the data on HDFS using the feed's 
location specification. Fields is used
+to specify the projection columns. Feed import from a database uses Sqoop 
under the hood to achieve the task. Any advanced
+Sqoop options can be specified via the arguments.
 
 ---+++ Late Arrival
 
@@ -256,6 +340,18 @@ upto 8 hours then late-arrival's cut-off
 
 *Note:* This will only apply for !FileSystem storage but not Table storage 
until a future time.
 
+
+---+++ Email Notification
+
+<verbatim>
+    <notification type="email" to="[email protected]"/>
+</verbatim>
+Specifying the notification element with "type" property allows users to 
receive email notification when a scheduled feed instance completes.
+Multiple recipients of an email can be provided as comma separated addresses 
with "to" property.
+To send email notification ensure that SMTP parameters are defined in Falcon 
startup.properties.
+Refer to [[FalconEmailNotification][Falcon Email Notification]] for more 
details.
+
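+For instance, multiple recipients can be listed (the addresses are illustrative):
+<verbatim>
+    <notification type="email" to="alerts@example.com,oncall@example.com"/>
+</verbatim>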
+
 ---+++ ACL
 
 A feed has ACL (Access Control List) useful for implementing permission 
requirements
@@ -280,6 +376,13 @@ permission indicates the permission.
         <property name="parallel" value="3"/>
         <property name="maxMaps" value="8"/>
         <property name="mapBandwidth" value="1"/>
+        <property name="overwrite" value="true"/>
+        <property name="ignoreErrors" value="false"/>
+        <property name="skipChecksum" value="false"/>
+        <property name="removeDeletedFiles" value="true"/>
+        <property name="preserveBlockSize" value="true"/>
+        <property name="preserveReplicationNumber" value="true"/>
+        <property name="preservePermission" value="true"/>
         <property name="order" value="LIFO"/>
     </properties>
 </verbatim>
@@ -288,9 +391,59 @@ available to user to specify the Hadoop
 "timeout", "parallel" and "order" are other special properties which decides 
replication instance's timeout value while
 waiting for the feed instance, parallel decides the concurrent replication 
instances that can run at any given time and
 order decides the execution order for replication instances like FIFO, LIFO 
and LAST_ONLY.
-"maxMaps" represents the maximum number of maps used during replication. 
"mapBandwidth" represents the bandwidth in MB/s
-used by each mapper during replication.
- 
+DistCp options can be passed as custom properties, which will be propagated to 
the DistCp tool. "maxMaps" represents
+the maximum number of maps used during replication. "mapBandwidth" represents 
the bandwidth in MB/s
+used by each mapper during replication. "overwrite" represents overwrite 
destination during replication.
+"ignoreErrors" represents ignore failures not causing the job to fail during 
replication. "skipChecksum" represents
+bypassing checksum verification during replication. "removeDeletedFiles" 
represents deleting the files existing in the
+destination but not in source during replication. "preserveBlockSize" 
represents preserving block size during
+replication. "preserveReplicationNumber" represents preserving replication 
number during replication.
+"preservePermission" represents preserving permission during
+
+
+---+++ Lifecycle
+<verbatim>
+
+<lifecycle>
+    <retention-stage>
+        <frequency>hours(10)</frequency>
+        <queue>reports</queue>
+        <priority>NORMAL</priority>
+        <properties>
+            <property name="retention.policy.agebaseddelete.limit" 
value="hours(9)"></property>
+        </properties>
+    </retention-stage>
+</lifecycle>
+
+</verbatim>
+
+lifecycle tag is the new way to define various stages of a feed's lifecycle. 
In the example above we have defined a
+retention-stage using lifecycle tag. You may define lifecycle at global level 
or a cluster level or both. Cluster level
+configuration takes precedence and falcon falls back to global definition if 
cluster level specification is missing.
+
+
+---++++ Retention Stage
+As of now there are two ways to specify retention. One is through the 
<retention> tag in the cluster and another is the
+new way through <retention-stage> tag in <lifecycle> tag. If both are defined 
for a feed, then the lifecycle tag will be
+considered effective and falcon will ignore the <retention> tag in the 
cluster. If there is an invalid configuration of
+retention-stage in lifecycle tag, then falcon will *NOT* fall back to 
retention tag even if it is defined and will
+throw validation error.
+
+In this new method of defining retention you can specify the frequency at 
which the retention should occur; you can
+also define the queue and priority parameters for retention jobs. The default 
behavior of retention-stage is the same as
+the existing one, which is to delete all instances corresponding to an 
instance-time earlier than the duration provided in
+"retention.policy.agebaseddelete.limit".
+
+Property "retention.policy.agebaseddelete.limit" is a mandatory property and 
must contain a valid duration e.g. "hours(1)".
+Retention frequency is not a mandatory parameter. If the user doesn't specify 
the frequency in the retention stage then
+it doesn't fall back to the old retention policy frequency. Its default value 
is set to 6 hours if the feed frequency is less
+than 6 hours, else it is set to the feed frequency, as retention shouldn't be 
more frequent than data availability, to avoid
+wastage of compute resources.
+
+In future, we will allow more customisation like customising how to choose 
instances to be deleted through this method.
+
+
+
 ---++ Process Specification
 A process defines configuration for a workflow. A workflow is a directed 
acyclic graph(DAG) which defines the job for the workflow engine. A process 
definition defines  the configurations required to run the workflow job. For 
example, process defines the frequency at which the workflow should run, the 
clusters on which the workflow should run, the inputs and outputs for the 
workflow, how the workflow failures should be handled, how the late inputs 
should be handled and so on.  
 
@@ -657,10 +810,12 @@ Syntax:
 </process>
 </verbatim>
 
-queueName and jobPriority are special properties, which when present are used 
by the Falcon's launcher job, the same property is also available in workflow 
which can be used to propagate to pig or M/R job.
+The following are some special properties, which when present are used by the 
Falcon's launcher job, the same property is also available in workflow which 
can be used to propagate to pig or M/R job.
 <verbatim>
         <property name="queueName" value="hadoopQueue"/>
         <property name="jobPriority" value="VERY_HIGH"/>
+        <!-- This property is used to turn off JMS notifications for this 
process. JMS notifications are enabled by default. -->
+        <property name="userJMSNotificationEnabled" value="false"/>
 </verbatim>
 
 ---+++ Workflow
@@ -673,7 +828,7 @@ be in lib folder inside the workflow pat
 The properties defined in the cluster and cluster properties(nameNode and 
jobTracker) will also
 be available for the workflow.
 
-There are 2 engines supported today.
+There are 3 engines supported today.
 
 ---++++ Oozie
 
@@ -742,7 +897,7 @@ Feeds with Hive table storage will send
 <verbatim>$input_filter</verbatim>
 
 ---+++ Retry
-Retry policy defines how the workflow failures should be handled. Two retry 
policies are defined: backoff and exp-backoff(exponential backoff). Depending 
on the delay and number of attempts, the workflow is re-tried after specific 
intervals.
+Retry policy defines how the workflow failures should be handled. Three retry 
policies are defined: periodic, exp-backoff(exponential backoff) and final. 
Depending on the delay and number of attempts, the workflow is re-tried after 
specific intervals.
 Syntax:
 <verbatim>
 <process name="[process name]">
@@ -756,7 +911,7 @@ Examples:
 <verbatim>
 <process name="sample-process">
 ...
-    <retry policy="backoff" delay="minutes(10)" attempts="3"/>
+    <retry policy="periodic" delay="minutes(10)" attempts="3"/>
 ...
 </process>
 </verbatim>
@@ -806,6 +961,16 @@ This late handling specifies that late d
 
 *Note:* This is only supported for !FileSystem storage but not Table storage 
at this point.
 
+---+++ Email Notification
+
+<verbatim>
+    <notification type="email" to="bob@@xyz.com"/>
+</verbatim>
+Specifying the notification element with "type" property allows users to 
receive email notification when a scheduled process instance completes.
+Multiple recipients of an email can be provided as comma separated addresses 
with "to" property.
+To send email notification ensure that SMTP parameters are defined in Falcon 
startup.properties.
+Refer to [[FalconEmailNotification][Falcon Email Notification]] for more 
details.
+
 ---+++ ACL
 
 A process has ACL (Access Control List) useful for implementing permission 
requirements

Modified: falcon/trunk/general/src/site/twiki/FalconCLI.twiki
URL: 
http://svn.apache.org/viewvc/falcon/trunk/general/src/site/twiki/FalconCLI.twiki?rev=1730449&r1=1730448&r2=1730449&view=diff
==============================================================================
--- falcon/trunk/general/src/site/twiki/FalconCLI.twiki (original)
+++ falcon/trunk/general/src/site/twiki/FalconCLI.twiki Mon Feb 15 05:48:00 2016
@@ -2,12 +2,36 @@
 
 FalconCLI is an interface between the user and Falcon. It is a command line utility 
provided by Falcon. FalconCLI supports Entity Management, Instance Management 
and Admin operations. There is a set of web services that are used by FalconCLI 
to interact with Falcon.
 
+---++Common CLI Options
+
+---+++Falcon URL
+
+The optional -url option indicates the URL of the Falcon system to run the 
command against. If not mentioned, it will be picked from the 
system environment variable FALCON_URL. If FALCON_URL is not set then it will 
be picked from the client.properties file. If the option is not
+provided and also not set in client.properties, Falcon CLI will fail.
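+
+Example (the URL shown assumes a Falcon server on the default non-TLS port):
+$FALCON_HOME/bin/falcon admin -version -url http://localhost:15000/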
+
+---+++Proxy user support
+
+The -doAs option allows the current user to impersonate other users when 
interacting with the Falcon system. The current user must be configured as a 
proxyuser in the Falcon system. The proxyuser configuration may restrict the
+hosts from which a user may impersonate others, as well as which groups of 
users can be impersonated.
+
+<a href="./FalconDocumentation.html#Proxyuser_support">Proxyuser support 
described here.</a>
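+
+Example (the impersonated user name is illustrative):
+$FALCON_HOME/bin/falcon entity -type process -name sampleProcess -status -doAs joe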
+
+---+++Debug Mode
+
+If you export FALCON_DEBUG=true then the Falcon CLI will output the Web 
Services API details used by any commands you execute. This is useful for 
debugging purposes or to see how the Falcon CLI works with the WS API.
+Alternatively, you can specify '-debug' through the CLI arguments to get the 
debug statements.
+Example:
+$FALCON_HOME/bin/falcon entity -submit -type cluster -file 
/cluster/definition.xml -debug
+
 ---++Entity Management Operations
 
 ---+++Submit
 
 Submit option is used to set up entity definition.
 
+Usage:
+$FALCON_HOME/bin/falcon entity -submit -type [cluster|datasource|feed|process] 
-file <entity-definition.xml>
+
 Example: 
 $FALCON_HOME/bin/falcon entity -submit -type cluster -file 
/cluster/definition.xml
 
@@ -20,6 +44,8 @@ Once submitted, an entity can be schedul
 Usage:
 $FALCON_HOME/bin/falcon entity  -type [process|feed] -name <<name>> -schedule
 
+Optional Arg : -skipDryRun. When this argument is specified, Falcon skips 
oozie dryrun.
+
 Example:
 $FALCON_HOME/bin/falcon entity  -type process -name sampleProcess -schedule
 
@@ -42,22 +68,22 @@ Usage:
 Delete removes the submitted entity definition for the specified entity and 
 puts it into the archive.
 
 Usage:
-$FALCON_HOME/bin/falcon entity  -type [cluster|feed|process] -name <<name>> 
-delete
+$FALCON_HOME/bin/falcon entity  -type [cluster|datasource|feed|process] -name 
<<name>> -delete
 
 ---+++List
 
 Entities of a particular type can be listed with list sub-command.
 
 Usage:
-$FALCON_HOME/bin/falcon entity -type [cluster|feed|process] -list
+$FALCON_HOME/bin/falcon entity -list
 
-Optional Args : -fields <<field1,field2>> -filterBy 
<<field1:value1,field2:value2>>
--tags <<tagkey=tagvalue,tagkey=tagvalue>> -nameseq <<namesubsequence>>
+Optional Args : -fields <<field1,field2>>
+-type <<[cluster|datasource|feed|process],[cluster|datasource|feed|process]>>
+-nameseq <<namesubsequence>> -tagkeys <<tagkeyword1,tagkeyword2>>
+-filterBy <<field1:value1,field2:value2>> -tags 
<<tagkey=tagvalue,tagkey=tagvalue>>
 -orderBy <<field>> -sortOrder <<sortOrder>> -offset 0 -numResults 10
 
-<a href="./restapi/EntityList.html">Optional params described here.</a>
-
-
+<a href="./Restapi/EntityList.html">Optional params described here.</a>
 
 
 ---+++Summary
@@ -71,16 +97,18 @@ Optional Args : -start "yyyy-MM-dd'T'HH:
 -filterBy <<field1:value1,field2:value2>> -tags 
<<tagkey=tagvalue,tagkey=tagvalue>>
 -orderBy <<field>> -sortOrder <<sortOrder>> -offset 0 -numResults 10 
-numInstances 7
 
-<a href="./restapi/EntitySummary.html">Optional params described here.</a>
+<a href="./Restapi/EntitySummary.html">Optional params described here.</a>
 
 ---+++Update
 
-Update operation allows an already submitted/scheduled entity to be updated. 
Cluster update is currently
-not allowed.
+Update operation allows an already submitted/scheduled entity to be updated. 
Cluster and datasource updates are
+currently not allowed.
 
 Usage:
 $FALCON_HOME/bin/falcon entity  -type [feed|process] -name <<name>> -update 
-file <<path_to_file>>
 
+Optional Arg : -skipDryRun. When this argument is specified, Falcon skips 
oozie dryrun.
+
 Example:
 $FALCON_HOME/bin/falcon entity -type process -name HourlyReportsGenerator 
-update -file /process/definition.xml
 
@@ -91,26 +119,30 @@ Force Update operation allows an already
 Usage:
 $FALCON_HOME/bin/falcon entity  -type [feed|process] -name <<name>> -touch
 
+Optional Arg : -skipDryRun. When this argument is specified, Falcon skips 
oozie dryrun.
+
 ---+++Status
 
 Status returns the current status of the entity.
 
 Usage:
-$FALCON_HOME/bin/falcon entity -type [cluster|feed|process] -name <<name>> 
-status
+$FALCON_HOME/bin/falcon entity -type [cluster|datasource|feed|process] -name 
<<name>> -status
 
 ---+++Dependency
 
-With the use of dependency option, we can list all the entities on which the 
specified entity is dependent. For example for a feed, dependency return the 
cluster name and for process it returns all the input feeds, output feeds and 
cluster names.
+With the use of the dependency option, we can list all the entities on which the 
specified entity is dependent.
+For example, for a feed, dependency returns the cluster name, and for a process 
it returns all the input feeds,
+output feeds and cluster names.
 
 Usage:
-$FALCON_HOME/bin/falcon entity -type [cluster|feed|process] -name <<name>> 
-dependency
+$FALCON_HOME/bin/falcon entity -type [cluster|datasource|feed|process] -name 
<<name>> -dependency
 
 ---+++Definition
 
 Definition option returns the entity definition submitted earlier during 
submit step.
 
 Usage:
-$FALCON_HOME/bin/falcon entity -type [cluster|feed|process] -name <<name>> 
-definition
+$FALCON_HOME/bin/falcon entity -type [cluster|datasource|feed|process] -name 
<<name>> -definition
 
 
 ---+++Lookup
@@ -125,6 +157,54 @@ $FALCON_HOME/bin/falcon entity -type fee
 If you have multiple feeds with location as 
/data/projects/my-hourly/${YEAR}/${MONTH}/${DAY}/${HOUR} then this command will 
return all of them.
 
 
+---+++SLAAlert
+<verbatim>
+Since: 0.8
+</verbatim>
+
+This command lists all the feed instances which have missed SLA and are still 
not available. If a feed instance missed
+SLA but is now available, then it will not be reported in results. The purpose 
of this API is alerting and hence it
+doesn't return feed instances which missed SLA but are available, as they 
don't require any action.
+
+* Currently SLA monitoring is supported only for feeds.
+
+* Option end is optional and will default to current time if missing.
+
+* Option name is optional, if provided only instances of that feed will be 
considered.
+
+Usage:
+
+*Example 1*
+
+*$FALCON_HOME/bin/falcon entity -type feed -start 2014-09-05T00:00Z -slaAlert  
-end 2016-05-03T00:00Z -colo local*
+
+name: out, type: FEED, cluster: local, instanceTime: 2015-09-26T11:59Z, tags: 
Missed SLA High
+name: out, type: FEED, cluster: local, instanceTime: 2015-09-26T12:00Z, tags: 
Missed SLA High
+name: out, type: FEED, cluster: local, instanceTime: 2015-09-26T12:01Z, tags: 
Missed SLA High
+name: out, type: FEED, cluster: local, instanceTime: 2015-09-26T12:02Z, tags: 
Missed SLA High
+name: out, type: FEED, cluster: local, instanceTime: 2015-09-26T12:03Z, tags: 
Missed SLA High
+name: out, type: FEED, cluster: local, instanceTime: 2015-09-26T12:04Z, tags: 
Missed SLA High
+name: out, type: FEED, cluster: local, instanceTime: 2015-09-26T12:05Z, tags: 
Missed SLA High
+name: out, type: FEED, cluster: local, instanceTime: 2015-09-26T12:06Z, tags: 
Missed SLA High
+name: out, type: FEED, cluster: local, instanceTime: 2015-09-26T12:07Z, tags: 
Missed SLA High
+name: out, type: FEED, cluster: local, instanceTime: 2015-09-26T12:08Z, tags: 
Missed SLA Low
+
+
+Response: default/Success!
+
+Request Id: default/216978070@qtp-830047511-4 - 
f5a6c129-ab42-4feb-a2bf-c3baed356248
+
+*Example 2*
+
+*$FALCON_HOME/bin/falcon entity -type feed -start 2014-09-05T00:00Z -slaAlert  
-end 2016-05-03T00:00Z -colo local -name in*
+
+name: in, type: FEED, cluster: local, instanceTime: 2015-09-26T06:00Z, tags: 
Missed SLA High
+
+Response: default/Success!
+
+Request Id: default/1580107885@qtp-830047511-7 - 
f16cbc51-5070-4551-ad25-28f75e5e4cf2
+
+
 ---++Instance Management Options
 
 ---+++Kill
@@ -158,7 +238,7 @@ $FALCON_HOME/bin/falcon instance -type <
 
 Rerun option is used to rerun instances of a given process. On issuing a 
rerun, by default the execution resumes from the last failed node in the 
workflow. This option is valid only for process instances in terminal state, 
i.e. SUCCEEDED, KILLED or FAILED.
 If one wants to forcefully rerun the entire workflow, -force should be passed 
along with -rerun
-Additionally, you can also specify properties to override via a properties 
file.
+Additionally, you can also specify properties to override via a properties 
file, and this will be prioritized over the force option in case of contradiction.
 
 Usage:
 $FALCON_HOME/bin/falcon instance -type <<feed/process>> -name <<name>> -rerun 
-start "yyyy-MM-dd'T'HH:mm'Z'" -end "yyyy-MM-dd'T'HH:mm'Z'" [-force] [-file 
<<properties file>>]
@@ -187,7 +267,7 @@ Optional Args : -start "yyyy-MM-dd'T'HH:
 -filterBy <<field1:value1,field2:value2>> -lifecycle <<lifecycles>>
 -orderBy field -sortOrder <<sortOrder>> -offset 0 -numResults 10
 
-<a href="./restapi/InstanceStatus.html"> Optional params described here.</a>
+<a href="./Restapi/InstanceStatus.html"> Optional params described here.</a>
 
 ---+++List
 
@@ -196,7 +276,7 @@ If the instance is in WAITING state, mis
 
 Example : Suppose a process has 3 instances, one has succeeded, one is in 
running state and the other one is waiting, the expected output is:
 
-{"status":"SUCCEEDED","message":"getStatus is 
successful","instances":[{"instance":"2012-05-07T05:02Z","status":"SUCCEEDED","logFile":"http://oozie-dashboard-url"},{"instance":"2012-05-07T05:07Z","status":"RUNNING","logFile":"http://oozie-dashboard-url"},
 {"instance":"2010-01-02T11:05Z","status":"WAITING"}]
+{"status":"SUCCEEDED","message":"getStatus is 
successful","instances":[{"instance":"2012-05-07T05:02Z","status":"SUCCEEDED","logFile":"http://oozie-dashboard-url"},{"instance":"2012-05-07T05:07Z","status":"RUNNING","logFile":"http://oozie-dashboard-url"},
 {"instance":"2010-01-02T11:05Z","status":"WAITING"}]}
 
 Usage:
 $FALCON_HOME/bin/falcon instance -type <<feed/process>> -name <<name>> -list
@@ -205,7 +285,7 @@ Optional Args : -start "yyyy-MM-dd'T'HH:
 -colo <<colo>> -lifecycle <<lifecycles>>
 -filterBy <<field1:value1,field2:value2>> -orderBy field -sortOrder 
<<sortOrder>> -offset 0 -numResults 10
 
-<a href="./restapi/InstanceList.html">Optional params described here.</a>
+<a href="./Restapi/InstanceList.html">Optional params described here.</a>
 
 ---+++Summary
 
@@ -215,15 +295,16 @@ The unscheduled instances between the sp
 
 Example : Suppose a process has 3 instances, one has succeeded, one is in 
 running state and the other one is waiting, the expected output is:
 
-{"status":"SUCCEEDED","message":"getSummary is successful", "cluster": 
<<name>> [{"SUCCEEDED":"1"}, {"WAITING":"1"}, {"RUNNING":"1"}]}
+{"status":"SUCCEEDED","message":"getSummary is successful", 
instancesSummary:[{"cluster": <<name>> "map":[{"SUCCEEDED":"1"}, 
{"WAITING":"1"}, {"RUNNING":"1"}]}]}
 
 Usage:
 $FALCON_HOME/bin/falcon instance -type <<feed/process>> -name <<name>> -summary
 
-Optional Args : -start "yyyy-MM-dd'T'HH:mm'Z'" -end "yyyy-MM-dd'T'HH:mm'Z'"
--colo <<colo>> -lifecycle <<lifecycles>>
+Optional Args : -start "yyyy-MM-dd'T'HH:mm'Z'" -end "yyyy-MM-dd'T'HH:mm'Z'" 
-colo <<colo>>
+-filterBy <<field1:value1,field2:value2>> -lifecycle <<lifecycles>>
+-orderBy field -sortOrder <<sortOrder>>
 
-<a href="./restapi/InstanceSummary.html">Optional params described here.</a>
+<a href="./Restapi/InstanceSummary.html">Optional params described here.</a>
 
 ---+++Running
 
@@ -235,7 +316,7 @@ $FALCON_HOME/bin/falcon instance -type <
 Optional Args : -colo <<colo>> -lifecycle <<lifecycles>>
 -filterBy <<field1:value1,field2:value2>> -orderBy <<field>> -sortOrder 
<<sortOrder>> -offset 0 -numResults 10
 
-<a href="./restapi/InstanceRunning.html">Optional params described here.</a>
+<a href="./Restapi/InstanceRunning.html">Optional params described here.</a>
 
 ---+++FeedInstanceListing
 
@@ -247,7 +328,7 @@ $FALCON_HOME/bin/falcon instance -type f
 Optional Args : -start "yyyy-MM-dd'T'HH:mm'Z'" -end "yyyy-MM-dd'T'HH:mm'Z'"
 -colo <<colo>>
 
-<a href="./restapi/FeedInstanceListing.html">Optional params described 
here.</a>
+<a href="./Restapi/FeedInstanceListing.html">Optional params described 
here.</a>
 
 ---+++Logs
 
@@ -260,7 +341,7 @@ Optional Args : -start "yyyy-MM-dd'T'HH:
 -colo <<colo>> -lifecycle <<lifecycles>>
 -filterBy <<field1:value1,field2:value2>> -orderBy field -sortOrder 
<<sortOrder>> -offset 0 -numResults 10
 
-<a href="./restapi/InstanceLogs.html">Optional params described here.</a>
+<a href="./Restapi/InstanceLogs.html">Optional params described here.</a>
 
 ---+++LifeCycle
 
@@ -270,6 +351,14 @@ This can be used with instance managemen
 Usage:
 $FALCON_HOME/bin/falcon instance -type <<feed/process>> -name <<name>> -status 
-lifecycle <<lifecycletype>> -start "yyyy-MM-dd'T'HH:mm'Z'" -end 
"yyyy-MM-dd'T'HH:mm'Z'"
 
+---+++Triage
+
+Given a feed/process instance this command traces its ancestors to find which 
ancestors have failed. It's useful if a
+lot of instances are failing in a pipeline, as it then finds out the root cause 
of the pipeline being stuck.
+
+Usage:
+$FALCON_HOME/bin/falcon instance -triage -type <<feed/process>> -name <<name>> 
-start "yyyy-MM-dd'T'HH:mm'Z'"
+
 ---+++Params
 
 Displays the workflow params of a given instance. The start time is 
considered as the nominal time of that instance, and the end time won't be considered.
@@ -278,6 +367,41 @@ Usage:
 $FALCON_HOME/bin/falcon instance -type <<feed/process>> -name <<name>> -params 
-start "yyyy-MM-dd'T'HH:mm'Z'"
 
 
+
+---+++Dependency
+Displays the instances that are dependent on the given instance. For example, 
for a given process instance, it will
+list all the input feed instances (if any) and the output feed instances (if 
any).
+
+An example use case of this command is as follows:
+Suppose you find that the data in a feed instance was incorrect and you need 
to figure out which process instances
+consumed this feed instance, so that you can reprocess them after correcting 
the feed instance. You can give the feed instance
+and it will tell you which process instance produced this feed and which 
process instances consumed it.
+
+NOTE:
+1. instanceTime must be a valid instanceTime, e.g. the instanceTime of a feed 
should be within its validity range on the applicable clusters,
+ and it should be in the range of instances produced by the producer 
process (if any).
+
+2. For processes with inputs such as latest(), which vary with time, the 
results are not guaranteed to be correct.
+
+Usage:
+$FALCON_HOME/bin/falcon instance -type <<feed/process>> -name <<name>> 
-dependency -instanceTime "yyyy-MM-dd'T'HH:mm'Z'"
+
+For example:
+$FALCON_HOME/bin/falcon instance -dependency -type feed -name out 
-instanceTime 2014-12-15T00:00Z
+name: producer, type: PROCESS, cluster: local, instanceTime: 
2014-12-15T00:00Z, tags: Output
+name: consumer, type: PROCESS, cluster: local, instanceTime: 
2014-12-15T00:03Z, tags: Input
+name: consumer, type: PROCESS, cluster: local, instanceTime: 
2014-12-15T00:04Z, tags: Input
+name: consumer, type: PROCESS, cluster: local, instanceTime: 
2014-12-15T00:02Z, tags: Input
+name: consumer, type: PROCESS, cluster: local, instanceTime: 
2014-12-15T00:05Z, tags: Input
+
+
+Response: default/Success!
+
+Request Id: default/1125035965@qtp-503156953-7 - 
447be0ad-1d38-4dce-b438-20f3de69b172
+
+
+<a href="./Restapi/InstanceDependency.html">Optional params described here.</a>
+
 ---++ Metadata Lineage Options
 
 ---+++Lineage
@@ -341,7 +465,7 @@ $FALCON_HOME/bin/falcon metadata -edge -
 
 Lists all dimensions of a given type. If the user provides the optional param 
cluster, only the dimensions related to that cluster are listed.
 Usage:
-$FALCON_HOME/bin/falcon metadata -list -type 
[cluster_entity|feed_entity|process_entity|user|colo|tags|groups|pipelines]
+$FALCON_HOME/bin/falcon metadata -list -type 
[cluster_entity|datasource_entity|feed_entity|process_entity|user|colo|tags|groups|pipelines|replication_metrics]
 
 Optional Args : -cluster <<cluster name>>
 
@@ -349,6 +473,17 @@ Example:
 $FALCON_HOME/bin/falcon metadata -list -type process_entity -cluster 
primary-cluster
 $FALCON_HOME/bin/falcon metadata -list -type tags
 
+
+Replication metrics, both from a recipe-based replication process and from 
feed replication, can be displayed as follows.
+Usage:
+$FALCON_HOME/bin/falcon metadata -list -type replication_metrics 
-process/-feed <entity name>
+Optional Args : -numResults <<value>>
+
+Example:
+$FALCON_HOME/bin/falcon metadata -list -type replication_metrics -process 
hdfs-replication
+$FALCON_HOME/bin/falcon metadata -list -type replication_metrics -feed 
fs-replication
+
+
 ---+++ Relations
 
 List all dimensions related to the specified dimension, identified by 
dimension-type and dimension-name.

Modified: falcon/trunk/general/src/site/twiki/FalconDocumentation.twiki
URL: 
http://svn.apache.org/viewvc/falcon/trunk/general/src/site/twiki/FalconDocumentation.twiki?rev=1730449&r1=1730448&r2=1730449&view=diff
==============================================================================
--- falcon/trunk/general/src/site/twiki/FalconDocumentation.twiki (original)
+++ falcon/trunk/general/src/site/twiki/FalconDocumentation.twiki Mon Feb 15 
05:48:00 2016
@@ -15,7 +15,10 @@
    * <a href="#Security">Security</a>
    * <a href="#Recipes">Recipes</a>
    * <a href="#Monitoring">Monitoring</a>
+   * <a href="#Email_Notification">Email Notification</a>
    * <a href="#Backwards_Compatibility">Backwards Compatibility 
Instructions</a>
+   * <a href="#Proxyuser_support">Proxyuser support</a>
+   * <a href="#ImportExport">Data Import and Export</a>
 
 ---++ Architecture
 
@@ -35,6 +38,8 @@ Falcon system has picked Oozie as the de
 other schedulers. A lot of the data processing in Hadoop requires scheduling 
to be based on both data availability
 as well as time. Oozie currently supports these capabilities off the shelf, 
hence the choice.
 
+While the use of Oozie works reasonably well, there are scenarios where Oozie 
scheduling is proving to be a limiting factor. In its current form, Falcon 
relies on Oozie for both scheduling and workflow execution, due to which 
scheduling is limited to time-based/cron-based scheduling with additional 
gating conditions on data availability. This also imposes the restriction that 
datasets be periodic/cyclic in nature. In order to offer better scheduling 
capabilities, Falcon comes with its own native scheduler. Refer to 
[[FalconNativeScheduler][Falcon Native Scheduler]] for details.
+
 ---+++ Control flow
 Though the actual responsibility of the workflow is with the scheduler 
(Oozie), Falcon remains in the
 execution path, by subscribing to messages that each of the workflow may 
generate. When Falcon generates a
@@ -153,7 +158,7 @@ Examples:
 
 
 ---++ Entity Management actions
-All the following operation can also be done using 
[[./restapi/ResourceList][Falcon's RESTful API]].
+All the following operations can also be performed using 
[[restapi/ResourceList][Falcon's RESTful API]].
 
 ---+++ Submit
 Entity submit action allows a new cluster/feed/process to be set up within 
Falcon. Submitted entity is not
@@ -252,17 +257,21 @@ feed/data xml in the following manner fo
 </verbatim>
 
 The 'limit' attribute can be specified in units of minutes/hours/days/months, 
and a corresponding numeric value can
-be attached to it. It essentially instructs the system to retain data spanning 
from the current moment to the time specified
-in the attribute spanning backwards in time. Any data beyond the limit 
(past/future) is erased from the system.
+be attached to it. It essentially instructs the system to retain data, 
counting backwards from the current moment,
+up to the limit specified in the attribute. Any data older than that is 
erased from the system. By default,
+Falcon runs retention jobs up to the cluster validity end time. This causes 
the instances created between the endTime
+and "endTime - retentionLimit" to be retained forever. If users do not want 
to retain any instances of the
+feed past the cluster validity end time, they should set the property 
"falcon.retention.keep.instances.beyond.validity"
+to false in runtime.properties.
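+
+For example, a minimal sketch of the corresponding runtime.properties entry 
(the *. prefix is an assumption, matching Falcon's other property files):
+<verbatim>
+*.falcon.retention.keep.instances.beyond.validity=false
+</verbatim>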
 
 With the integration of Hive, Falcon also provides retention for tables in 
Hive catalog.
 
 ---+++ Example:
 If retention period is 10 hours, and the policy kicks in at time 't', the data 
retained by system is essentially the
-one in range [t-10h, t]. Any data before t-10h and after t is removed from the 
system.
+one at or after t-10h. Any data before t-10h is removed from the system.
 
-The 'action' attribute can attain values of DELETE/ARCHIVE. Based upon the tag 
value, the data eligible for removal is either
-deleted/archived.
+The 'action' attribute can take the values DELETE/ARCHIVE. Based upon the tag 
value, the data eligible for removal is
+either deleted or archived.
 
 ---+++ NOTE: Falcon 0.1/0.2 releases support Delete operation only
 
@@ -319,6 +328,16 @@ replication instance delays. If the freq
 instance will run every 2 hours and replicates data with an offset of 1 hour, 
i.e. at 09:00 UTC, feed instance which
 is eligible for replication is 08:00; and 11:00 UTC, feed instance of 10:00 
UTC is eligible and so on.
 
+To capture feed replication metrics such as TIMETAKEN, COPY and BYTESCOPIED, 
set the parameter "job.counter" to "true"
+in the feed entity's properties section. Metrics captured from each instance 
are populated to the GraphDB for display on the UI.
+
+*Example:*
+<verbatim>
+<properties>
+        <property name="job.counter" value="true" />
+</properties>
+</verbatim>
+
 ---+++ Where is the feed path defined for File System Storage?
 
 It's defined in the feed xml within the location tag.
@@ -561,7 +580,7 @@ simple and basic. The falcon system look
 cut-off period. Then it uses a scheduled messaging framework, like the one 
available in Apache ActiveMQ or Java's !DelayQueue, to schedule a message with 
a cut-off period. After the cut-off period, the message is dequeued and Falcon 
checks for changes in the feed data, which is recorded in HDFS in a latedata 
file by Falcon's "record-size" action; if it detects any changes, the workflow 
is rerun with the new set of feed data.
 
 *Example:*
-The late rerun policy can be configured in the process definition.
+For a process entity, the late rerun policy can be configured in the process 
definition.
 Falcon supports 3 policies: periodic, exp-backoff and final.
 Delay specifies how often the feed data should be checked for changes; one 
also needs to
 explicitly set, in late-input, the feed names that need to be checked for 
late data.
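 
 For illustration, a hedged sketch of how such a policy might look in a 
process definition (the element and attribute names, the feed name "input" 
and the workflow path are illustrative assumptions, not confirmed by this 
document):
 <verbatim>
 <late-process policy="exp-backoff" delay="hours(1)">
     <late-input input="input" workflow-path="hdfs://late/workflow/path"/>
 </late-process>
 </verbatim>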
@@ -575,6 +594,16 @@ explicitly set the feed names in late-in
 *NOTE:* Feeds configured with table storage does not support late input data 
handling at this point. This will be
 made available in the near future.
 
+For a feed entity replication job, the default late data handling policy can 
be configured in the runtime.properties file.
+Since these are runtime properties, they take effect for all replication jobs 
completed after the change.
+<verbatim>
+  # Default configs to handle replication for late arriving feeds.
+  *.feed.late.allowed=true
+  *.feed.late.frequency=hours(3)
+  *.feed.late.policy=exp-backoff
+</verbatim>
+
+
 ---++ Idempotency
 All the operations in Falcon are idempotent. That is, if you make the same 
request to the falcon server / prism again, you will get a SUCCESSFUL return 
if it was SUCCESSFUL in the first attempt. For example, you submit a new 
process / feed and get a SUCCESSFUL message in return. Now if you run the 
same command / api request on the same entity, you will again get a 
SUCCESSFUL message. The same is true for other operations like schedule, 
kill, suspend and resume.
 Idempotency also takes care of the condition when a request is sent through 
prism and fails on one or more servers. For example, prism is configured to 
send requests to 3 servers. First the user sends a request to SUBMIT a 
process on all 3 of them, and receives a SUCCESSFUL response from all of 
them. Then, due to some issue, one of the servers goes down, and the user 
sends a request to schedule the submitted process. This time he will receive 
a response with PARTIAL status and a FAILURE message from the server that has 
gone down. If the user checks, he will find the process started and running 
on the 2 SUCCESSFUL servers. Now the issue with the server is figured out and 
it is brought back up. Sending the SCHEDULE request again through prism will 
result in a SUCCESSFUL response from prism as well as the other servers, but 
this time the PROCESS will be SCHEDULED only on the server which had failed 
earlier; the other two will keep running as before. 
@@ -711,6 +740,38 @@ Recipes is detailed in [[Recipes][Recipe
 
 Monitoring and Operationalizing Falcon is detailed in 
[[Operability][Operability]].
 
+---++ Email Notification
+Notification for instance completion in Falcon is defined in 
[[FalconEmailNotification][Falcon Email Notification]].
+
 ---++ Backwards Compatibility
 
 Backwards compatibility instructions are [[Compatibility][detailed here.]]
+
+---++ Proxyuser support
+Falcon supports impersonation or proxyuser functionality (identical to Hadoop 
proxyuser capabilities and conceptually
+similar to Unix 'sudo').
+
+Proxyuser enables Falcon clients to submit entities on behalf of other users. 
Falcon will utilize Hadoop core's hadoop-auth
+module to implement this functionality.
+
+Because proxyuser is a powerful capability, Falcon provides the following 
restriction capabilities (similar to Hadoop):
+
+   * Proxyuser is an explicit configuration on a per proxyuser basis.
+   * A proxyuser user can be restricted to impersonate other users from a set 
of hosts.
+   * A proxyuser user can be restricted to impersonate users belonging to a 
set of groups.
+
+There are 2 configuration properties needed in runtime properties to set up a 
proxyuser:
+   * falcon.service.ProxyUserService.proxyuser.#USER#.hosts: hosts from where 
the user #USER# can impersonate other users.
+   * falcon.service.ProxyUserService.proxyuser.#USER#.groups: groups the users 
being impersonated by user #USER# must belong to.
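+
+For example, a minimal sketch of runtime.properties entries (the user name 
"falcon-client" and the host/group values are placeholders; the *. prefix is 
an assumption, matching Falcon's other property files):
+<verbatim>
+*.falcon.service.ProxyUserService.proxyuser.falcon-client.hosts=host1.example.com
+*.falcon.service.ProxyUserService.proxyuser.falcon-client.groups=analytics
+</verbatim>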
+
+If these configurations are not present, impersonation will not be allowed 
and the connection will fail. If more lax security is preferred,
+the wildcard value * may be used to allow impersonation from any host or of 
any user, although this is recommended only for testing/development.
+
+The -doAs option can be used with the CLI, or the doAs query parameter can be 
appended to the request when using the API, to enable impersonation.
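+
+For example, an invocation sketch (the entity name "sample-process" and the 
user "joe" are placeholders):
+<verbatim>
+$FALCON_HOME/bin/falcon entity -type process -name sample-process -status 
-doAs joe
+</verbatim>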
+
+---++ ImportExport
+
+Data Import and Export is detailed in [[ImportExport][Data Import and Export]].
+
+
+

Added: falcon/trunk/general/src/site/twiki/FalconEmailNotification.twiki
URL: 
http://svn.apache.org/viewvc/falcon/trunk/general/src/site/twiki/FalconEmailNotification.twiki?rev=1730449&view=auto
==============================================================================
--- falcon/trunk/general/src/site/twiki/FalconEmailNotification.twiki (added)
+++ falcon/trunk/general/src/site/twiki/FalconEmailNotification.twiki Mon Feb 
15 05:48:00 2016
@@ -0,0 +1,29 @@
+---++Falcon Email Notification
+
+Falcon Email notification allows sending email notifications when scheduled 
feed/process instances complete.
+Email notification in feed/process entity can be defined as follows:
+<verbatim>
+<process name="[process name]">
+    ...
+    <notification type="email" to="[email protected],[email protected]"/>
+    ...
+</process>
+</verbatim>
+
+   *  *type*    - specifies the type of notification. *Note:* Currently only 
the "email" notification type is supported.
+   *  *to*  - specifies the address to send notifications to; multiple 
recipients may be provided as a comma-separated list.
+
+
+Falcon email notification requires some SMTP server configuration to be 
defined in startup.properties. Following are the values
+it looks for:
+   * *falcon.email.smtp.host*   - The host where the email action may find the 
SMTP server (localhost by default).
+   * *falcon.email.smtp.port*   - The port to connect to for the SMTP server 
(25 by default).
+   * *falcon.email.from.address*    - The from address to be used for mailing 
all emails (falcon@localhost by default).
+   * *falcon.email.smtp.auth*   - Boolean property that specifies if 
authentication is to be done or not. (false by default).
+   * *falcon.email.smtp.user*   - If authentication is enabled, the username 
to login as (empty by default).
+   * *falcon.email.smtp.password*   - If authentication is enabled, the 
username's password (empty by default).
+
+
+
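+For example, a minimal sketch of the corresponding startup.properties 
entries, using the defaults listed above (the *. prefix is an assumption, 
matching Falcon's other property files):
+<verbatim>
+*.falcon.email.smtp.host=localhost
+*.falcon.email.smtp.port=25
+*.falcon.email.from.address=falcon@localhost
+*.falcon.email.smtp.auth=false
+</verbatim>
+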
+Also ensure that the email notification plugin is enabled in 
startup.properties to send email notifications:
+   * *monitoring.plugins*   - 
org.apache.falcon.plugin.EmailNotificationPlugin,org.apache.falcon.plugin.DefaultMonitoringPlugin
\ No newline at end of file

Added: falcon/trunk/general/src/site/twiki/FalconNativeScheduler.twiki
URL: 
http://svn.apache.org/viewvc/falcon/trunk/general/src/site/twiki/FalconNativeScheduler.twiki?rev=1730449&view=auto
==============================================================================
--- falcon/trunk/general/src/site/twiki/FalconNativeScheduler.twiki (added)
+++ falcon/trunk/general/src/site/twiki/FalconNativeScheduler.twiki Mon Feb 15 
05:48:00 2016
@@ -0,0 +1,213 @@
+---+ Falcon Native Scheduler
+
+---++ Overview
+Falcon has been using Oozie as its scheduling engine. While the use of Oozie 
works reasonably well, there are scenarios where Oozie scheduling is proving 
to be a limiting factor. In its current form, Falcon relies on Oozie for both 
scheduling and workflow execution, due to which scheduling is limited to 
time-based/cron-based scheduling with additional gating conditions on data 
availability. This also imposes the restriction that datasets be periodic in 
nature. In order to offer better scheduling capabilities, Falcon comes with 
its own native scheduler. 
+
+---++ Capabilities
+The native scheduler will offer the capabilities offered by Oozie co-ordinator 
and more. The native scheduler will be built and released over the next few 
releases of Falcon giving users an opportunity to use it and provide feedback.
+
+Currently, the native scheduler offers the following capabilities:
+   1. Submit and schedule a Falcon process that runs periodically (without 
data dependency) - it could be a Pig script, an Oozie workflow or Hive (all 
the engine types currently supported).
+   1. Monitor/Query/Modify the scheduled process - all applicable entity APIs 
and instance APIs should work as they do now. Falcon provides data management 
functions for feeds declaratively. It allows users to represent feed locations 
as time-based partition directories on HDFS containing files.
+
+*NOTE: Execution order is FIFO. LIFO and LAST_ONLY are not supported yet.*
+
+In the near future, Falcon scheduler will provide feature parity with Oozie 
scheduler and in subsequent releases will provide the following features:
+   * Periodic, cron-based, calendar-based scheduling.
+   * Data availability based scheduling.
+   * External trigger/notification based scheduling.
+   * Support for periodic/a-periodic datasets.
+   * Support for optional/mandatory datasets. Option to specify 
minimum/maximum/exactly-N instances of data to consume.
+   * Handle dependencies across entities during re-run.
+
+---++ Configuring Native Scheduler
+You can enable native scheduler by making changes to 
__$FALCON_HOME/conf/startup.properties__ as follows. You will need to restart 
Falcon Server for the changes to take effect.
+<verbatim>
+*.dag.engine.impl=org.apache.falcon.workflow.engine.OozieDAGEngine
+*.application.services=org.apache.falcon.security.AuthenticationInitializationService,\
+                        
org.apache.falcon.workflow.WorkflowJobEndNotificationService, \
+                        org.apache.falcon.service.ProcessSubscriberService,\
+                        org.apache.falcon.service.FeedSLAMonitoringService,\
+                        org.apache.falcon.service.LifecyclePolicyMap,\
+                        
org.apache.falcon.state.store.service.FalconJPAService,\
+                        org.apache.falcon.entity.store.ConfigurationStore,\
+                        org.apache.falcon.rerun.service.RetryService,\
+                        org.apache.falcon.rerun.service.LateRunService,\
+                        org.apache.falcon.metadata.MetadataMappingService,\
+                        org.apache.falcon.service.LogCleanupService,\
+                        org.apache.falcon.service.GroupsService,\
+                        org.apache.falcon.service.ProxyUserService,\
+                        
org.apache.falcon.notification.service.impl.JobCompletionService,\
+                        
org.apache.falcon.notification.service.impl.SchedulerService,\
+                        
org.apache.falcon.notification.service.impl.AlarmService,\
+                        
org.apache.falcon.notification.service.impl.DataAvailabilityService,\
+                        org.apache.falcon.execution.FalconExecutionService
+</verbatim>
+
+---+++ Making the Native Scheduler the default scheduler
+To ensure backward compatibility, even when the native scheduler is enabled, 
the default scheduler is still Oozie. This means users will be scheduling 
entities on Oozie scheduler, by default. They will need to explicitly specify 
the scheduler as native, if they wish to schedule entities using native 
scheduler. 
+
+<a href="#Scheduling_new_entities_on_Native_Scheduler">This section</a> has 
more details on how to schedule on either of the schedulers. 
+
+If you wish to make the Falcon Native Scheduler your default scheduler and 
remove Oozie as the scheduler, set the following property in 
__$FALCON_HOME/conf/startup.properties__
+<verbatim>
+## If you wish to use Falcon native scheduler as your default scheduler, set 
the workflow engine to FalconWorkflowEngine instead of OozieWorkflowEngine. ##
+*.workflow.engine.impl=org.apache.falcon.workflow.engine.FalconWorkflowEngine
+</verbatim>
+
+---+++ Configuring the state store for Native Scheduler
+You can configure statestore by making changes to 
__$FALCON_HOME/conf/statestore.properties__ as follows. You will need to 
restart Falcon Server for the changes to take effect.
+
+Falcon Server needs to maintain state of the entities and instances in a 
persistent store for the system to be recoverable. Since Prism only federates, 
it does not need to maintain any state information. Following properties need 
to be set in statestore.properties of Falcon Servers:
+<verbatim>
+######### StateStore Properties #####
+*.falcon.state.store.impl=org.apache.falcon.state.store.jdbc.JDBCStateStore
+*.falcon.statestore.jdbc.driver=org.apache.derby.jdbc.EmbeddedDriver
+*.falcon.statestore.jdbc.url=jdbc:derby:data/falcon.db
+# StateStore credentials file where username,password and other properties can 
be stored securely.
+# Set this credentials file permission 400 and make sure user who starts 
falcon should only have read permission.
+# Give Absolute path to credentials file along with file name or put in 
classpath with file name statestore.credentials.
+# Credentials file should be present either in given location or class path, 
otherwise falcon won't start.
+*.falcon.statestore.credentials.file=
+*.falcon.statestore.jdbc.username=sa
+*.falcon.statestore.jdbc.password=
+*.falcon.statestore.connection.data.source=org.apache.commons.dbcp.BasicDataSource
+# Maximum number of active connections that can be allocated from this pool at 
the same time.
+*.falcon.statestore.pool.max.active.conn=10
+*.falcon.statestore.connection.properties=
+# Indicates the interval (in milliseconds) between eviction runs.
+*.falcon.statestore.validate.db.connection.eviction.interval=300000
+## The number of objects to examine during each run of the idle object evictor 
thread.
+*.falcon.statestore.validate.db.connection.eviction.num=10
+## Creates Falcon DB.
+## If set to true, it creates the DB schema if it does not exist. If the DB 
schema exists is a NOP.
+## If set to false, it does not create the DB schema. If the DB schema does 
not exist it fails start up.
+*.falcon.statestore.create.db.schema=true
+</verbatim> 
+
+The _*.falcon.statestore.jdbc.url_ property in statestore.properties 
determines the DB and data location. All other properties are common across 
RDBMS.
+
+*NOTE : Although multiple Falcon Servers can share a DB (not applicable for 
Derby DB), it is recommended that you have different DBs for different Falcon 
Servers for better performance.*
+
+You will need to create the state DB and tables before starting the Falcon 
Server. To create tables, a tool comes bundled with the Falcon installation. 
You can use the _falcon-db.sh_ script to create tables in the DB. The script 
needs to be run only for Falcon Servers and can be run by any user that has 
execute permission on the script. The script picks up the DB connection details 
from __$FALCON_HOME/conf/statestore.properties__. Ensure that you have granted 
the right privileges to the user mentioned in _statestore.properties_, so the 
tables can be created.
+
+You can use the help command to get details on the sub-commands supported:
+<verbatim>
+./bin/falcon-db.sh help
+Hadoop home is set, adding libraries from 
'/Users/pallavi.rao/falcon/hadoop-2.6.0/bin/hadoop classpath' into falcon 
classpath
+usage: 
+      Falcon DB initialization tool currently supports Derby DB/ Mysql
+
+      falcondb help : Display usage for all commands or specified command
+
+      falcondb version : Show Falcon DB version information
+
+      falcondb create <OPTIONS> : Create Falcon DB schema
+                      -run             Confirmation option regarding DB schema 
creation/upgrade
+                      -sqlfile <arg>   Generate SQL script instead of 
creating/upgrading the DB
+                                       schema
+
+      falcondb upgrade <OPTIONS> : Upgrade Falcon DB schema
+                       -run             Confirmation option regarding DB 
schema creation/upgrade
+                       -sqlfile <arg>   Generate SQL script instead of 
creating/upgrading the DB
+                                        schema
+
+</verbatim>
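+
+For instance, based on the usage shown above, the tables can be created with 
(a sketch; -run confirms the schema creation):
+<verbatim>
+./bin/falcon-db.sh create -run
+</verbatim>
+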
+Currently, MySQL and Derby are supported as state stores. We may extend 
support to other DBs in the future. Falcon has been tested against MySQL 
v5.5. If you are using MySQL, ensure you also copy 
mysql-connector-java-<version>.jar under 
__$FALCON_HOME/server/webapp/falcon/WEB-INF/lib__ and 
__$FALCON_HOME/client/lib__.
+
+---++++ Using Derby as the State Store
+Using Derby is ideal for QA and staging setups. Falcon comes bundled with a 
Derby connector, and no explicit setup is required (although you can set it 
up) in terms of creating the DB or tables.
+For example,
+ <verbatim> *.falcon.statestore.jdbc.url=jdbc:derby:data/falcon.db;create=true 
</verbatim>
+
+ tells Falcon to use the Derby JDBC connector, with data directory 
$FALCON_HOME/data/ and DB name 'falcon'. If _create=true_ is specified, you 
will not need to create a DB up front; a database will be created if it does 
not exist.
+
+---++++ Using MySQL as the State Store
+The jdbc.url property in statestore.properties determines the DB and data 
location.
+For example,
+ <verbatim> *.falcon.statestore.jdbc.url=jdbc:mysql://localhost:3306/falcon 
</verbatim>
+
+ tells Falcon to use the MySQL JDBC connector, which is accessible 
@localhost:3306, with DB name 'falcon'.
+
+---++ Scheduling new entities on Native Scheduler
+To schedule an entity (currently only process is supported) using the native 
scheduler, you need to specify the scheduler in the schedule command as shown 
below:
+<verbatim>
+$FALCON_HOME/bin/falcon entity -type process -name <process name> -schedule 
-properties falcon.scheduler:native
+</verbatim>
+
+If Oozie is configured as the default scheduler, you can skip the scheduler 
option or explicitly set it to _oozie_, as shown below:
+<verbatim>
+$FALCON_HOME/bin/falcon entity -type process -name <process name> -schedule
+OR
+$FALCON_HOME/bin/falcon entity -type process -name <process name> -schedule 
-properties falcon.scheduler:oozie
+</verbatim>
+
+If the native scheduler is configured as the default scheduler, then you can 
omit the scheduler option, as shown below:
+<verbatim>
+$FALCON_HOME/bin/falcon entity -type process -name <process name> -schedule 
+</verbatim>
+
+---++ Migrating entities from Oozie Scheduler to Native Scheduler
+Currently, users will have to delete and re-create entities in order to move 
across schedulers. Attempting to schedule an already scheduled entity on a 
different scheduler will result in an error. Note that the history of 
instances prior to scheduling on the native scheduler will not be available 
via the instance APIs. However, users can retrieve that information using 
metadata APIs. The native scheduler must be enabled before migrating entities 
to it.
+
+<a href="#Configuring_Native_Scheduler">Configuring Native Scheduler</a> has 
more details on how to enable native scheduler.
+
+---+++ Migrating from Oozie to Native Scheduler
+   * Delete the entity (process). 
+<verbatim>$FALCON_HOME/bin/falcon entity -type process -name <process name> 
-delete </verbatim>
+   * Submit the entity (process) with start time from where the Oozie 
scheduler left off. 
+<verbatim>$FALCON_HOME/bin/falcon entity -type process -submit <path to 
process xml> </verbatim>
+   * Schedule the entity on native scheduler. 
+<verbatim> $FALCON_HOME/bin/falcon entity -type process -name <process name> 
-schedule -properties falcon.scheduler:native </verbatim>
+
+---+++ Reverting to Oozie from Native Scheduler
+   * Delete the entity (process). 
+<verbatim>$FALCON_HOME/bin/falcon entity -type process -name <process name> 
-delete </verbatim>
+   * Submit the entity (process) with start time from where the Native 
scheduler left off. 
+<verbatim>$FALCON_HOME/bin/falcon entity -type process -submit <path to 
process xml> </verbatim>
+   * Schedule the entity on the default scheduler (Oozie).
+ <verbatim> $FALCON_HOME/bin/falcon entity -type process -name <process name> 
-schedule </verbatim>
+
+---+++ Differences in API responses between Oozie and Native Scheduler
+Most API responses are similar whether the entity is scheduled via Oozie or 
via Native scheduler. However, there are a few exceptions and those are listed 
below.
+---++++ Rerun API
+When a user performs a rerun using Oozie scheduler, Falcon directly reruns the 
workflow on Oozie and the instance will be moved to 'RUNNING'.
+
+Example response:
+<verbatim>
+$ falcon instance -rerun processMerlinOozie -start 2016-01-08T12:13Z -end 
2016-01-08T12:15Z
+Consolidated Status: SUCCEEDED
+
+Instances:
+Instance               Cluster         SourceCluster           Status          
Start           End             Details                                 Log
+-----------------------------------------------------------------------------------------------
+2016-01-08T12:13Z      ProcessMultipleClustersTest-corp-9706f068       -       
RUNNING 2016-01-08T13:03Z       2016-01-08T13:03Z       -       
http://8RPCG32.corp.inmobi.com:11000/oozie?job=0001811-160104160825636-oozie-oozi-W
+2016-01-08T12:13Z      ProcessMultipleClustersTest-corp-0b270a1d       -       
RUNNING 2016-01-08T13:03Z       2016-01-08T13:03Z       -       
http://lda01:11000/oozie?job=0002247-160104115615658-oozie-oozi-W
+
+Additional Information:
+Response: ua1/RERUN
+ua2/RERUN
+Request Id: ua1/871377866@qtp-630572412-35 - 
7190c4c8-bacb-4639-8d48-c9e639f544da
+ua2/1554129706@qtp-536122141-13 - bc18127b-1bf8-4ea1-99e6-b1f10ba3a441
+</verbatim>
+
+However, when a user performs a rerun on the native scheduler, the instance 
is scheduled again. This is done intentionally so as not to violate the limit 
on the number of instances running in parallel. Hence, the user will see the 
status of the instance as 'READY'.
+
+Example response:
+<verbatim>
+$ falcon instance -rerun 
ProcessMultipleClustersTest-agregator-coord16-8f55f59b -start 2016-01-08T12:13Z 
-end 2016-01-08T12:15Z
+Consolidated Status: SUCCEEDED
+
+Instances:
+Instance               Cluster         SourceCluster           Status          
Start           End             Details                                 Log
+-----------------------------------------------------------------------------------------------
+2016-01-08T12:13Z      ProcessMultipleClustersTest-corp-9706f068       -       
READY   2016-01-08T13:03Z       2016-01-08T13:03Z       -       
http://8RPCG32.corp.inmobi.com:11000/oozie?job=0001812-160104160825636-oozie-oozi-W
+
+2016-01-08T12:13Z      ProcessMultipleClustersTest-corp-0b270a1d       -       
READY   2016-01-08T13:03Z       2016-01-08T13:03Z       -       
http://lda01:11000/oozie?job=0002248-160104115615658-oozie-oozi-W
+
+Additional Information:
+Response: ua1/RERUN
+ua2/RERUN
+Request Id: ua1/871377866@qtp-630572412-35 - 
8d118d4d-c0ef-4335-a9af-10364498ec4f
+ua2/1554129706@qtp-536122141-13 - c2a3fc50-8b05-47ce-9c85-ca432b96d923
+</verbatim>

Added: falcon/trunk/general/src/site/twiki/HDFSDR.twiki
URL: 
http://svn.apache.org/viewvc/falcon/trunk/general/src/site/twiki/HDFSDR.twiki?rev=1730449&view=auto
==============================================================================
--- falcon/trunk/general/src/site/twiki/HDFSDR.twiki (added)
+++ falcon/trunk/general/src/site/twiki/HDFSDR.twiki Mon Feb 15 05:48:00 2016
@@ -0,0 +1,34 @@
+---+ HDFS DR Recipe
+---++ Overview
+Falcon supports an HDFS DR recipe to replicate data from a source cluster to 
a destination cluster.
+
+---++ Usage
+---+++ Setup cluster definition.
+   <verbatim>
+    $FALCON_HOME/bin/falcon entity -submit -type cluster -file 
/cluster/definition.xml
+   </verbatim>
+
+---+++ Update recipes properties
+   Copy HDFS replication recipe properties, workflow and template file from 
$FALCON_HOME/data-mirroring/hdfs-replication to the accessible
+   directory path or to the recipe directory path (*falcon.recipe.path=<recipe 
directory path>*). *"falcon.recipe.path"* must be specified
+   in Falcon conf client.properties. Now update the copied recipe properties 
file with required attributes to replicate data from source cluster to
+   destination cluster for HDFS DR.
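+
+   For example, a sketch of the client.properties entry (the directory path 
shown is a placeholder):
+   <verbatim>
+    falcon.recipe.path=/apps/falcon/recipes
+   </verbatim>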
+
+---+++ Submit HDFS DR recipe
+
+   After updating the recipe properties file with required attributes in 
directory path or in falcon.recipe.path,
+   there are two ways of submitting the HDFS DR recipe:
+
+   * 1. Specify Falcon recipe properties file through recipe command line.
+   <verbatim>
+    $FALCON_HOME/bin/falcon recipe -name hdfs-replication -operation 
HDFS_REPLICATION
+    -properties /cluster/hdfs-replication.properties
+   </verbatim>
+
+   * 2. Use the Falcon recipe path specified in Falcon conf client.properties.
+   <verbatim>
+    $FALCON_HOME/bin/falcon recipe -name hdfs-replication -operation 
HDFS_REPLICATION
+   </verbatim>
+
+
+*Note:* The recipe properties file, workflow file and template file names 
must match the recipe name; they must be unique and reside in the same 
directory.


