http://git-wip-us.apache.org/repos/asf/falcon/blob/91c68bea/trunk/releases/0.11/src/site/twiki/Configuration.twiki ---------------------------------------------------------------------- diff --git a/trunk/releases/0.11/src/site/twiki/Configuration.twiki b/trunk/releases/0.11/src/site/twiki/Configuration.twiki new file mode 100644 index 0000000..c686d48 --- /dev/null +++ b/trunk/releases/0.11/src/site/twiki/Configuration.twiki @@ -0,0 +1,461 @@ +---+Configuring Falcon + +By default config directory used by falcon is {package dir}/conf. To override this (to use the same conf with multiple +falcon upgrades), set environment variable FALCON_CONF to the path of the conf dir. + +falcon-env.sh has been added to the falcon conf. This file can be used to set various environment variables that you +need for you services. +In addition you can set any other environment variables you might need. This file will be sourced by falcon scripts +before any commands are executed. The following environment variables are available to set. + +<verbatim> +# The java implementation to use. If JAVA_HOME is not found we expect java and jar to be in path +#export JAVA_HOME= + +# any additional java opts you want to set. This will apply to both client and server operations +#export FALCON_OPTS= + +# any additional java opts that you want to set for client only +#export FALCON_CLIENT_OPTS= + +# java heap size we want to set for the client. Default is 1024MB +#export FALCON_CLIENT_HEAP= + +# any additional opts you want to set for prism service. +#export FALCON_PRISM_OPTS= + +# java heap size we want to set for the prism service. Default is 1024MB +#export FALCON_PRISM_HEAP= + +# any additional opts you want to set for falcon service. +#export FALCON_SERVER_OPTS= + +# java heap size we want to set for the falcon server. Default is 1024MB +#export FALCON_SERVER_HEAP= + +# What is is considered as falcon home dir. Default is the base location of the installed software +#export FALCON_HOME_DIR= + +# Where log files are stored. Default is logs directory under the base install location +#export FALCON_LOG_DIR= + +# Where pid files are stored. Default is logs directory under the base install location +#export FALCON_PID_DIR= + +# where the falcon active mq data is stored. Default is logs/data directory under the base install location +#export FALCON_DATA_DIR= + +# Where do you want to expand the war file. By Default it is in /server/webapp dir under the base install dir. +#export FALCON_EXPANDED_WEBAPP_DIR= + +# Any additional classpath elements to be added to the Falcon server/client classpath +#export FALCON_EXTRA_CLASS_PATH= +</verbatim> + +---++Advanced Configurations + +---+++Configuring Monitoring plugin to register catalog partitions +Falcon comes with a monitoring plugin that registers catalog partition. This comes in really handy during migration from + filesystem based feeds to hcatalog based feeds. +This plugin enables the user to de-couple the partition registration and assume that all partitions are already on +hcatalog even before the migration, simplifying the hcatalog migration. + +By default this plugin is disabled. +To enable this plugin and leverage the feature, there are 3 pre-requisites: +<verbatim> +In {package dir}/conf/startup.properties, add +*.workflow.execution.listeners=org.apache.falcon.catalog.CatalogPartitionHandler + +In the cluster definition, ensure registry endpoint is defined. 
+Ex: +<interface type="registry" endpoint="thrift://localhost:1109" version="0.13.3"/> + +In the feed definition, ensure the corresponding catalog table is mentioned in feed-properties +Ex: +<properties> + <property name="catalog.table" value="catalog:default:in_table#year={YEAR};month={MONTH};day={DAY};hour={HOUR}; + minute={MINUTE}"/> +</properties> +</verbatim> + +*NOTE : for Mac OS users* +<verbatim> +If you are using a Mac OS, you will need to configure the FALCON_SERVER_OPTS (explained above). + +In {package dir}/conf/falcon-env.sh uncomment the following line +#export FALCON_SERVER_OPTS= + +and change it to look as below +export FALCON_SERVER_OPTS="-Djava.awt.headless=true -Djava.security.krb5.realm= -Djava.security.krb5.kdc=" +</verbatim> + +---+++Activemq +* falcon server starts embedded active mq. To control this behaviour, set the following system properties using -D +option in environment variable FALCON_OPTS: + * falcon.embeddedmq=<true/false> - Should server start embedded active mq, default true + * falcon.embeddedmq.port=<port> - Port for embedded active mq, default 61616 + * falcon.embeddedmq.data=<path> - Data path for embedded active mq, default {package dir}/logs/data + +---+++Falcon System Notifications + +Some Falcon features such as late data handling, retries, metadata service, depend on JMS notifications sent when the +Oozie workflow completes. Falcon listens to Oozie notification via JMS. You need to enable Oozie JMS notification as +explained below. Falcon post processing feature continues to only send user notifications so enabling Oozie +JMS notification is important. + +*NOTE : If Oozie JMS notification is not enabled, the Falcon features such as failure retry, late data handling and metadata +service will be disabled for all entities on the server.* + +---+++Enable Oozie JMS notification + + * Please add/change the following properties in oozie-site.xml in the oozie installation dir. + +<verbatim> + <property> + <name>oozie.jms.producer.connection.properties</name> + <value>java.naming.factory.initial#org.apache.activemq.jndi.ActiveMQInitialContextFactory;java.naming.provider.url#tcp://<activemq-host>:<port></value> + </property> + + <property> + <name>oozie.service.EventHandlerService.event.listeners</name> + <value>org.apache.oozie.jms.JMSJobEventListener</value> + </property> + + <property> + <name>oozie.service.JMSTopicService.topic.name</name> + <value>WORKFLOW=ENTITY.TOPIC,COORDINATOR=ENTITY.TOPIC</value> + </property> + + <property> + <name>oozie.service.JMSTopicService.topic.prefix</name> + <value>FALCON.</value> + </property> + + <!-- add org.apache.oozie.service.JMSAccessorService to the other existing services if any --> + <property> + <name>oozie.services.ext</name> + <value>org.apache.oozie.service.JMSAccessorService,org.apache.oozie.service.PartitionDependencyManagerService,org.apache.oozie.service.HCatAccessorService</value> + </property> +</verbatim> + + * In falcon startup.properties, set JMS broker url to be the same as the one set in oozie-site.xml property + oozie.jms.producer.connection.properties (see above) + +<verbatim> + *.broker.url=tcp://<activemq-host>:<port> +</verbatim> + +---+++Configuring Oozie for Falcon + +Falcon uses HCatalog for data availability notification when Hive tables are replicated. Make the following configuration +changes to Oozie to ensure Hive table replication in Falcon: + + * Stop the Oozie service on all Falcon clusters. Run the following commands on the Oozie host machine. 
+ +<verbatim> +su - $OOZIE_USER + +<oozie-install-dir>/bin/oozie-stop.sh + +where $OOZIE_USER is the Oozie user. For example, oozie. +</verbatim> + + * Copy each cluster's hadoop conf directory to a different location. For example, if you have two clusters, copy one to /etc/hadoop/conf-1 and the other to /etc/hadoop/conf-2. + + * For each oozie-site.xml file, modify the oozie.service.HadoopAccessorService.hadoop.configurations property, specifying clusters, the RPC ports of the NameNodes, and HostManagers accordingly. For example, if Falcon connects to three clusters, specify: + +<verbatim> + +<property> + <name>oozie.service.HadoopAccessorService.hadoop.configurations</name> + <value>*=/etc/hadoop/conf,$NameNode:$rpcPortNN=$hadoopConfDir1,$ResourceManager1:$rpcPortRM=$hadoopConfDir1,$NameNode2=$hadoopConfDir2,$ResourceManager2:$rpcPortRM=$hadoopConfDir2,$NameNode3 :$rpcPortNN =$hadoopConfDir3,$ResourceManager3 :$rpcPortRM =$hadoopConfDir3</value> + <description> + Comma separated AUTHORITY=HADOOP_CONF_DIR, where AUTHORITY is the HOST:PORT of + the Hadoop service (JobTracker, HDFS). The wildcard '*' configuration is + used when there is no exact match for an authority. The HADOOP_CONF_DIR contains + the relevant Hadoop *-site.xml files. If the path is relative is looked within + the Oozie configuration directory; though the path can be absolute (i.e. to point + to Hadoop client conf/ directories in the local filesystem. + </description> +</property> + +</verbatim> + + * Add the following properties to the /etc/oozie/conf/oozie-site.xml file: + +<verbatim> + +<property> + <name>oozie.service.ProxyUserService.proxyuser.falcon.hosts</name> + <value>*</value> +</property> + +<property> + <name>oozie.service.ProxyUserService.proxyuser.falcon.groups</name> + <value>*</value> +</property> + +<property> + <name>oozie.service.URIHandlerService.uri.handlers</name> + <value>org.apache.oozie.dependency.FSURIHandler, org.apache.oozie.dependency.HCatURIHandler</value> +</property> + +<property> + <name>oozie.services.ext</name> + <value>org.apache.oozie.service.JMSAccessorService, org.apache.oozie.service.PartitionDependencyManagerService, + org.apache.oozie.service.HCatAccessorService</value> +</property> + +<!-- Coord EL Functions Properties --> + +<property> + <name>oozie.service.ELService.ext.functions.coord-job-submit-instances</name> + <value>now=org.apache.oozie.extensions.OozieELExtensions#ph1_now_echo, + today=org.apache.oozie.extensions.OozieELExtensions#ph1_today_echo, + yesterday=org.apache.oozie.extensions.OozieELExtensions#ph1_yesterday_echo, + currentMonth=org.apache.oozie.extensions.OozieELExtensions#ph1_currentMonth_echo, + lastMonth=org.apache.oozie.extensions.OozieELExtensions#ph1_lastMonth_echo, + currentYear=org.apache.oozie.extensions.OozieELExtensions#ph1_currentYear_echo, + lastYear=org.apache.oozie.extensions.OozieELExtensions#ph1_lastYear_echo, + formatTime=org.apache.oozie.coord.CoordELFunctions#ph1_coord_formatTime_echo, + latest=org.apache.oozie.coord.CoordELFunctions#ph2_coord_latest_echo, + future=org.apache.oozie.coord.CoordELFunctions#ph2_coord_future_echo + </value> +</property> + +<property> + <name>oozie.service.ELService.ext.functions.coord-action-create-inst</name> + <value>now=org.apache.oozie.extensions.OozieELExtensions#ph2_now_inst, + today=org.apache.oozie.extensions.OozieELExtensions#ph2_today_inst, + yesterday=org.apache.oozie.extensions.OozieELExtensions#ph2_yesterday_inst, + currentMonth=org.apache.oozie.extensions.OozieELExtensions#ph2_currentMonth_inst, + 
lastMonth=org.apache.oozie.extensions.OozieELExtensions#ph2_lastMonth_inst, + currentYear=org.apache.oozie.extensions.OozieELExtensions#ph2_currentYear_inst, + lastYear=org.apache.oozie.extensions.OozieELExtensions#ph2_lastYear_inst, + latest=org.apache.oozie.coord.CoordELFunctions#ph2_coord_latest_echo, + future=org.apache.oozie.coord.CoordELFunctions#ph2_coord_future_echo, + formatTime=org.apache.oozie.coord.CoordELFunctions#ph2_coord_formatTime, + user=org.apache.oozie.coord.CoordELFunctions#coord_user + </value> +</property> + +<property> +<name>oozie.service.ELService.ext.functions.coord-action-start</name> +<value> +now=org.apache.oozie.extensions.OozieELExtensions#ph2_now, +today=org.apache.oozie.extensions.OozieELExtensions#ph2_today, +yesterday=org.apache.oozie.extensions.OozieELExtensions#ph2_yesterday, +currentMonth=org.apache.oozie.extensions.OozieELExtensions#ph2_currentMonth, +lastMonth=org.apache.oozie.extensions.OozieELExtensions#ph2_lastMonth, +currentYear=org.apache.oozie.extensions.OozieELExtensions#ph2_currentYear, +lastYear=org.apache.oozie.extensions.OozieELExtensions#ph2_lastYear, +latest=org.apache.oozie.coord.CoordELFunctions#ph3_coord_latest, +future=org.apache.oozie.coord.CoordELFunctions#ph3_coord_future, +dataIn=org.apache.oozie.extensions.OozieELExtensions#ph3_dataIn, +instanceTime=org.apache.oozie.coord.CoordELFunctions#ph3_coord_nominalTime, +dateOffset=org.apache.oozie.coord.CoordELFunctions#ph3_coord_dateOffset, +formatTime=org.apache.oozie.coord.CoordELFunctions#ph3_coord_formatTime, +user=org.apache.oozie.coord.CoordELFunctions#coord_user +</value> +</property> + +<property> + <name>oozie.service.ELService.ext.functions.coord-sla-submit</name> + <value> + instanceTime=org.apache.oozie.coord.CoordELFunctions#ph1_coord_nominalTime_echo_fixed, + user=org.apache.oozie.coord.CoordELFunctions#coord_user + </value> +</property> + +<property> + <name>oozie.service.ELService.ext.functions.coord-sla-create</name> + <value> + instanceTime=org.apache.oozie.coord.CoordELFunctions#ph2_coord_nominalTime, + user=org.apache.oozie.coord.CoordELFunctions#coord_user + </value> +</property> + +</verbatim> + + * Copy the existing Oozie WAR file to <oozie-install-dir>/oozie.war. This will ensure that all existing items in the WAR file are still present after the current update. + +<verbatim> +su - root +cp $CATALINA_BASE/webapps/oozie.war <oozie-install-dir>/oozie.war + +where $CATALINA_BASE is the path for the Oozie web app. By default, $CATALINA_BASE is: <oozie-install-dir> +</verbatim> + + * Add the Falcon EL extensions to Oozie. + +Copy the extension JAR files provided with the Falcon Server to a temporary directory on the Oozie server. For example, if your standalone Falcon Server is on the same machine as your Oozie server, you can just copy the JAR files. + +<verbatim> + +mkdir /tmp/falcon-oozie-jars +cp <falcon-install-dir>/oozie/ext/falcon-oozie-el-extension-<$version>.jar /tmp/falcon-oozie-jars +cp /tmp/falcon-oozie-jars/falcon-oozie-el-extension-<$version>.jar <oozie-install-dir>/libext + +</verbatim> + + * Package the Oozie WAR file as the Oozie user + +<verbatim> +su - $OOZIE_USER +cd <oozie-install-dir>/bin +./oozie-setup.sh prepare-war + +Where $OOZIE_USER is the Oozie user. For example, oozie. +</verbatim> + + * Start the Oozie service on all Falcon clusters. Run these commands on the Oozie host machine. + +<verbatim> +su - $OOZIE_USER +<oozie-install-dir>/bin/oozie-start.sh + +Where $OOZIE_USER is the Oozie user. For example, oozie. 
+</verbatim>
+
+---+++Disabling Falcon Post Processing
+Falcon post processing performs two tasks:
+It sends user notifications to ActiveMQ.
+It moves Oozie executor logs once the workflow finishes.
+
+If post processing fails for any reason, the user may end up with a backlog in the pipeline; that is why it has been made optional.
+
+To disable post processing, set the following properties in startup.properties:
+<verbatim>
+*.falcon.postprocessing.enable=false
+*.workflow.execution.listeners=org.apache.falcon.service.LogMoverService
+</verbatim>
+*NOTE : Please make sure Oozie JMS Notifications are enabled, as LogMoverService depends on the Oozie JMS Notification.*
+
+
+---+++Enabling Falcon Native Scheduler
+You can choose to schedule entities either using Oozie's coordinator or using Falcon's native scheduler. To be able to
+schedule entities natively on Falcon, you will need to add some additional properties
+to <verbatim>$FALCON_HOME/conf/startup.properties</verbatim> before starting the Falcon Server.
+For details, refer to [[FalconNativeScheduler][Falcon Native Scheduler]].
+
+---+++Titan GraphDB backend
+A graph database backend needs to be configured for the Falcon server to start properly.
+You can choose either BerkeleyDB version 5.0.73 (the default for Falcon for the last few releases) or HBase version 1.1.x or later as the backend database.
+Falcon release distributions include the Titan storage plugins for both BerkeleyDB and HBase.
+
+---++++Using BerkeleyDB backend
+Falcon distributions may not package the Berkeley DB artifacts (je-5.0.73.jar), depending on the build profile.
+If Berkeley DB is not packaged, you can download the Berkeley DB jar file from the URL:
+<verbatim>http://download.oracle.com/otn/berkeley-db/je-5.0.73.zip</verbatim>.
+The following properties describe an example Berkeley DB graph storage backend that can be specified in the configuration file
+<verbatim>$FALCON_HOME/conf/startup.properties</verbatim>.
+
+<verbatim>
+# Graph Storage
+*.falcon.graph.storage.directory=${user.dir}/target/graphdb
+*.falcon.graph.storage.backend=berkeleyje
+*.falcon.graph.serialize.path=${user.dir}/target/graphdb
+</verbatim>
+
+---++++Using HBase backend
+
+To use HBase as the backend, it is recommended that an HBase cluster be provisioned in distributed mode, primarily because of its support for Kerberos-enabled clusters and for HA considerations. Depending on the build profile, a standalone HBase version can be packaged with the Falcon binary distribution. Along with this, a template for <verbatim>hbase-site.xml</verbatim> is provided, which can be used to start a standalone-mode HBase environment for development/testing purposes.
+
+---++++ Basic configuration
+
+<verbatim>
+##### Falcon startup.properties
+*.falcon.graph.storage.backend=hbase
+#For standalone mode, specify localhost
+#For distributed mode, specify the zookeeper quorum here - For more information refer http://s3.thinkaurelius.com/docs/titan/current/hbase.html#_remote_server_mode_2
+*.falcon.graph.storage.hostname=<ZooKeeper Quorum>
+</verbatim>
+
+The HBase configuration file (hbase-site.xml) and the HBase libraries need to be added to the classpath when Falcon starts up. The following must be appended to the environment variable <verbatim>FALCON_EXTRA_CLASS_PATH</verbatim> in <verbatim>$FALCON_HOME/bin/falcon-env.sh</verbatim>. Additionally, the correct HBase client libraries need to be added.
For example,
+<verbatim>
+export FALCON_EXTRA_CLASS_PATH=`${HBASE_HOME}/bin/hbase classpath`
+</verbatim>
+
+---++++Table name
+We recommend that in the startup config the table name for Titan storage be set to <verbatim>falcon_titan</verbatim> so that multiple applications using Titan can share the same HBase cluster. This can be set by specifying the table name using the startup property given below. The default value is shown.
+
+<verbatim>
+*.falcon.graph.storage.hbase.table=falcon_titan
+</verbatim>
+
+---++++Starting standalone HBase for testing
+
+HBase can be started in standalone mode for testing as a backend for Titan. The following steps outline the config changes required:
+<verbatim>
+1. Build Falcon as below to package hbase binaries
+   $ export MAVEN_OPTS="-Xmx1024m -XX:MaxPermSize=256m" && mvn clean assembly:assembly -Ppackage-standalone-hbase
+2. Configure HBase
+   a. When the falcon tar file is expanded, HBase binaries are under ${FALCON_HOME}/hbase
+   b. Copy ${FALCON_HOME}/conf/hbase-site.xml.template into the hbase conf dir as ${FALCON_HOME}/hbase/conf/hbase-site.xml
+   c. Set the {hbase_home} property to point to a local dir
+   d. Standalone HBase starts zookeeper on the default port (2181). This port can be changed by adding the following to hbase-site.xml
+      <property>
+          <name>hbase.zookeeper.property.clientPort</name>
+          <value>2223</value>
+      </property>
+
+      <property>
+          <name>hbase.zookeeper.quorum</name>
+          <value>localhost</value>
+      </property>
+   e. Set JAVA_HOME to point to Java 1.7 or above
+   f. Start hbase as ${FALCON_HOME}/hbase/bin/start-hbase.sh
+3. Configure Falcon
+   a. In ${FALCON_HOME}/conf/startup.properties, uncomment the following to enable HBase as the backend
+      *.falcon.graph.storage.backend=hbase
+      ### specify the zookeeper host and port name with which standalone hbase is started (see step 2)
+      ### by default, it will be localhost and port 2181
+      *.falcon.graph.storage.hostname=<zookeeper-host-name>:<zookeeper-host-port>
+      *.falcon.graph.serialize.path=${user.dir}/target/graphdb
+      *.falcon.graph.storage.hbase.table=falcon_titan
+      *.falcon.graph.storage.transactions=false
+4. Add HBase jars to the Falcon classpath in ${FALCON_HOME}/conf/falcon-env.sh as:
+   FALCON_EXTRA_CLASS_PATH=`${FALCON_HOME}/hbase/bin/hbase classpath`
+5. Set the following in ${FALCON_HOME}/conf/startup.properties to disable SSL if needed
+   *.falcon.enableTLS=false
+6. Start Falcon
+</verbatim>
+
+---++++Permissions
+
+When Falcon is configured with HBase as the storage backend, Titan needs to have sufficient authorizations to create and access an HBase table. In a secure cluster it may be necessary to grant permissions to the <verbatim>falcon</verbatim> user for the <verbatim>falcon_titan</verbatim> table (or whatever table name was specified for the property <verbatim>*.falcon.graph.storage.hbase.table</verbatim>).
+
+With Ranger, a policy can be configured for <verbatim>falcon_titan</verbatim>.
+
+Without Ranger, the HBase shell can be used to set the permissions.
+
+<verbatim>
+   su hbase
+   kinit -k -t <hbase keytab> <hbase principal>
+   echo "grant 'falcon', 'RWXCA', 'falcon_titan'" | hbase shell
+</verbatim>
+
+---++++Advanced configuration
+
+HBase storage backend support in Titan has a few other configuration options, and they can be set in <verbatim>$FALCON_HOME/conf/startup.properties</verbatim> by prefixing the Titan property name with <verbatim>*.falcon.graph.</verbatim>, as in the example below.
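+For illustration, here is a minimal sketch of how two Titan storage options already used elsewhere on this page map onto Falcon startup properties once the prefix is applied (the hostname value is a placeholder for your own ZooKeeper quorum):
+<verbatim>
+# Titan option "storage.hostname" becomes:
+*.falcon.graph.storage.hostname=zk1.example.com,zk2.example.com
+# Titan option "storage.transactions" becomes:
+*.falcon.graph.storage.transactions=false
+</verbatim>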
+
+Please refer to <verbatim>http://s3.thinkaurelius.com/docs/titan/0.5.4/titan-config-ref.html#_storage</verbatim> for generic storage properties, <verbatim>http://s3.thinkaurelius.com/docs/titan/0.5.4/titan-config-ref.html#_storage_berkeleydb</verbatim> for Berkeley DB properties and <verbatim>http://s3.thinkaurelius.com/docs/titan/0.5.4/titan-config-ref.html#_storage_hbase</verbatim> for HBase storage backend properties.
+
+
+
+---+++Adding Extension Libraries
+
+Library extensions allow users to add custom libraries to entity lifecycles such as feed retention, feed replication
+and process execution. This is useful for use cases such as adding filesystem extensions. To enable this, add the
+following configs to startup.properties (a sample is sketched after the list):
+*.libext.paths=<paths to be added to all entity lifecycles>
+
+*.libext.feed.paths=<paths to be added to all feed lifecycles>
+
+*.libext.feed.retentions.paths=<paths to be added to feed retention workflow>
+
+*.libext.feed.replication.paths=<paths to be added to feed replication workflow>
+
+*.libext.process.paths=<paths to be added to process workflow>
+
+The configured jars are added to the Falcon classpath and to the corresponding workflows.
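+An illustrative sketch of these properties with hypothetical HDFS paths (the paths below are placeholders, not values shipped with Falcon):
+<verbatim>
+# Jars under this path are added for all entity lifecycles
+*.libext.paths=hdfs://namenode:8020/apps/falcon/libext/common
+# Jars under this path are added only to feed replication workflows
+*.libext.feed.replication.paths=hdfs://namenode:8020/apps/falcon/libext/replication
+# Jars under this path are added only to process workflows
+*.libext.process.paths=hdfs://namenode:8020/apps/falcon/libext/process
+</verbatim>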
http://git-wip-us.apache.org/repos/asf/falcon/blob/91c68bea/trunk/releases/0.11/src/site/twiki/DataReplicationAzure.twiki
----------------------------------------------------------------------
diff --git a/trunk/releases/0.11/src/site/twiki/DataReplicationAzure.twiki b/trunk/releases/0.11/src/site/twiki/DataReplicationAzure.twiki
new file mode 100644
index 0000000..24e543b
--- /dev/null
+++ b/trunk/releases/0.11/src/site/twiki/DataReplicationAzure.twiki
@@ -0,0 +1,61 @@
+---+ Data Replication between On-premise Hadoop Clusters and Azure Cloud
+
+---++ Overview
+Falcon provides an easy way to replicate data between on-premise Hadoop clusters and Azure cloud.
+With this feature, users are able to build a hybrid data pipeline,
+e.g. processing sensitive data on-premises for privacy and compliance reasons
+while leveraging the cloud for elastic scale and online services (e.g. Azure machine learning) with non-sensitive data.
+
+---++ Use Case
+1. Copy data from on-premise Hadoop clusters to Azure cloud
+2. Copy data from Azure cloud to on-premise Hadoop clusters
+3. Copy data within Azure cloud (i.e. from one Azure location to another).
+
+---++ Usage
+---+++ Set Up Azure Blob Credentials
+To move data to/from Azure blobs, we need to add Azure blob credentials in HDFS.
+This can be done by adding the credential property through Ambari HDFS configs, and HDFS needs to be restarted after adding the credential.
+You can also add the credential property to core-site.xml directly, but make sure you restart HDFS from the command line instead of Ambari.
+Otherwise, Ambari will take the previous HDFS configuration without your Azure blob credentials.
+<verbatim>
+<property>
+  <name>fs.azure.account.key.{AZURE_BLOB_ACCOUNT_NAME}.blob.core.windows.net</name>
+  <value>{AZURE_BLOB_ACCOUNT_KEY}</value>
+</property>
+</verbatim>
+
+To verify that you have set up the Azure credentials properly, check whether you are able to access the Azure blob through HDFS, e.g.
+<verbatim>
+hadoop fs -ls wasb://{AZURE_BLOB_CONTAINER}@{AZURE_BLOB_ACCOUNT_NAME}.blob.core.windows.net/
+</verbatim>
+
+---+++ Replication Feed
+[[EntitySpecification][Falcon replication feed]] can be used for data replication to/from Azure cloud.
+You can specify a WASB (i.e. Windows Azure Storage Blob) URL in the source or target locations.
+See below for an example of data replication from a Hadoop cluster to an Azure blob.
+Note that the clusters for the source and the target need to be different.
+Analogously, if you want to copy data from an Azure blob, you can add the Azure blob location to the source.
+<verbatim> +<?xml version="1.0" encoding="UTF-8"?> +<feed name="AzureReplication" xmlns="uri:falcon:feed:0.1"> + <frequency>months(1)</frequency> + <clusters> + <cluster name="SampleCluster1" type="source"> + <validity start="2010-06-01T00:00Z" end="2010-06-02T00:00Z"/> + <retention limit="days(90)" action="delete"/> + </cluster> + <cluster name="SampleCluster2" type="target"> + <validity start="2010-06-01T00:00Z" end="2010-06-02T00:00Z"/> + <retention limit="days(90)" action="delete"/> + <locations> + <location type="data" path="wasb://replication-t...@mystorage.blob.core.windows.net/replicated-${YEAR}-${MONTH}"/> + </locations> + </cluster> + </clusters> + <locations> + <location type="data" path="/apps/falcon/demo/data-${YEAR}-${MONTH}" /> + </locations> + <ACL owner="ambari-qa" group="users" permission="0755"/> + <schema location="hcat" provider="hcat"/> +</feed> +</verbatim> http://git-wip-us.apache.org/repos/asf/falcon/blob/91c68bea/trunk/releases/0.11/src/site/twiki/Distributed-mode.twiki ---------------------------------------------------------------------- diff --git a/trunk/releases/0.11/src/site/twiki/Distributed-mode.twiki b/trunk/releases/0.11/src/site/twiki/Distributed-mode.twiki new file mode 100644 index 0000000..34fb092 --- /dev/null +++ b/trunk/releases/0.11/src/site/twiki/Distributed-mode.twiki @@ -0,0 +1,198 @@ +---+Distributed Mode + + +Following are the steps needed to package and deploy Falcon in Embedded Mode. You need to complete Steps 1-3 mentioned + [[InstallationSteps][here]] before proceeding further. + +---++Package Falcon +Ensure that you are in the base directory (where you cloned Falcon). Letâs call it {project dir} + +<verbatim> +$mvn clean assembly:assembly -DskipTests -DskipCheck=true -Pdistributed,hadoop-2 +</verbatim> + + +<verbatim> +$ls {project dir}/distro/target/ +</verbatim> + +It should give an output like below : +<verbatim> +apache-falcon-distributed-${project.version}-server.tar.gz +apache-falcon-distributed-${project.version}-sources.tar.gz +archive-tmp +maven-shared-archive-resources +</verbatim> + + * apache-falcon-distributed-${project.version}-sources.tar.gz contains source files of Falcon repo. + + * apache-falcon-distributed-${project.version}-server.tar.gz package contains project artifacts along with it's +dependencies, configuration files and scripts required to deploy Falcon. + + +Tar can be found in {project dir}/target/apache-falcon-distributed-${project.version}-server.tar.gz . This is the tar +used for installing Falcon. Lets call it {falcon package} + +Tar is structured as follows. + +<verbatim> + +|- bin + |- falcon + |- falcon-start + |- falcon-stop + |- falcon-status + |- falcon-config.sh + |- service-start.sh + |- service-stop.sh + |- service-status.sh + |- prism-stop + |- prism-start + |- prism-status +|- conf + |- startup.properties + |- runtime.properties + |- client.properties + |- prism.keystore + |- log4j.xml + |- falcon-env.sh +|- docs +|- client + |- lib (client support libs) +|- server + |- webapp + |- falcon.war + |- prism.war +|- oozie + |- conf + |- libext +|- hadooplibs +|- README +|- NOTICE.txt +|- LICENSE.txt +|- DISCLAIMER.txt +|- CHANGES.txt +</verbatim> + + +---++Installing & running Falcon + +---+++Installing Falcon + +Running Falcon in distributed mode requires bringing up both prism and server.As the name suggests Falcon prism splits +the request it gets to the Falcon servers. It is a good practice to start prism and server with their corresponding +configurations separately. 
Create separate directory for prism and server. Let's call them {falcon-prism-dir} and +{falcon-server-dir} respectively. + +*For prism* +<verbatim> +$mkdir {falcon-prism-dir} +$tar -xzvf {falcon package} +</verbatim> + +*For server* +<verbatim> +$mkdir {falcon-server-dir} +$tar -xzvf {falcon package} +</verbatim> + + +---+++Starting Prism + +<verbatim> +cd {falcon-prism-dir}/falcon-distributed-${project.version} +bin/prism-start [-port <port>] +</verbatim> + +By default, +* prism server starts at port 16443. To change the port, use -port option + +* falcon.enableTLS can be set to true or false explicitly to enable SSL, if not port that end with 443 will +automatically put prism on https:// + +* prism starts with conf from {falcon-prism-dir}/falcon-distributed-${project.version}/conf. To override this (to use +the same conf with multiple prism upgrades), set environment variable FALCON_CONF to the path of conf dir. You can find +the instructions for configuring Falcon [[Configuration][here]]. + +*Enabling prism-client* +*If prism is not started using default-port 16443 then edit the following property in +{falcon-prism-dir}/falcon-distributed-${project.version}/conf/client.properties +falcon.url=http://{machine-ip}:{prism-port}/ + + +---+++Starting Falcon Server + +<verbatim> +$cd {falcon-server-dir}/falcon-distributed-${project.version} +$bin/falcon-start [-port <port>] +</verbatim> + +By default, +* If falcon.enableTLS is set to true explicitly or not set at all, Falcon starts at port 15443 on https:// by default. + +* If falcon.enableTLS is set to false explicitly, Falcon starts at port 15000 on http://. + +* To change the port, use -port option. + +* If falcon.enableTLS is not set explicitly, port that ends with 443 will automatically put Falcon on https://. Any +other port will put Falcon on http://. + +* server starts with conf from {falcon-server-dir}/falcon-distributed-${project.version}/conf. To override this (to use +the same conf with multiple server upgrades), set environment variable FALCON_CONF to the path of conf dir. You can find + the instructions for configuring Falcon [[Configuration][here]]. + +*Enabling server-client* +*If server is not started using default-port 15443 then edit the following property in +{falcon-server-dir}/falcon-distributed-${project.version}/conf/client.properties. You can find the instructions for +configuring Falcon here. +falcon.url=http://{machine-ip}:{server-port}/ + +*NOTE* : https is the secure version of HTTP, the protocol over which data is sent between your browser and the website +that you are connected to. By default Falcon runs in https mode. But user can configure it to http. + + +---+++Using Falcon + +<verbatim> +$cd {falcon-prism-dir}/falcon-distributed-${project.version} +$bin/falcon admin -version +Falcon server build version: {Version:"${project.version}-SNAPSHOT-rd7e2be9afa2a5dc96acd1ec9e325f39c6b2f17f7", +Mode:"embedded"} + +$bin/falcon help +(for more details about Falcon cli usage) +</verbatim> + + +---+++Dashboard + +Once Falcon / prism is started, you can view the status of Falcon entities using the Web-based dashboard. You can open +your browser at the corresponding port to use the web UI. + +Falcon dashboard makes the REST api calls as user "falcon-dashboard". If this user does not exist on your Falcon and +Oozie servers, please create the user. + +<verbatim> +## create user. +[root@falconhost ~] useradd -U -m falcon-dashboard -G users + +## verify user is created with membership in correct groups. 
+[root@falconhost ~] groups falcon-dashboard +falcon-dashboard : falcon-dashboard users +[root@falconhost ~] +</verbatim> + + +---+++Stopping Falcon Server + +<verbatim> +$cd {falcon-server-dir}/falcon-distributed-${project.version} +$bin/falcon-stop +</verbatim> + +---+++Stopping Falcon Prism + +<verbatim> +$cd {falcon-prism-dir}/falcon-distributed-${project.version} +$bin/prism-stop +</verbatim> http://git-wip-us.apache.org/repos/asf/falcon/blob/91c68bea/trunk/releases/0.11/src/site/twiki/Embedded-mode.twiki ---------------------------------------------------------------------- diff --git a/trunk/releases/0.11/src/site/twiki/Embedded-mode.twiki b/trunk/releases/0.11/src/site/twiki/Embedded-mode.twiki new file mode 100644 index 0000000..47acab4 --- /dev/null +++ b/trunk/releases/0.11/src/site/twiki/Embedded-mode.twiki @@ -0,0 +1,199 @@ +---+Embedded Mode + +Following are the steps needed to package and deploy Falcon in Embedded Mode. You need to complete Steps 1-3 mentioned + [[InstallationSteps][here]] before proceeding further. + +---++Package Falcon +Ensure that you are in the base directory (where you cloned Falcon). Letâs call it {project dir} + +<verbatim> +$mvn clean assembly:assembly -DskipTests -DskipCheck=true +</verbatim> + +<verbatim> +$ls {project dir}/distro/target/ +</verbatim> +It should give an output like below : +<verbatim> +apache-falcon-${project.version}-bin.tar.gz +apache-falcon-${project.version}-sources.tar.gz +archive-tmp +maven-shared-archive-resources +</verbatim> + +* apache-falcon-${project.version}-sources.tar.gz contains source files of Falcon repo. + +* apache-falcon-${project.version}-bin.tar.gz package contains project artifacts along with it's dependencies, +configuration files and scripts required to deploy Falcon. + +Tar can be found in {project dir}/target/apache-falcon-${project.version}-bin.tar.gz + +Tar is structured as follows : + +<verbatim> + +|- bin + |- falcon + |- falcon-start + |- falcon-stop + |- falcon-status + |- falcon-config.sh + |- service-start.sh + |- service-stop.sh + |- service-status.sh +|- conf + |- startup.properties + |- runtime.properties + |- prism.keystore + |- client.properties + |- log4j.xml + |- falcon-env.sh +|- docs +|- client + |- lib (client support libs) +|- server + |- webapp + |- falcon.war +|- data + |- falcon-store + |- graphdb + |- localhost +|- examples + |- app + |- hive + |- oozie-mr + |- pig + |- data + |- entity + |- filesystem + |- hcat +|- oozie + |- conf + |- libext +|- logs +|- hadooplibs +|- README +|- NOTICE.txt +|- LICENSE.txt +|- DISCLAIMER.txt +|- CHANGES.txt +</verbatim> + + +---++Installing & running Falcon + +Running Falcon in embedded mode requires bringing up server. + +<verbatim> +$tar -xzvf {falcon package} +$cd falcon-${project.version} +</verbatim> + + +---+++Starting Falcon Server +<verbatim> +$cd falcon-${project.version} +$bin/falcon-start [-port <port>] +</verbatim> + +By default, +* If falcon.enableTLS is set to true explicitly or not set at all, Falcon starts at port 15443 on https:// by default. + +* If falcon.enableTLS is set to false explicitly, Falcon starts at port 15000 on http://. + +* To change the port, use -port option. + +* If falcon.enableTLS is not set explicitly, port that ends with 443 will automatically put Falcon on https://. Any +other port will put Falcon on http://. + +* Server starts with conf from {falcon-server-dir}/falcon-distributed-${project.version}/conf. 
To override this (to use +the same conf with multiple server upgrades), set environment variable FALCON_CONF to the path of conf dir. You can find + the instructions for configuring Falcon [[Configuration][here]]. + + +---+++Enabling server-client +If server is not started using default-port 15443 then edit the following property in +{falcon-server-dir}/falcon-${project.version}/conf/client.properties + +falcon.url=http://{machine-ip}:{server-port}/ + + +---+++Using Falcon +<verbatim> +$cd falcon-${project.version} +$bin/falcon admin -version +Falcon server build version: {Version:"${project.version}-SNAPSHOT-rd7e2be9afa2a5dc96acd1ec9e325f39c6b2f17f7",Mode: +"embedded",Hadoop:"${hadoop.version}"} + +$bin/falcon help +(for more details about Falcon cli usage) +</verbatim> + +*Note* : https is the secure version of HTTP, the protocol over which data is sent between your browser and the website +that you are connected to. By default Falcon runs in https mode. But user can configure it to http. + + +---+++Dashboard + +Once Falcon server is started, you can view the status of Falcon entities using the Web-based dashboard. You can open +your browser at the corresponding port to use the web UI. + +Falcon dashboard makes the REST api calls as user "falcon-dashboard". If this user does not exist on your Falcon and +Oozie servers, please create the user. + +<verbatim> +## create user. +[root@falconhost ~] useradd -U -m falcon-dashboard -G users + +## verify user is created with membership in correct groups. +[root@falconhost ~] groups falcon-dashboard +falcon-dashboard : falcon-dashboard users +[root@falconhost ~] +</verbatim> + + +---++Running Examples using embedded package +<verbatim> +$cd falcon-${project.version} +$bin/falcon-start +</verbatim> +Make sure the Hadoop and Oozie endpoints are according to your setup in +examples/entity/filesystem/standalone-cluster.xml +The cluster locations,staging and working dirs, MUST be created prior to submitting a cluster entity to Falcon. +*staging* must have 777 permissions and the parent dirs must have execute permissions +*working* must have 755 permissions and the parent dirs must have execute permissions +<verbatim> +$bin/falcon entity -submit -type cluster -file examples/entity/filesystem/standalone-cluster.xml +</verbatim> +Submit input and output feeds: +<verbatim> +$bin/falcon entity -submit -type feed -file examples/entity/filesystem/in-feed.xml +$bin/falcon entity -submit -type feed -file examples/entity/filesystem/out-feed.xml +</verbatim> +Set-up workflow for the process: +<verbatim> +$hadoop fs -put examples/app / +</verbatim> +Submit and schedule the process: +<verbatim> +$bin/falcon entity -submitAndSchedule -type process -file examples/entity/filesystem/oozie-mr-process.xml +$bin/falcon entity -submitAndSchedule -type process -file examples/entity/filesystem/pig-process.xml +$bin/falcon entity -submitAndSchedule -type process -file examples/entity/spark/spark-process.xml +</verbatim> +Generate input data: +<verbatim> +$examples/data/generate.sh <<hdfs endpoint>> +</verbatim> +Get status of instances: +<verbatim> +$bin/falcon instance -status -type process -name oozie-mr-process -start 2013-11-15T00:05Z -end 2013-11-15T01:00Z +</verbatim> + +HCat based example entities are in examples/entity/hcat. +Spark based example entities are in examples/entity/spark. 
+
+---+++Stopping Falcon Server
+<verbatim>
+$cd falcon-${project.version}
+$bin/falcon-stop
+</verbatim>
http://git-wip-us.apache.org/repos/asf/falcon/blob/91c68bea/trunk/releases/0.11/src/site/twiki/EntitySLAAlerting.twiki
----------------------------------------------------------------------
diff --git a/trunk/releases/0.11/src/site/twiki/EntitySLAAlerting.twiki b/trunk/releases/0.11/src/site/twiki/EntitySLAAlerting.twiki
new file mode 100644
index 0000000..8534ba6
--- /dev/null
+++ b/trunk/releases/0.11/src/site/twiki/EntitySLAAlerting.twiki
@@ -0,0 +1,57 @@
+---++Entity SLA Alerting
+
+Falcon supports SLAs for feeds and processes.
+
+Types of SLA supported for feeds:
+
+   1. slaLow
+   1. slaHigh
+
+To learn more about feed SLAs, see [[EntitySpecification][Feed Specification]].
+
+Types of SLA supported for processes:
+
+   1. shouldStartIn
+   1. shouldEndIn
+
+To learn more about process SLAs, see [[EntitySpecification][Process Specification]].
+
+The Falcon Entity SLA Alerting service does the following:
+
+   1. Monitors feed and process instances and sends notifications to all the listeners attached to it.
+   1. For feeds, it notifies when an *slaHigh* miss happens; slaLow is not supported.
+   1. For processes, it notifies when an SLA miss for *shouldEndIn* happens; shouldStartIn is not supported.
+
+The Entity SLA Alert service depends upon [[EntitySLAMonitoring][Falcon Entity SLA Monitoring]] to know which process and feed instances are to be monitored.
+
+*How to attach listeners:*
+
+You can write custom listeners to take some action whenever a process or feed instance misses its SLA.
+To attach listeners, add the following property in startup.properties:
+
+<verbatim>
+
+*.entityAlert.listeners=org.apache.customPath.customListener
+
+</verbatim>
+
+Currently Falcon natively supports [[BacklogMetricEmitterService][Back Log Emitter Service]] as a listener to the EntitySLAAlert service.
+
+---++Dependencies
+
+*Other Services:*
+
+To enable the Entity SLA Alerting service, you need to enable [[EntitySLAMonitoring][Falcon Entity SLA Monitoring]].
+
+The following properties are needed in startup.properties:
+
+<verbatim>
+
+*.application.services=org.apache.falcon.service.EntitySLAAlertService
+
+*.entity.sla.statusCheck.frequency.seconds=600
+</verbatim>
+
+*Falcon Database:*
+
+The Entity SLA Alerting service maintains its state in the database. It needs one table, *ENTITY_SLA_ALERTS*; please see [[FalconDatabase]] for how to create it.
\ No newline at end of file
http://git-wip-us.apache.org/repos/asf/falcon/blob/91c68bea/trunk/releases/0.11/src/site/twiki/EntitySLAMonitoring.twiki
----------------------------------------------------------------------
diff --git a/trunk/releases/0.11/src/site/twiki/EntitySLAMonitoring.twiki b/trunk/releases/0.11/src/site/twiki/EntitySLAMonitoring.twiki
new file mode 100644
index 0000000..bdd9ac4
--- /dev/null
+++ b/trunk/releases/0.11/src/site/twiki/EntitySLAMonitoring.twiki
@@ -0,0 +1,25 @@
+---++Falcon Entity SLA Monitoring
+
+Entity SLA monitoring allows you to monitor entities (processes and feeds). It keeps track of the running instances of the entity and stores them in the database.
+
+
+---++Dependencies
+
+*Other Services:*
+
+The Entity SLA monitoring service requires FalconJPAService to be up. The following values need to be set in order to run EntitySLAMonitoring.
+In startup.properties:
+
+<verbatim>
+*.application.services=org.apache.falcon.state.store.service.FalconJPAService,\
+                       org.apache.falcon.service.EntitySLAMonitoringService
+</verbatim>
+
+
+*Falcon Database:*
+
+The Entity SLA monitoring service maintains its state in the database. It needs two tables:
+
+   1. MONITORED_ENTITY
+   1. PENDING_INSTANCES
+Please see [[FalconDatabase]] for how to create them.
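+For convenience, here is an illustrative combined sketch of the SLA-related startup.properties entries described here and in Entity SLA Alerting above. This assumes you add these services to your existing *.application.services list rather than replacing it; the listener class below is a placeholder for your own implementation.
+
+<verbatim>
+# JPA state store plus SLA monitoring and SLA alerting services
+*.application.services=org.apache.falcon.state.store.service.FalconJPAService,\
+                       org.apache.falcon.service.EntitySLAMonitoringService,\
+                       org.apache.falcon.service.EntitySLAAlertService
+
+# How often (in seconds) entity SLA status is checked by the alerting service
+*.entity.sla.statusCheck.frequency.seconds=600
+
+# Optional custom listener (placeholder class name) notified on SLA misses
+*.entityAlert.listeners=org.apache.customPath.customListener
+</verbatim>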