http://git-wip-us.apache.org/repos/asf/falcon/blob/91c68bea/trunk/releases/0.11/src/site/twiki/Configuration.twiki ---------------------------------------------------------------------- diff --git a/trunk/releases/0.11/src/site/twiki/Configuration.twiki b/trunk/releases/0.11/src/site/twiki/Configuration.twiki new file mode 100644 index 0000000..c686d48 --- /dev/null +++ b/trunk/releases/0.11/src/site/twiki/Configuration.twiki @@ -0,0 +1,461 @@ +---+Configuring Falcon + +By default config directory used by falcon is {package dir}/conf. To override this (to use the same conf with multiple +falcon upgrades), set environment variable FALCON_CONF to the path of the conf dir. + +falcon-env.sh has been added to the falcon conf. This file can be used to set various environment variables that you +need for you services. +In addition you can set any other environment variables you might need. This file will be sourced by falcon scripts +before any commands are executed. The following environment variables are available to set. + +<verbatim> +# The java implementation to use. If JAVA_HOME is not found we expect java and jar to be in path +#export JAVA_HOME= + +# any additional java opts you want to set. This will apply to both client and server operations +#export FALCON_OPTS= + +# any additional java opts that you want to set for client only +#export FALCON_CLIENT_OPTS= + +# java heap size we want to set for the client. Default is 1024MB +#export FALCON_CLIENT_HEAP= + +# any additional opts you want to set for prism service. +#export FALCON_PRISM_OPTS= + +# java heap size we want to set for the prism service. Default is 1024MB +#export FALCON_PRISM_HEAP= + +# any additional opts you want to set for falcon service. +#export FALCON_SERVER_OPTS= + +# java heap size we want to set for the falcon server. Default is 1024MB +#export FALCON_SERVER_HEAP= + +# What is is considered as falcon home dir. Default is the base location of the installed software +#export FALCON_HOME_DIR= + +# Where log files are stored. Default is logs directory under the base install location +#export FALCON_LOG_DIR= + +# Where pid files are stored. Default is logs directory under the base install location +#export FALCON_PID_DIR= + +# where the falcon active mq data is stored. Default is logs/data directory under the base install location +#export FALCON_DATA_DIR= + +# Where do you want to expand the war file. By Default it is in /server/webapp dir under the base install dir. +#export FALCON_EXPANDED_WEBAPP_DIR= + +# Any additional classpath elements to be added to the Falcon server/client classpath +#export FALCON_EXTRA_CLASS_PATH= +</verbatim> + +---++Advanced Configurations + +---+++Configuring Monitoring plugin to register catalog partitions +Falcon comes with a monitoring plugin that registers catalog partition. This comes in really handy during migration from + filesystem based feeds to hcatalog based feeds. +This plugin enables the user to de-couple the partition registration and assume that all partitions are already on +hcatalog even before the migration, simplifying the hcatalog migration. + +By default this plugin is disabled. +To enable this plugin and leverage the feature, there are 3 pre-requisites: +<verbatim> +In {package dir}/conf/startup.properties, add +*.workflow.execution.listeners=org.apache.falcon.catalog.CatalogPartitionHandler + +In the cluster definition, ensure registry endpoint is defined. 
+Ex: +<interface type="registry" endpoint="thrift://localhost:1109" version="0.13.3"/> + +In the feed definition, ensure the corresponding catalog table is mentioned in feed-properties +Ex: +<properties> + <property name="catalog.table" value="catalog:default:in_table#year={YEAR};month={MONTH};day={DAY};hour={HOUR}; + minute={MINUTE}"/> +</properties> +</verbatim> + +*NOTE : for Mac OS users* +<verbatim> +If you are using a Mac OS, you will need to configure the FALCON_SERVER_OPTS (explained above). + +In {package dir}/conf/falcon-env.sh uncomment the following line +#export FALCON_SERVER_OPTS= + +and change it to look as below +export FALCON_SERVER_OPTS="-Djava.awt.headless=true -Djava.security.krb5.realm= -Djava.security.krb5.kdc=" +</verbatim> + +---+++Activemq +* falcon server starts embedded active mq. To control this behaviour, set the following system properties using -D +option in environment variable FALCON_OPTS: + * falcon.embeddedmq=<true/false> - Should server start embedded active mq, default true + * falcon.embeddedmq.port=<port> - Port for embedded active mq, default 61616 + * falcon.embeddedmq.data=<path> - Data path for embedded active mq, default {package dir}/logs/data + +---+++Falcon System Notifications + +Some Falcon features such as late data handling, retries, metadata service, depend on JMS notifications sent when the +Oozie workflow completes. Falcon listens to Oozie notification via JMS. You need to enable Oozie JMS notification as +explained below. Falcon post processing feature continues to only send user notifications so enabling Oozie +JMS notification is important. + +*NOTE : If Oozie JMS notification is not enabled, the Falcon features such as failure retry, late data handling and metadata +service will be disabled for all entities on the server.* + +---+++Enable Oozie JMS notification + + * Please add/change the following properties in oozie-site.xml in the oozie installation dir. + +<verbatim> + <property> + <name>oozie.jms.producer.connection.properties</name> + <value>java.naming.factory.initial#org.apache.activemq.jndi.ActiveMQInitialContextFactory;java.naming.provider.url#tcp://<activemq-host>:<port></value> + </property> + + <property> + <name>oozie.service.EventHandlerService.event.listeners</name> + <value>org.apache.oozie.jms.JMSJobEventListener</value> + </property> + + <property> + <name>oozie.service.JMSTopicService.topic.name</name> + <value>WORKFLOW=ENTITY.TOPIC,COORDINATOR=ENTITY.TOPIC</value> + </property> + + <property> + <name>oozie.service.JMSTopicService.topic.prefix</name> + <value>FALCON.</value> + </property> + + <!-- add org.apache.oozie.service.JMSAccessorService to the other existing services if any --> + <property> + <name>oozie.services.ext</name> + <value>org.apache.oozie.service.JMSAccessorService,org.apache.oozie.service.PartitionDependencyManagerService,org.apache.oozie.service.HCatAccessorService</value> + </property> +</verbatim> + + * In falcon startup.properties, set JMS broker url to be the same as the one set in oozie-site.xml property + oozie.jms.producer.connection.properties (see above) + +<verbatim> + *.broker.url=tcp://<activemq-host>:<port> +</verbatim> + +---+++Configuring Oozie for Falcon + +Falcon uses HCatalog for data availability notification when Hive tables are replicated. Make the following configuration +changes to Oozie to ensure Hive table replication in Falcon: + + * Stop the Oozie service on all Falcon clusters. Run the following commands on the Oozie host machine. 
+ +<verbatim> +su - $OOZIE_USER + +<oozie-install-dir>/bin/oozie-stop.sh + +where $OOZIE_USER is the Oozie user. For example, oozie. +</verbatim> + + * Copy each cluster's hadoop conf directory to a different location. For example, if you have two clusters, copy one to /etc/hadoop/conf-1 and the other to /etc/hadoop/conf-2. + + * For each oozie-site.xml file, modify the oozie.service.HadoopAccessorService.hadoop.configurations property, specifying clusters, the RPC ports of the NameNodes, and HostManagers accordingly. For example, if Falcon connects to three clusters, specify: + +<verbatim> + +<property> + <name>oozie.service.HadoopAccessorService.hadoop.configurations</name> + <value>*=/etc/hadoop/conf,$NameNode:$rpcPortNN=$hadoopConfDir1,$ResourceManager1:$rpcPortRM=$hadoopConfDir1,$NameNode2=$hadoopConfDir2,$ResourceManager2:$rpcPortRM=$hadoopConfDir2,$NameNode3 :$rpcPortNN =$hadoopConfDir3,$ResourceManager3 :$rpcPortRM =$hadoopConfDir3</value> + <description> + Comma separated AUTHORITY=HADOOP_CONF_DIR, where AUTHORITY is the HOST:PORT of + the Hadoop service (JobTracker, HDFS). The wildcard '*' configuration is + used when there is no exact match for an authority. The HADOOP_CONF_DIR contains + the relevant Hadoop *-site.xml files. If the path is relative is looked within + the Oozie configuration directory; though the path can be absolute (i.e. to point + to Hadoop client conf/ directories in the local filesystem. + </description> +</property> + +</verbatim> + + * Add the following properties to the /etc/oozie/conf/oozie-site.xml file: + +<verbatim> + +<property> + <name>oozie.service.ProxyUserService.proxyuser.falcon.hosts</name> + <value>*</value> +</property> + +<property> + <name>oozie.service.ProxyUserService.proxyuser.falcon.groups</name> + <value>*</value> +</property> + +<property> + <name>oozie.service.URIHandlerService.uri.handlers</name> + <value>org.apache.oozie.dependency.FSURIHandler, org.apache.oozie.dependency.HCatURIHandler</value> +</property> + +<property> + <name>oozie.services.ext</name> + <value>org.apache.oozie.service.JMSAccessorService, org.apache.oozie.service.PartitionDependencyManagerService, + org.apache.oozie.service.HCatAccessorService</value> +</property> + +<!-- Coord EL Functions Properties --> + +<property> + <name>oozie.service.ELService.ext.functions.coord-job-submit-instances</name> + <value>now=org.apache.oozie.extensions.OozieELExtensions#ph1_now_echo, + today=org.apache.oozie.extensions.OozieELExtensions#ph1_today_echo, + yesterday=org.apache.oozie.extensions.OozieELExtensions#ph1_yesterday_echo, + currentMonth=org.apache.oozie.extensions.OozieELExtensions#ph1_currentMonth_echo, + lastMonth=org.apache.oozie.extensions.OozieELExtensions#ph1_lastMonth_echo, + currentYear=org.apache.oozie.extensions.OozieELExtensions#ph1_currentYear_echo, + lastYear=org.apache.oozie.extensions.OozieELExtensions#ph1_lastYear_echo, + formatTime=org.apache.oozie.coord.CoordELFunctions#ph1_coord_formatTime_echo, + latest=org.apache.oozie.coord.CoordELFunctions#ph2_coord_latest_echo, + future=org.apache.oozie.coord.CoordELFunctions#ph2_coord_future_echo + </value> +</property> + +<property> + <name>oozie.service.ELService.ext.functions.coord-action-create-inst</name> + <value>now=org.apache.oozie.extensions.OozieELExtensions#ph2_now_inst, + today=org.apache.oozie.extensions.OozieELExtensions#ph2_today_inst, + yesterday=org.apache.oozie.extensions.OozieELExtensions#ph2_yesterday_inst, + currentMonth=org.apache.oozie.extensions.OozieELExtensions#ph2_currentMonth_inst, + 
lastMonth=org.apache.oozie.extensions.OozieELExtensions#ph2_lastMonth_inst, + currentYear=org.apache.oozie.extensions.OozieELExtensions#ph2_currentYear_inst, + lastYear=org.apache.oozie.extensions.OozieELExtensions#ph2_lastYear_inst, + latest=org.apache.oozie.coord.CoordELFunctions#ph2_coord_latest_echo, + future=org.apache.oozie.coord.CoordELFunctions#ph2_coord_future_echo, + formatTime=org.apache.oozie.coord.CoordELFunctions#ph2_coord_formatTime, + user=org.apache.oozie.coord.CoordELFunctions#coord_user + </value> +</property> + +<property> +<name>oozie.service.ELService.ext.functions.coord-action-start</name> +<value> +now=org.apache.oozie.extensions.OozieELExtensions#ph2_now, +today=org.apache.oozie.extensions.OozieELExtensions#ph2_today, +yesterday=org.apache.oozie.extensions.OozieELExtensions#ph2_yesterday, +currentMonth=org.apache.oozie.extensions.OozieELExtensions#ph2_currentMonth, +lastMonth=org.apache.oozie.extensions.OozieELExtensions#ph2_lastMonth, +currentYear=org.apache.oozie.extensions.OozieELExtensions#ph2_currentYear, +lastYear=org.apache.oozie.extensions.OozieELExtensions#ph2_lastYear, +latest=org.apache.oozie.coord.CoordELFunctions#ph3_coord_latest, +future=org.apache.oozie.coord.CoordELFunctions#ph3_coord_future, +dataIn=org.apache.oozie.extensions.OozieELExtensions#ph3_dataIn, +instanceTime=org.apache.oozie.coord.CoordELFunctions#ph3_coord_nominalTime, +dateOffset=org.apache.oozie.coord.CoordELFunctions#ph3_coord_dateOffset, +formatTime=org.apache.oozie.coord.CoordELFunctions#ph3_coord_formatTime, +user=org.apache.oozie.coord.CoordELFunctions#coord_user +</value> +</property> + +<property> + <name>oozie.service.ELService.ext.functions.coord-sla-submit</name> + <value> + instanceTime=org.apache.oozie.coord.CoordELFunctions#ph1_coord_nominalTime_echo_fixed, + user=org.apache.oozie.coord.CoordELFunctions#coord_user + </value> +</property> + +<property> + <name>oozie.service.ELService.ext.functions.coord-sla-create</name> + <value> + instanceTime=org.apache.oozie.coord.CoordELFunctions#ph2_coord_nominalTime, + user=org.apache.oozie.coord.CoordELFunctions#coord_user + </value> +</property> + +</verbatim> + + * Copy the existing Oozie WAR file to <oozie-install-dir>/oozie.war. This will ensure that all existing items in the WAR file are still present after the current update. + +<verbatim> +su - root +cp $CATALINA_BASE/webapps/oozie.war <oozie-install-dir>/oozie.war + +where $CATALINA_BASE is the path for the Oozie web app. By default, $CATALINA_BASE is: <oozie-install-dir> +</verbatim> + + * Add the Falcon EL extensions to Oozie. + +Copy the extension JAR files provided with the Falcon Server to a temporary directory on the Oozie server. For example, if your standalone Falcon Server is on the same machine as your Oozie server, you can just copy the JAR files. + +<verbatim> + +mkdir /tmp/falcon-oozie-jars +cp <falcon-install-dir>/oozie/ext/falcon-oozie-el-extension-<$version>.jar /tmp/falcon-oozie-jars +cp /tmp/falcon-oozie-jars/falcon-oozie-el-extension-<$version>.jar <oozie-install-dir>/libext + +</verbatim> + + * Package the Oozie WAR file as the Oozie user + +<verbatim> +su - $OOZIE_USER +cd <oozie-install-dir>/bin +./oozie-setup.sh prepare-war + +Where $OOZIE_USER is the Oozie user. For example, oozie. +</verbatim> + + * Start the Oozie service on all Falcon clusters. Run these commands on the Oozie host machine. + +<verbatim> +su - $OOZIE_USER +<oozie-install-dir>/bin/oozie-start.sh + +Where $OOZIE_USER is the Oozie user. For example, oozie. 
+</verbatim>
+
+---+++Disabling Falcon Post Processing
+Falcon post processing performs two tasks:
+It sends user notifications to ActiveMQ.
+It moves Oozie executor logs once the workflow finishes.
+
+If post processing fails for any reason, the user may end up with a backlog in the pipeline; that is why it has been made optional.
+
+To disable post processing, set the following properties in startup.properties:
+<verbatim>
+*.falcon.postprocessing.enable=false
+*.workflow.execution.listeners=org.apache.falcon.service.LogMoverService
+</verbatim>
+*NOTE : Please make sure Oozie JMS Notifications are enabled, as LogMoverService depends on the Oozie JMS Notification.*
+
+
+---+++Enabling Falcon Native Scheduler
+You can choose to schedule entities either using Oozie's coordinator or using Falcon's native scheduler. To be able to
+schedule entities natively on Falcon, you will need to add some additional properties
+to <verbatim>$FALCON_HOME/conf/startup.properties</verbatim> before starting the Falcon Server.
+For details, refer to [[FalconNativeScheduler][Falcon Native Scheduler]].
+
+---+++Titan GraphDB backend
+A graph database backend needs to be configured for the Falcon server to start properly.
+You can choose either BerkeleyDB version 5.0.73 (the default for Falcon for the last few releases) or HBase version 1.1.x or later as the backend database.
+Falcon release distributions include the Titan storage plugins for both BerkeleyDB and HBase.
+
+---++++Using BerkeleyDB backend
+Falcon distributions may not package the Berkeley DB artifacts (je-5.0.73.jar), depending on the build profile.
+If Berkeley DB is not packaged, you can download the Berkeley DB jar file from the URL:
+<verbatim>http://download.oracle.com/otn/berkeley-db/je-5.0.73.zip</verbatim>.
+The following properties describe an example Berkeley DB graph storage backend that can be specified in the configuration file
+<verbatim>$FALCON_HOME/conf/startup.properties</verbatim>.
+
+<verbatim>
+# Graph Storage
+*.falcon.graph.storage.directory=${user.dir}/target/graphdb
+*.falcon.graph.storage.backend=berkeleyje
+*.falcon.graph.serialize.path=${user.dir}/target/graphdb
+</verbatim>
+
+---++++Using HBase backend
+
+To use HBase as the backend, it is recommended that an HBase cluster be provisioned in distributed mode, primarily because of its support for Kerberos-enabled clusters and for HA considerations. Depending on the build profile, a standalone HBase version can be packaged with the Falcon binary distribution. Along with this, a template for <verbatim>hbase-site.xml</verbatim> is provided, which can be used to start a standalone-mode HBase environment for development/testing purposes.
+
+---++++ Basic configuration
+
+<verbatim>
+##### Falcon startup.properties
+*.falcon.graph.storage.backend=hbase
+#For standalone mode, specify localhost
+#For distributed mode, specify the zookeeper quorum here - For more information refer http://s3.thinkaurelius.com/docs/titan/current/hbase.html#_remote_server_mode_2
+*.falcon.graph.storage.hostname=<ZooKeeper Quorum>
+</verbatim>
+
+The HBase configuration file (hbase-site.xml) and the HBase libraries need to be added to the classpath when Falcon starts up. The following must be appended to the environment variable <verbatim>FALCON_EXTRA_CLASS_PATH</verbatim> in <verbatim>$FALCON_HOME/bin/falcon-env.sh</verbatim>. Additionally, the correct HBase client libraries need to be added.
For example,
+<verbatim>
+export FALCON_EXTRA_CLASS_PATH=`${HBASE_HOME}/bin/hbase classpath`
+</verbatim>
+
+---++++Table name
+We recommend that in the startup config the table name for Titan storage be set to <verbatim>falcon_titan</verbatim> so that multiple applications using Titan can share the same HBase cluster. This can be set by specifying the table name using the startup property given below. The default value is shown.
+
+<verbatim>
+*.falcon.graph.storage.hbase.table=falcon_titan
+</verbatim>
+
+---++++Starting standalone HBase for testing
+
+HBase can be started in standalone mode for testing as a backend for Titan. The following steps outline the config changes required:
+<verbatim>
+1. Build Falcon as below to package hbase binaries
+   $ export MAVEN_OPTS="-Xmx1024m -XX:MaxPermSize=256m" && mvn clean assembly:assembly -Ppackage-standalone-hbase
+2. Configure HBase
+   a. When the falcon tar file is expanded, HBase binaries are under ${FALCON_HOME}/hbase
+   b. Copy ${FALCON_HOME}/conf/hbase-site.xml.template into the hbase conf dir as ${FALCON_HOME}/hbase/conf/hbase-site.xml
+   c. Set the {hbase_home} property to point to a local dir
+   d. Standalone HBase starts zookeeper on the default port (2181). This port can be changed by adding the following to hbase-site.xml
+      <property>
+          <name>hbase.zookeeper.property.clientPort</name>
+          <value>2223</value>
+      </property>
+
+      <property>
+          <name>hbase.zookeeper.quorum</name>
+          <value>localhost</value>
+      </property>
+   e. Set JAVA_HOME to point to Java 1.7 or above
+   f. Start hbase as ${FALCON_HOME}/hbase/bin/start-hbase.sh
+3. Configure Falcon
+   a. In ${FALCON_HOME}/conf/startup.properties, uncomment the following to enable HBase as the backend
+      *.falcon.graph.storage.backend=hbase
+      ### specify the zookeeper host and port name with which standalone hbase is started (see step 2)
+      ### by default, it will be localhost and port 2181
+      *.falcon.graph.storage.hostname=<zookeeper-host-name>:<zookeeper-host-port>
+      *.falcon.graph.serialize.path=${user.dir}/target/graphdb
+      *.falcon.graph.storage.hbase.table=falcon_titan
+      *.falcon.graph.storage.transactions=false
+4. Add HBase jars to the Falcon classpath in ${FALCON_HOME}/conf/falcon-env.sh as:
+   FALCON_EXTRA_CLASS_PATH=`${FALCON_HOME}/hbase/bin/hbase classpath`
+5. Set the following in ${FALCON_HOME}/conf/startup.properties to disable SSL if needed
+   *.falcon.enableTLS=false
+6. Start Falcon
+</verbatim>
+
+---++++Permissions
+
+When Falcon is configured with HBase as the storage backend, Titan needs to have sufficient authorizations to create and access an HBase table. In a secure cluster it may be necessary to grant permissions to the <verbatim>falcon</verbatim> user for the <verbatim>falcon_titan</verbatim> table (or whatever table name was specified for the property <verbatim>*.falcon.graph.storage.hbase.table</verbatim>).
+
+With Ranger, a policy can be configured for <verbatim>falcon_titan</verbatim>.
+
+Without Ranger, the HBase shell can be used to set the permissions.
+
+<verbatim>
+   su hbase
+   kinit -k -t <hbase keytab> <hbase principal>
+   echo "grant 'falcon', 'RWXCA', 'falcon_titan'" | hbase shell
+</verbatim>
+
+---++++Advanced configuration
+
+HBase storage backend support in Titan has a few other configuration options, and they can be set in <verbatim>$FALCON_HOME/conf/startup.properties</verbatim> by prefixing the Titan property name with <verbatim>*.falcon.graph.</verbatim>, as in the example below.
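+For illustration, here is a minimal sketch of how two Titan storage options already used elsewhere on this page map onto Falcon startup properties once the prefix is applied (the hostname value is a placeholder for your own ZooKeeper quorum):
+<verbatim>
+# Titan option "storage.hostname" becomes:
+*.falcon.graph.storage.hostname=zk1.example.com,zk2.example.com
+# Titan option "storage.transactions" becomes:
+*.falcon.graph.storage.transactions=false
+</verbatim>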
+
+Please refer to <verbatim>http://s3.thinkaurelius.com/docs/titan/0.5.4/titan-config-ref.html#_storage</verbatim> for generic storage properties, <verbatim>http://s3.thinkaurelius.com/docs/titan/0.5.4/titan-config-ref.html#_storage_berkeleydb</verbatim> for Berkeley DB properties and <verbatim>http://s3.thinkaurelius.com/docs/titan/0.5.4/titan-config-ref.html#_storage_hbase</verbatim> for HBase storage backend properties.
+
+
+
+---+++Adding Extension Libraries
+
+Library extensions allow users to add custom libraries to entity lifecycles such as feed retention, feed replication
+and process execution. This is useful for use cases such as adding filesystem extensions. To enable this, add the
+following configs to startup.properties (a sample is sketched after the list):
+*.libext.paths=<paths to be added to all entity lifecycles>
+
+*.libext.feed.paths=<paths to be added to all feed lifecycles>
+
+*.libext.feed.retentions.paths=<paths to be added to feed retention workflow>
+
+*.libext.feed.replication.paths=<paths to be added to feed replication workflow>
+
+*.libext.process.paths=<paths to be added to process workflow>
+
+The configured jars are added to the Falcon classpath and to the corresponding workflows.
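+An illustrative sketch of these properties with hypothetical HDFS paths (the paths below are placeholders, not values shipped with Falcon):
+<verbatim>
+# Jars under this path are added for all entity lifecycles
+*.libext.paths=hdfs://namenode:8020/apps/falcon/libext/common
+# Jars under this path are added only to feed replication workflows
+*.libext.feed.replication.paths=hdfs://namenode:8020/apps/falcon/libext/replication
+# Jars under this path are added only to process workflows
+*.libext.process.paths=hdfs://namenode:8020/apps/falcon/libext/process
+</verbatim>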
http://git-wip-us.apache.org/repos/asf/falcon/blob/91c68bea/trunk/releases/0.11/src/site/twiki/DataReplicationAzure.twiki
----------------------------------------------------------------------
diff --git a/trunk/releases/0.11/src/site/twiki/DataReplicationAzure.twiki b/trunk/releases/0.11/src/site/twiki/DataReplicationAzure.twiki
new file mode 100644
index 0000000..24e543b
--- /dev/null
+++ b/trunk/releases/0.11/src/site/twiki/DataReplicationAzure.twiki
@@ -0,0 +1,61 @@
+---+ Data Replication between On-premise Hadoop Clusters and Azure Cloud
+
+---++ Overview
+Falcon provides an easy way to replicate data between on-premise Hadoop clusters and Azure cloud.
+With this feature, users are able to build a hybrid data pipeline,
+e.g. processing sensitive data on-premises for privacy and compliance reasons
+while leveraging the cloud for elastic scale and online services (e.g. Azure machine learning) with non-sensitive data.
+
+---++ Use Case
+1. Copy data from on-premise Hadoop clusters to Azure cloud
+2. Copy data from Azure cloud to on-premise Hadoop clusters
+3. Copy data within Azure cloud (i.e. from one Azure location to another).
+
+---++ Usage
+---+++ Set Up Azure Blob Credentials
+To move data to/from Azure blobs, we need to add Azure blob credentials in HDFS.
+This can be done by adding the credential property through Ambari HDFS configs, and HDFS needs to be restarted after adding the credential.
+You can also add the credential property to core-site.xml directly, but make sure you restart HDFS from the command line instead of Ambari.
+Otherwise, Ambari will take the previous HDFS configuration without your Azure blob credentials.
+<verbatim>
+<property>
+  <name>fs.azure.account.key.{AZURE_BLOB_ACCOUNT_NAME}.blob.core.windows.net</name>
+  <value>{AZURE_BLOB_ACCOUNT_KEY}</value>
+</property>
+</verbatim>
+
+To verify that you have set up the Azure credentials properly, check whether you are able to access the Azure blob through HDFS, e.g.
+<verbatim>
+hadoop fs -ls wasb://{AZURE_BLOB_CONTAINER}@{AZURE_BLOB_ACCOUNT_NAME}.blob.core.windows.net/
+</verbatim>
+
+---+++ Replication Feed
+[[EntitySpecification][Falcon replication feed]] can be used for data replication to/from Azure cloud.
+You can specify a WASB (i.e. Windows Azure Storage Blob) URL in the source or target locations.
+See below for an example of data replication from a Hadoop cluster to an Azure blob.
+Note that the clusters for the source and the target need to be different.
+Analogously, if you want to copy data from an Azure blob, you can add the Azure blob location to the source.
+<verbatim> +<?xml version="1.0" encoding="UTF-8"?> +<feed name="AzureReplication" xmlns="uri:falcon:feed:0.1"> + <frequency>months(1)</frequency> + <clusters> + <cluster name="SampleCluster1" type="source"> + <validity start="2010-06-01T00:00Z" end="2010-06-02T00:00Z"/> + <retention limit="days(90)" action="delete"/> + </cluster> + <cluster name="SampleCluster2" type="target"> + <validity start="2010-06-01T00:00Z" end="2010-06-02T00:00Z"/> + <retention limit="days(90)" action="delete"/> + <locations> + <location type="data" path="wasb://replication-t...@mystorage.blob.core.windows.net/replicated-${YEAR}-${MONTH}"/> + </locations> + </cluster> + </clusters> + <locations> + <location type="data" path="/apps/falcon/demo/data-${YEAR}-${MONTH}" /> + </locations> + <ACL owner="ambari-qa" group="users" permission="0755"/> + <schema location="hcat" provider="hcat"/> +</feed> +</verbatim> http://git-wip-us.apache.org/repos/asf/falcon/blob/91c68bea/trunk/releases/0.11/src/site/twiki/Distributed-mode.twiki ---------------------------------------------------------------------- diff --git a/trunk/releases/0.11/src/site/twiki/Distributed-mode.twiki b/trunk/releases/0.11/src/site/twiki/Distributed-mode.twiki new file mode 100644 index 0000000..34fb092 --- /dev/null +++ b/trunk/releases/0.11/src/site/twiki/Distributed-mode.twiki @@ -0,0 +1,198 @@ +---+Distributed Mode + + +Following are the steps needed to package and deploy Falcon in Embedded Mode. You need to complete Steps 1-3 mentioned + [[InstallationSteps][here]] before proceeding further. + +---++Package Falcon +Ensure that you are in the base directory (where you cloned Falcon). Letâs call it {project dir} + +<verbatim> +$mvn clean assembly:assembly -DskipTests -DskipCheck=true -Pdistributed,hadoop-2 +</verbatim> + + +<verbatim> +$ls {project dir}/distro/target/ +</verbatim> + +It should give an output like below : +<verbatim> +apache-falcon-distributed-${project.version}-server.tar.gz +apache-falcon-distributed-${project.version}-sources.tar.gz +archive-tmp +maven-shared-archive-resources +</verbatim> + + * apache-falcon-distributed-${project.version}-sources.tar.gz contains source files of Falcon repo. + + * apache-falcon-distributed-${project.version}-server.tar.gz package contains project artifacts along with it's +dependencies, configuration files and scripts required to deploy Falcon. + + +Tar can be found in {project dir}/target/apache-falcon-distributed-${project.version}-server.tar.gz . This is the tar +used for installing Falcon. Lets call it {falcon package} + +Tar is structured as follows. + +<verbatim> + +|- bin + |- falcon + |- falcon-start + |- falcon-stop + |- falcon-status + |- falcon-config.sh + |- service-start.sh + |- service-stop.sh + |- service-status.sh + |- prism-stop + |- prism-start + |- prism-status +|- conf + |- startup.properties + |- runtime.properties + |- client.properties + |- prism.keystore + |- log4j.xml + |- falcon-env.sh +|- docs +|- client + |- lib (client support libs) +|- server + |- webapp + |- falcon.war + |- prism.war +|- oozie + |- conf + |- libext +|- hadooplibs +|- README +|- NOTICE.txt +|- LICENSE.txt +|- DISCLAIMER.txt +|- CHANGES.txt +</verbatim> + + +---++Installing & running Falcon + +---+++Installing Falcon + +Running Falcon in distributed mode requires bringing up both prism and server.As the name suggests Falcon prism splits +the request it gets to the Falcon servers. It is a good practice to start prism and server with their corresponding +configurations separately. 
Create separate directory for prism and server. Let's call them {falcon-prism-dir} and +{falcon-server-dir} respectively. + +*For prism* +<verbatim> +$mkdir {falcon-prism-dir} +$tar -xzvf {falcon package} +</verbatim> + +*For server* +<verbatim> +$mkdir {falcon-server-dir} +$tar -xzvf {falcon package} +</verbatim> + + +---+++Starting Prism + +<verbatim> +cd {falcon-prism-dir}/falcon-distributed-${project.version} +bin/prism-start [-port <port>] +</verbatim> + +By default, +* prism server starts at port 16443. To change the port, use -port option + +* falcon.enableTLS can be set to true or false explicitly to enable SSL, if not port that end with 443 will +automatically put prism on https:// + +* prism starts with conf from {falcon-prism-dir}/falcon-distributed-${project.version}/conf. To override this (to use +the same conf with multiple prism upgrades), set environment variable FALCON_CONF to the path of conf dir. You can find +the instructions for configuring Falcon [[Configuration][here]]. + +*Enabling prism-client* +*If prism is not started using default-port 16443 then edit the following property in +{falcon-prism-dir}/falcon-distributed-${project.version}/conf/client.properties +falcon.url=http://{machine-ip}:{prism-port}/ + + +---+++Starting Falcon Server + +<verbatim> +$cd {falcon-server-dir}/falcon-distributed-${project.version} +$bin/falcon-start [-port <port>] +</verbatim> + +By default, +* If falcon.enableTLS is set to true explicitly or not set at all, Falcon starts at port 15443 on https:// by default. + +* If falcon.enableTLS is set to false explicitly, Falcon starts at port 15000 on http://. + +* To change the port, use -port option. + +* If falcon.enableTLS is not set explicitly, port that ends with 443 will automatically put Falcon on https://. Any +other port will put Falcon on http://. + +* server starts with conf from {falcon-server-dir}/falcon-distributed-${project.version}/conf. To override this (to use +the same conf with multiple server upgrades), set environment variable FALCON_CONF to the path of conf dir. You can find + the instructions for configuring Falcon [[Configuration][here]]. + +*Enabling server-client* +*If server is not started using default-port 15443 then edit the following property in +{falcon-server-dir}/falcon-distributed-${project.version}/conf/client.properties. You can find the instructions for +configuring Falcon here. +falcon.url=http://{machine-ip}:{server-port}/ + +*NOTE* : https is the secure version of HTTP, the protocol over which data is sent between your browser and the website +that you are connected to. By default Falcon runs in https mode. But user can configure it to http. + + +---+++Using Falcon + +<verbatim> +$cd {falcon-prism-dir}/falcon-distributed-${project.version} +$bin/falcon admin -version +Falcon server build version: {Version:"${project.version}-SNAPSHOT-rd7e2be9afa2a5dc96acd1ec9e325f39c6b2f17f7", +Mode:"embedded"} + +$bin/falcon help +(for more details about Falcon cli usage) +</verbatim> + + +---+++Dashboard + +Once Falcon / prism is started, you can view the status of Falcon entities using the Web-based dashboard. You can open +your browser at the corresponding port to use the web UI. + +Falcon dashboard makes the REST api calls as user "falcon-dashboard". If this user does not exist on your Falcon and +Oozie servers, please create the user. + +<verbatim> +## create user. +[root@falconhost ~] useradd -U -m falcon-dashboard -G users + +## verify user is created with membership in correct groups. 
+[root@falconhost ~] groups falcon-dashboard +falcon-dashboard : falcon-dashboard users +[root@falconhost ~] +</verbatim> + + +---+++Stopping Falcon Server + +<verbatim> +$cd {falcon-server-dir}/falcon-distributed-${project.version} +$bin/falcon-stop +</verbatim> + +---+++Stopping Falcon Prism + +<verbatim> +$cd {falcon-prism-dir}/falcon-distributed-${project.version} +$bin/prism-stop +</verbatim> http://git-wip-us.apache.org/repos/asf/falcon/blob/91c68bea/trunk/releases/0.11/src/site/twiki/Embedded-mode.twiki ---------------------------------------------------------------------- diff --git a/trunk/releases/0.11/src/site/twiki/Embedded-mode.twiki b/trunk/releases/0.11/src/site/twiki/Embedded-mode.twiki new file mode 100644 index 0000000..47acab4 --- /dev/null +++ b/trunk/releases/0.11/src/site/twiki/Embedded-mode.twiki @@ -0,0 +1,199 @@ +---+Embedded Mode + +Following are the steps needed to package and deploy Falcon in Embedded Mode. You need to complete Steps 1-3 mentioned + [[InstallationSteps][here]] before proceeding further. + +---++Package Falcon +Ensure that you are in the base directory (where you cloned Falcon). Letâs call it {project dir} + +<verbatim> +$mvn clean assembly:assembly -DskipTests -DskipCheck=true +</verbatim> + +<verbatim> +$ls {project dir}/distro/target/ +</verbatim> +It should give an output like below : +<verbatim> +apache-falcon-${project.version}-bin.tar.gz +apache-falcon-${project.version}-sources.tar.gz +archive-tmp +maven-shared-archive-resources +</verbatim> + +* apache-falcon-${project.version}-sources.tar.gz contains source files of Falcon repo. + +* apache-falcon-${project.version}-bin.tar.gz package contains project artifacts along with it's dependencies, +configuration files and scripts required to deploy Falcon. + +Tar can be found in {project dir}/target/apache-falcon-${project.version}-bin.tar.gz + +Tar is structured as follows : + +<verbatim> + +|- bin + |- falcon + |- falcon-start + |- falcon-stop + |- falcon-status + |- falcon-config.sh + |- service-start.sh + |- service-stop.sh + |- service-status.sh +|- conf + |- startup.properties + |- runtime.properties + |- prism.keystore + |- client.properties + |- log4j.xml + |- falcon-env.sh +|- docs +|- client + |- lib (client support libs) +|- server + |- webapp + |- falcon.war +|- data + |- falcon-store + |- graphdb + |- localhost +|- examples + |- app + |- hive + |- oozie-mr + |- pig + |- data + |- entity + |- filesystem + |- hcat +|- oozie + |- conf + |- libext +|- logs +|- hadooplibs +|- README +|- NOTICE.txt +|- LICENSE.txt +|- DISCLAIMER.txt +|- CHANGES.txt +</verbatim> + + +---++Installing & running Falcon + +Running Falcon in embedded mode requires bringing up server. + +<verbatim> +$tar -xzvf {falcon package} +$cd falcon-${project.version} +</verbatim> + + +---+++Starting Falcon Server +<verbatim> +$cd falcon-${project.version} +$bin/falcon-start [-port <port>] +</verbatim> + +By default, +* If falcon.enableTLS is set to true explicitly or not set at all, Falcon starts at port 15443 on https:// by default. + +* If falcon.enableTLS is set to false explicitly, Falcon starts at port 15000 on http://. + +* To change the port, use -port option. + +* If falcon.enableTLS is not set explicitly, port that ends with 443 will automatically put Falcon on https://. Any +other port will put Falcon on http://. + +* Server starts with conf from {falcon-server-dir}/falcon-distributed-${project.version}/conf. 
To override this (to use +the same conf with multiple server upgrades), set environment variable FALCON_CONF to the path of conf dir. You can find + the instructions for configuring Falcon [[Configuration][here]]. + + +---+++Enabling server-client +If server is not started using default-port 15443 then edit the following property in +{falcon-server-dir}/falcon-${project.version}/conf/client.properties + +falcon.url=http://{machine-ip}:{server-port}/ + + +---+++Using Falcon +<verbatim> +$cd falcon-${project.version} +$bin/falcon admin -version +Falcon server build version: {Version:"${project.version}-SNAPSHOT-rd7e2be9afa2a5dc96acd1ec9e325f39c6b2f17f7",Mode: +"embedded",Hadoop:"${hadoop.version}"} + +$bin/falcon help +(for more details about Falcon cli usage) +</verbatim> + +*Note* : https is the secure version of HTTP, the protocol over which data is sent between your browser and the website +that you are connected to. By default Falcon runs in https mode. But user can configure it to http. + + +---+++Dashboard + +Once Falcon server is started, you can view the status of Falcon entities using the Web-based dashboard. You can open +your browser at the corresponding port to use the web UI. + +Falcon dashboard makes the REST api calls as user "falcon-dashboard". If this user does not exist on your Falcon and +Oozie servers, please create the user. + +<verbatim> +## create user. +[root@falconhost ~] useradd -U -m falcon-dashboard -G users + +## verify user is created with membership in correct groups. +[root@falconhost ~] groups falcon-dashboard +falcon-dashboard : falcon-dashboard users +[root@falconhost ~] +</verbatim> + + +---++Running Examples using embedded package +<verbatim> +$cd falcon-${project.version} +$bin/falcon-start +</verbatim> +Make sure the Hadoop and Oozie endpoints are according to your setup in +examples/entity/filesystem/standalone-cluster.xml +The cluster locations,staging and working dirs, MUST be created prior to submitting a cluster entity to Falcon. +*staging* must have 777 permissions and the parent dirs must have execute permissions +*working* must have 755 permissions and the parent dirs must have execute permissions +<verbatim> +$bin/falcon entity -submit -type cluster -file examples/entity/filesystem/standalone-cluster.xml +</verbatim> +Submit input and output feeds: +<verbatim> +$bin/falcon entity -submit -type feed -file examples/entity/filesystem/in-feed.xml +$bin/falcon entity -submit -type feed -file examples/entity/filesystem/out-feed.xml +</verbatim> +Set-up workflow for the process: +<verbatim> +$hadoop fs -put examples/app / +</verbatim> +Submit and schedule the process: +<verbatim> +$bin/falcon entity -submitAndSchedule -type process -file examples/entity/filesystem/oozie-mr-process.xml +$bin/falcon entity -submitAndSchedule -type process -file examples/entity/filesystem/pig-process.xml +$bin/falcon entity -submitAndSchedule -type process -file examples/entity/spark/spark-process.xml +</verbatim> +Generate input data: +<verbatim> +$examples/data/generate.sh <<hdfs endpoint>> +</verbatim> +Get status of instances: +<verbatim> +$bin/falcon instance -status -type process -name oozie-mr-process -start 2013-11-15T00:05Z -end 2013-11-15T01:00Z +</verbatim> + +HCat based example entities are in examples/entity/hcat. +Spark based example entities are in examples/entity/spark. 
+
+---+++Stopping Falcon Server
+<verbatim>
+$cd falcon-${project.version}
+$bin/falcon-stop
+</verbatim>
http://git-wip-us.apache.org/repos/asf/falcon/blob/91c68bea/trunk/releases/0.11/src/site/twiki/EntitySLAAlerting.twiki
----------------------------------------------------------------------
diff --git a/trunk/releases/0.11/src/site/twiki/EntitySLAAlerting.twiki b/trunk/releases/0.11/src/site/twiki/EntitySLAAlerting.twiki
new file mode 100644
index 0000000..8534ba6
--- /dev/null
+++ b/trunk/releases/0.11/src/site/twiki/EntitySLAAlerting.twiki
@@ -0,0 +1,57 @@
+---++Entity SLA Alerting
+
+Falcon supports SLAs for feeds and processes.
+
+Types of SLA supported for feeds:
+
+   1. slaLow
+   1. slaHigh
+
+To learn more about feed SLAs, see [[EntitySpecification][Feed Specification]].
+
+Types of SLA supported for processes:
+
+   1. shouldStartIn
+   1. shouldEndIn
+
+To learn more about process SLAs, see [[EntitySpecification][Process Specification]].
+
+The Falcon Entity SLA Alerting service does the following:
+
+   1. Monitors feed and process instances and sends notifications to all the listeners attached to it.
+   1. For feeds, it notifies when an *slaHigh* miss happens; slaLow is not supported.
+   1. For processes, it notifies when an SLA miss for *shouldEndIn* happens; shouldStartIn is not supported.
+
+The Entity SLA Alert service depends upon [[EntitySLAMonitoring][Falcon Entity SLA Monitoring]] to know which process and feed instances are to be monitored.
+
+*How to attach listeners:*
+
+You can write custom listeners to take some action whenever a process or feed instance misses its SLA.
+To attach listeners, add the following property in startup.properties:
+
+<verbatim>
+
+*.entityAlert.listeners=org.apache.customPath.customListener
+
+</verbatim>
+
+Currently Falcon natively supports [[BacklogMetricEmitterService][Back Log Emitter Service]] as a listener to the EntitySLAAlert service.
+
+---++Dependencies
+
+*Other Services:*
+
+To enable the Entity SLA Alerting service, you need to enable [[EntitySLAMonitoring][Falcon Entity SLA Monitoring]].
+
+The following properties are needed in startup.properties:
+
+<verbatim>
+
+*.application.services=org.apache.falcon.service.EntitySLAAlertService
+
+*.entity.sla.statusCheck.frequency.seconds=600
+</verbatim>
+
+*Falcon Database:*
+
+The Entity SLA Alerting service maintains its state in the database. It needs one table, *ENTITY_SLA_ALERTS*; please see [[FalconDatabase]] for how to create it.
\ No newline at end of file
http://git-wip-us.apache.org/repos/asf/falcon/blob/91c68bea/trunk/releases/0.11/src/site/twiki/EntitySLAMonitoring.twiki
----------------------------------------------------------------------
diff --git a/trunk/releases/0.11/src/site/twiki/EntitySLAMonitoring.twiki b/trunk/releases/0.11/src/site/twiki/EntitySLAMonitoring.twiki
new file mode 100644
index 0000000..bdd9ac4
--- /dev/null
+++ b/trunk/releases/0.11/src/site/twiki/EntitySLAMonitoring.twiki
@@ -0,0 +1,25 @@
+---++Falcon Entity SLA Monitoring
+
+Entity SLA monitoring allows you to monitor entities (processes and feeds). It keeps track of the running instances of the entity and stores them in the database.
+
+
+---++Dependencies
+
+*Other Services:*
+
+The Entity SLA monitoring service requires FalconJPAService to be up. The following values need to be set in order to run EntitySLAMonitoring.
+In startup.properties:
+
+<verbatim>
+*.application.services=org.apache.falcon.state.store.service.FalconJPAService,\
+                       org.apache.falcon.service.EntitySLAMonitoringService
+</verbatim>
+
+
+*Falcon Database:*
+
+The Entity SLA monitoring service maintains its state in the database. It needs two tables:
+
+   1. MONITORED_ENTITY
+   1. PENDING_INSTANCES
+Please see [[FalconDatabase]] for how to create them.
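+For convenience, here is an illustrative combined sketch of the SLA-related startup.properties entries described here and in Entity SLA Alerting above. This assumes you add these services to your existing *.application.services list rather than replacing it; the listener class below is a placeholder for your own implementation.
+
+<verbatim>
+# JPA state store plus SLA monitoring and SLA alerting services
+*.application.services=org.apache.falcon.state.store.service.FalconJPAService,\
+                       org.apache.falcon.service.EntitySLAMonitoringService,\
+                       org.apache.falcon.service.EntitySLAAlertService
+
+# How often (in seconds) entity SLA status is checked by the alerting service
+*.entity.sla.statusCheck.frequency.seconds=600
+
+# Optional custom listener (placeholder class name) notified on SLA misses
+*.entityAlert.listeners=org.apache.customPath.customListener
+</verbatim>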