Repository: falcon
Updated Branches:
  refs/heads/master 9066eac27 -> 09841bbea


FALCON-1204 Expose default configs for feed late data handling in runtime.properties. Contributed by Balu Vellanki.


Project: http://git-wip-us.apache.org/repos/asf/falcon/repo
Commit: http://git-wip-us.apache.org/repos/asf/falcon/commit/09841bbe
Tree: http://git-wip-us.apache.org/repos/asf/falcon/tree/09841bbe
Diff: http://git-wip-us.apache.org/repos/asf/falcon/diff/09841bbe

Branch: refs/heads/master
Commit: 09841bbeab843df681f70ca21eb1c856507149c2
Parents: 9066eac
Author: Ajay Yadava <[email protected]>
Authored: Thu Jul 16 12:06:48 2015 +0530
Committer: Ajay Yadava <[email protected]>
Committed: Thu Jul 16 12:06:48 2015 +0530

----------------------------------------------------------------------
 CHANGES.txt                                   |  2 ++
 common/src/main/resources/runtime.properties  |  7 ++++++-
 docs/src/site/twiki/FalconDocumentation.twiki | 12 +++++++++++-
 src/conf/runtime.properties                   | 11 +++++++++--
 4 files changed, 28 insertions(+), 4 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/falcon/blob/09841bbe/CHANGES.txt
----------------------------------------------------------------------
diff --git a/CHANGES.txt b/CHANGES.txt
index 8b96e78..63298f0 100755
--- a/CHANGES.txt
+++ b/CHANGES.txt
@@ -9,6 +9,8 @@ Trunk (Unreleased)
     FALCON-796 Enable users to triage data processing issues through falcon (Ajay Yadava)
     
   IMPROVEMENTS
+    FALCON-1204 Expose default configs for feed late data handling in runtime.properties(Balu Vellanki via Ajay Yadava)
+
     FALCON-1170 Falcon Native Scheduler - Refactor existing workflow/coord/bundle builder(Pallavi Rao via Ajay Yadava)
     
     FALCON-1031 Make post processing notifications to user topics optional (Pallavi Rao via Ajay Yadava)

http://git-wip-us.apache.org/repos/asf/falcon/blob/09841bbe/common/src/main/resources/runtime.properties
----------------------------------------------------------------------
diff --git a/common/src/main/resources/runtime.properties b/common/src/main/resources/runtime.properties
index 8d465e8..3b32463 100644
--- a/common/src/main/resources/runtime.properties
+++ b/common/src/main/resources/runtime.properties
@@ -23,4 +23,9 @@
 
 *.falcon.replication.workflow.maxmaps=5
 *.falcon.replication.workflow.mapbandwidth=100
-webservices.default.max.results.per.page=100
+*.webservices.default.max.results.per.page=100
+
+# Default configs to handle replication for late arriving feeds.
+*.feed.late.allowed=true
+*.feed.late.frequency=hours(3)
+*.feed.late.policy=exp-backoff
\ No newline at end of file

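For context (not part of the diff above), these cluster-level defaults complement the per-feed late-arrival setting in the feed entity definition. A minimal sketch, assuming the standard Falcon feed schema; the feed name and cut-off value are placeholders, and the exact precedence between the entity-level cut-off and the new *.feed.late.* defaults is an assumption rather than something stated in this change:

  <feed name="sample-feed" xmlns="uri:falcon:feed:0.1">
    <!-- Per-feed window during which late-arriving data is still accepted (placeholder value);
         the runtime.properties defaults above are assumed to supply how replication reruns
         for such data are retried (allowed / frequency / policy) when not set elsewhere. -->
    <late-arrival cut-off="hours(6)"/>
    <!-- frequency, clusters, locations and other required feed elements omitted -->
  </feed>
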
http://git-wip-us.apache.org/repos/asf/falcon/blob/09841bbe/docs/src/site/twiki/FalconDocumentation.twiki
----------------------------------------------------------------------
diff --git a/docs/src/site/twiki/FalconDocumentation.twiki b/docs/src/site/twiki/FalconDocumentation.twiki
index c374966..9804a57 100644
--- a/docs/src/site/twiki/FalconDocumentation.twiki
+++ b/docs/src/site/twiki/FalconDocumentation.twiki
@@ -561,7 +561,7 @@ simple and basic. The falcon system looks at all dependent input feeds for a pro
 cut-off period. Then it uses a scheduled messaging framework, like the one available in Apache ActiveMQ or Java's !DelayQueue to schedule a message with a cut-off period, then after a cut-off period the message is dequeued and Falcon checks for changes in the feed data which is recorded in HDFS in latedata file by falcons "record-size" action, if it detects any changes then the workflow will be rerun with the new set of feed data.
 
 *Example:*
-The late rerun policy can be configured in the process definition.
+For a process entity, the late rerun policy can be configured in the process definition.
 Falcon supports 3 policies, periodic, exp-backoff and final.
 Delay specifies, how often the feed data should be checked for changes, also one needs to 
 explicitly set the feed names in late-input which needs to be checked for late data.
@@ -575,6 +575,16 @@ explicitly set the feed names in late-input which needs to be checked for late d
 *NOTE:* Feeds configured with table storage does not support late input data handling at this point. This will be
 made available in the near future.
 
+For a feed entity replication job, the default late data handling policy can be configured in the runtime.properties file.
+Since these properties are runtime.properties, they will take effect for all replication jobs completed subsequent to the change.
+<verbatim>
+  # Default configs to handle replication for late arriving feeds.
+  *.feed.late.allowed=true
+  *.feed.late.frequency=hours(3)
+  *.feed.late.policy=exp-backoff
+</verbatim>
+
+
 ---++ Idempotency
 All the operations in Falcon are Idempotent. That is if you make same request to the falcon server / prism again you will get a SUCCESSFUL return if it was SUCCESSFUL in the first attempt. For example, you submit a new process / feed and get SUCCESSFUL message return. Now if you run the same command / api request on same entity you will again get a SUCCESSFUL message. Same is true for other operations like schedule, kill, suspend and resume.
 Idempotency also by takes care of the condition when request is sent through prism and fails on one or more servers. For example prism is configured to send request to 3 servers. First user sends a request to SUBMIT a process on all 3 of them, and receives a response SUCCESSFUL from all of them. Then due to some issue one of the servers goes down, and user send a request to schedule the submitted process. This time he will receive a response with PARTIAL status and a FAILURE message from the server that has gone down. If the users check he will find the process would have been started and running on the 2 SUCCESSFUL servers. Now the issue with server is figured out and it is brought up. Sending the SCHEDULE request again through prism will result in a SUCCESSFUL response from prism as well as other three servers, but this time PROCESS will be SCHEDULED only on the server which had failed earlier and other two will keep running as before. 
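
For illustration (not part of the diff), a minimal command-line sketch of the idempotent behaviour described above; the entity name, file path and comments are placeholders, assuming the standard falcon entity sub-commands:

  # Submit and then schedule a process through prism.
  falcon entity -type process -file /path/to/sample-process.xml -submit
  falcon entity -type process -name sample-process -schedule

  # Re-issuing the same schedule request is safe: servers that already scheduled the
  # process report success again, while a server that was down during a PARTIAL
  # attempt picks the request up once it is back.
  falcon entity -type process -name sample-process -schedule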

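The late rerun policy for a process, described in the documentation change above, is configured through the late-process element of the process definition. A minimal sketch, assuming the standard process schema; the policy, delay, feed name and workflow path are placeholder values:

  <late-process policy="exp-backoff" delay="hours(1)">
    <!-- Each late-input names a feed from the process inputs to be checked for late
         data, plus the workflow to run when changes are detected. -->
    <late-input input="input-feed" workflow-path="hdfs:///apps/falcon/late/workflow"/>
  </late-process>
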
http://git-wip-us.apache.org/repos/asf/falcon/blob/09841bbe/src/conf/runtime.properties
----------------------------------------------------------------------
diff --git a/src/conf/runtime.properties b/src/conf/runtime.properties
index a40d369..58dee3d 100644
--- a/src/conf/runtime.properties
+++ b/src/conf/runtime.properties
@@ -26,8 +26,15 @@
 #prism should have the following properties
 prism.all.colos=local
 prism.falcon.local.endpoint=https://localhost:15443
-#falcon server should have the following properties
+
+# falcon server should have the following properties
 falcon.current.colo=local
 webservices.default.max.results.per.page=100
+
 # retry count - to fetch the status from the workflow engine
-workflow.status.retry.count=30
\ No newline at end of file
+workflow.status.retry.count=30
+
+# Default configs to handle replication for late arriving feeds.
+feed.late.allowed=true
+feed.late.frequency=hours(3)
+feed.late.policy=exp-backoff
\ No newline at end of file
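
For illustration (not part of the diff), a minimal sketch of how these defaults might be read inside Falcon, assuming the org.apache.falcon.util.RuntimeProperties accessor that backs runtime.properties; the usage and fallback values below are assumptions and are not shown in this commit:

  import java.util.Properties;

  public final class LateDataDefaults {
      public static void main(String[] args) throws Exception {
          // Assumed accessor for runtime.properties; getProperty falls back to the
          // shipped defaults when a key is absent.
          Properties props = org.apache.falcon.util.RuntimeProperties.get();
          boolean lateAllowed = Boolean.parseBoolean(props.getProperty("feed.late.allowed", "true"));
          String lateFrequency = props.getProperty("feed.late.frequency", "hours(3)");
          String latePolicy = props.getProperty("feed.late.policy", "exp-backoff");
          System.out.println("late allowed=" + lateAllowed
                  + ", frequency=" + lateFrequency + ", policy=" + latePolicy);
      }
  }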
