[ 
https://issues.apache.org/jira/browse/FALCON-630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14108764#comment-14108764
 ] 

Shwetha G S commented on FALCON-630:
------------------------------------

{quote}
Why do you need this? How is this different from feedNames? This same property 
can be overloaded with input names in the process and feed names in 
replication, no?
{quote}
An input name is not the same as a feed name: Falcon validates that input 
names are unique within a process, but not that input feed names are. This 
matters for the many pipelines where different instances of the same feed are 
handled differently. For example, de-duping events across 2 hours takes the 
(n-1)th and (n)th hour data of the same input feed, and every event in the 
(n-1)th hour is de-duped against the (n)th hour events. Such a process needs 
to define 2 inputs for the same feed. That is why late data is defined on 
input names, rather than on input feeds, in a process.
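
To illustrate the uniqueness point, here is a minimal sketch in Python (not 
Falcon code). The input names prevHour and currHour and the feed name events 
are hypothetical; it only shows that keying by feed name collapses two inputs 
on the same feed into one entry, while keying by input name keeps both.

```python
# Hypothetical process inputs: (input name, feed name, instance offset in hours).
# None of these names come from a real Falcon entity definition.
inputs = [
    ("prevHour", "events", -1),  # (n-1)th hour instance
    ("currHour", "events", 0),   # (n)th hour instance
]

# Keyed by feed name: the second input overwrites the first.
by_feed = {feed: offset for _, feed, offset in inputs}

# Keyed by input name: both inputs survive, since input names are unique.
by_input = {name: (feed, offset) for name, feed, offset in inputs}

print(len(by_feed), len(by_input))
```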

Currently, late data for a process is broken because the workflow param 
carries feed names, while the late data section of the process uses input 
names. So the code compares feed names against input names, which is wrong.
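
A rough sketch of that mismatch, in Python rather than Falcon's actual code: 
the function inputs_to_check, the input names impressions and clicks, and the 
sizes are all made up for illustration. Only the feed names come from the log 
line quoted in the issue. When the recorded sizes are keyed by feed name, no 
late-data input name ever finds a match.

```python
def inputs_to_check(late_inputs, recorded_sizes):
    """Return the late-data inputs that have a recorded size to compare against."""
    return [name for name in late_inputs if name in recorded_sizes]

# Hypothetical input names, as they would appear in the late data section.
late_inputs = ["impressions", "clicks"]

# Sizes keyed by feed name (what the workflow param carries); sizes are made up.
sizes_by_feed = {"FETL2-RRLog": 1024, "FETL-RTBS-PRLog": 2048}

# Sizes keyed by input name (what the late data section expects).
sizes_by_input = {"impressions": 1024, "clicks": 2048}

print(inputs_to_check(late_inputs, sizes_by_feed))   # nothing matches
print(inputs_to_check(late_inputs, sizes_by_input))  # both inputs match
```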

> late data rerun for process broken in trunk 
> --------------------------------------------
>
>                 Key: FALCON-630
>                 URL: https://issues.apache.org/jira/browse/FALCON-630
>             Project: Falcon
>          Issue Type: Bug
>          Components: rerun
>    Affects Versions: 0.5
>            Reporter: Samarth Gupta
>            Assignee: Shwetha G S
>            Priority: Blocker
>             Fix For: 0.4
>
>         Attachments: FALCON-630.patch
>
>
> Late data rerun for process is not working. It seems that in pre-processing 
> the record size is stored by feed name and not by input name, due to which 
> late data is never detected. 
> {code}
>                     -falconInputFeeds
>                     FETL2-RRLog#FETL-RTBS-PRLog#FETL-RTBS-NPRLog
> {code}
> Above, even though the param in the tasktracker logs says InputFeeds, its 
> values are actually feed names. 



--
This message was sent by Atlassian JIRA
(v6.2#6252)
