[ 
https://issues.apache.org/jira/browse/FALCON-287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13921251#comment-13921251
 ] 

Venkatesh Seetharam commented on FALCON-287:
--------------------------------------------

[~shaik.idris], thanks for taking the time to review.

bq. May be I missed the actual use-case of lineage, why do we need to persist 
it and who are the consumers of this information.
Typical use cases for lineage are impact analysis, tracing a feed back to its 
source to see how it was generated, etc.

bq. Secondly, instead of FalconPostProcessing storing this data on HDFS, which 
might further slowdown user workflow
This is a tradeoff between simplicity and efficiency. Serializing to JSON and 
writing it to a file on HDFS should not be slow, and the current implementation 
is also quite inefficient IMO. The downside of this approach is the NN 
namespace problem, but we are cleaning these files up proactively. 
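
To make the "serialize to JSON and write to a file" step concrete, here is a 
minimal sketch, not the code in the patch. The class name LineageSketch and 
the field names "feed" and "size" are hypothetical, the JSON is hand-rolled 
only to keep the example dependency-free, and a local java.nio file stands in 
for the file on HDFS.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical illustration only; this is not Falcon's LineageRecorder.
final class LineageSketch {

    // Minimal escaping: only double quotes and backslashes are handled.
    // Real code would use a JSON library instead.
    static String quote(String s) {
        StringBuilder sb = new StringBuilder("\"");
        for (char c : s.toCharArray()) {
            if (c == '"' || c == '\\') {
                sb.append('\\');
            }
            sb.append(c);
        }
        return sb.append('"').toString();
    }

    // Serialize a flat key/value record as a single JSON object.
    static String toJson(Map<String, String> record) {
        StringJoiner json = new StringJoiner(",", "{", "}");
        for (Map.Entry<String, String> e : record.entrySet()) {
            json.add(quote(e.getKey()) + ":" + quote(e.getValue()));
        }
        return json.toString();
    }

    // Write the record to one file. A local path is an assumption standing
    // in for the HDFS location.
    static void write(Path file, Map<String, String> record) throws IOException {
        Files.write(file, toJson(record).getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) throws IOException {
        Map<String, String> record = new LinkedHashMap<>();
        record.put("feed", "sample-feed"); // hypothetical field
        record.put("size", "1024");        // hypothetical field
        Path file = Files.createTempFile("lineage", ".json");
        write(file, record);
        System.out.println(new String(Files.readAllBytes(file), StandardCharsets.UTF_8));
    }
}
```

Because the record is a flat map, adding another attribute later is just one 
more put() call; the write path itself does not change.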

bq. May be I got the intent of storing this, but what all additional data we 
require for each feed.
Things like sizes, schema, etc. I do not know the full list at this time, but 
this framework will not need any change to the message passing structure, will 
allow schema evolution, and will not bleed. It's contained in LineageRecorder.

> Record lineage information in post processing
> ---------------------------------------------
>
>                 Key: FALCON-287
>                 URL: https://issues.apache.org/jira/browse/FALCON-287
>             Project: Falcon
>          Issue Type: Sub-task
>    Affects Versions: 0.5
>            Reporter: Venkatesh Seetharam
>            Assignee: Venkatesh Seetharam
>              Labels: lineage
>         Attachments: FALCON-287-v1.patch, FALCON-287-v2.patch, 
> FALCON-287.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.2#6252)
