[ https://issues.apache.org/jira/browse/FALCON-2030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15332635#comment-15332635 ]
Venkatesan Ramachandran commented on FALCON-2030:
-------------------------------------------------

[~ajayyadava] Welcome back, and thanks for the info.

The reason is that we hit FALCON-2023 if no pattern is specified in the path.

Also, for snapshot-like data (the use case you are referring to), it would be better to write each snapshot under a subfolder, named with either a timestamp pattern or a version number (such as the epoch as a number). When accessing the data, workflows can use the LATEST EL to get the latest folder and consume it. This way, the dataset's versions can be tracked and maintained. Even metadata can change (append/remove/update), although at a very slow rate. With versioned subfolders, we can ensure that in-flight workflows/pipelines are not affected by the addition/removal/update of data.

Let me know what you think.

> Enforce time partition pattern in the data location path in feed definition
> ----------------------------------------------------------------------------
>
>                 Key: FALCON-2030
>                 URL: https://issues.apache.org/jira/browse/FALCON-2030
>             Project: Falcon
>          Issue Type: Improvement
>          Components: feed
>            Reporter: Venkatesan Ramachandran
>            Assignee: Venkatesan Ramachandran
>
> In feed definition, data location can be specified without time series
> pattern like below:
>  <locations>
>      <location type="data"
>                path="/tmp/falcon-regression/RetentionTest/testFolders/"/>
>      <location type="stats" path="/projects/falcon/clicksStats"/>
>      <location type="meta" path="/projects/falcon/clicksMetaData"/>
>  </locations>

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)