Hi Mark,

Your analysis is spot on. As of today, feed retention is tightly coupled to
the timestamp in the feed path, and it is not possible to run retention based
on the creation time of a feed instance instead of the time pattern in its
path.

Having said that, your modelling use case is quite unique (and perfectly
valid) and is not supported by Falcon today. Falcon treats the path pattern
provided in the data location as the path of a *feed instance*, not of the
*feed* itself. In your use case you have a parent folder and want to treat
all the files in it as instances of the feed, which Falcon cannot handle as
of today. We are planning to support aperiodic feeds in a future version of
Falcon, possibly 1.0, and we can consider this use case then.
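
To illustrate (the path and dates below are just a made-up example): with a
data location of /data/clicks/${YEAR}/${MONTH}/${DAY} and a frequency of
days(1), Falcon sees each materialized directory as one instance, e.g.

  /data/clicks/2016/01/18  -> instance for 2016-01-18
  /data/clicks/2016/01/19  -> instance for 2016-01-19

A flat directory such as /incoming carries no time pattern, so Falcon cannot
tell which instance a file belongs to, and retention has no timestamp to
compare against the retention limit.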

For now, may I suggest a workaround to leverage Falcon retention for your
use case? When creating files, put each file in a time-based sub-directory
chosen on the basis of the current time. Pick the granularity you want
retention to operate at and set the feed's frequency accordingly. For
example, if you want retention to operate at hourly granularity, make the
feed's frequency hours(1) and the data location something like
/incoming/${YEAR}/${MONTH}/${DAY}/${HOUR}. All files created during today's
10th hour would then go to the /incoming/2016/01/19/10 directory. After this,
Falcon retention will work smoothly and will delete all files in all folders
older than the retention limit.
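
For concreteness, here is a rough sketch of how your feed below could look
with the workaround applied (only the frequency and data location change; the
parts I have elided stay as in your definition):

<feed xmlns='uri:falcon:feed:0.1' name='greenema-folder1'>
  <frequency>hours(1)</frequency>
  <timezone>GMT-06:00</timezone>
  <clusters>
    <cluster name='MyCluster' type='source'>
      <validity start='2015-01-16T19:54Z' end='2030-01-17T19:54Z'/>
      <retention limit='hours(5)' action='delete'/>
    </cluster>
  </clusters>
  <locations>
    <location type='data' path='/incoming/${YEAR}/${MONTH}/${DAY}/${HOUR}'/>
  </locations>
  ...
</feed>

Your ingestion step then only has to compute the sub-directory from the wall
clock before writing, for example (the file name is illustrative; adjust for
the feed's timezone as needed):

  hdfs dfs -mkdir -p /incoming/$(date +%Y/%m/%d/%H)
  hdfs dfs -put clicks-001.avro /incoming/$(date +%Y/%m/%d/%H)/

The FALCON_FEED_RETENTION workflow can then resolve the pattern and evict any
hour directory older than the hours(5) limit.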

Hope it helps.


Cheers
Ajay Yadava

On Tue, Jan 19, 2016 at 2:30 AM, Mark Greene <[email protected]> wrote:

> Hello,
>
> Running Falcon 6.1...
>
> My use case is to have a top-level directory where files are deposited,
> such as /incoming. Files should be processed or evicted based on their HDFS
> timestamps. Falcon appears to be forcing me into an ingestion pattern of
> writing files to a path of /incoming/[${YEAR}, ${MONTH}, ${DAY}, ${HOUR},
> or ${MINUTE}}. Is this absolutely the case, without exception or option?
>
> I am learning Apache Falcon, and confused by how the <location
> type="data"/> path of a Feed Entity is coupled to the
> FALCON_FEED_RETENTION_... Oozie workflow that is created and scheduled.
> From documentation and hands-on testing it appears that I cannot create a
> location data path without applying a ${YEAR}, ${MONTH}, ${DAY}, ${HOUR},
> or ${MINUTE} template to the physical path.
>
>
> My preference is to create a feed that looks like this:
>
> <feed xmlns='uri:falcon:feed:0.1' name='greenema-folder1'>
>   <frequency>minutes(5)</frequency>
>   <timezone>GMT-06:00</timezone>
>   <clusters>
>     <cluster name='MyCluster' type='source'>
>       <validity start='2015-01-16T19:54Z' end='2030-01-17T19:54Z'/>
>       <retention limit='hours(5)' action='delete'/>
>       <locations>
>         <location type='data'>
>         </location>
>         ......
>       </locations>
>     </cluster>
>   </clusters>
>   <locations>
>     .....
>     <location type='data' path='/user/greenema/folder1'>
>     </location>
>     ......
>   </locations>
>
> </feed>
>
> With the current definition I get the following error from
> org.apache.falcon.entity.FileSystemStorage.fileSystemEvictor:
>
>  Launcher exception: org.apache.falcon.FalconException: Couldn't evict feed from fileSystem
> org.apache.oozie.action.hadoop.JavaMainException: org.apache.falcon.FalconException: Couldn't evict feed from fileSystem
> at org.apache.oozie.action.hadoop.JavaMain.run(JavaMain.java:59)
> at org.apache.oozie.action.hadoop.LauncherMain.run(LauncherMain.java:47)
> at org.apache.oozie.action.hadoop.JavaMain.main(JavaMain.java:35)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:606)
> at org.apache.oozie.action.hadoop.LauncherMapper.map(LauncherMapper.java:236)
> at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54)
> at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:453)
> at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
> at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:415)
> at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
> at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
> Caused by: org.apache.falcon.FalconException: Couldn't evict feed from fileSystem
> at org.apache.falcon.entity.FileSystemStorage.evict(FileSystemStorage.java:306)
> at org.apache.falcon.retention.FeedEvictor.run(FeedEvictor.java:76)
> at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
> at org.apache.falcon.retention.FeedEvictor.main(FeedEvictor.java:52)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:606)
> at org.apache.oozie.action.hadoop.JavaMain.run(JavaMain.java:56)
> ... 15 more
> Caused by: java.io.IOException: Unable to resolve pattern for feedPath: /user/greenema/folder1
> at org.apache.falcon.entity.FeedHelper.getFeedBasePath(FeedHelper.java:435)
> at org.apache.falcon.entity.FileSystemStorage.fileSystemEvictor(FileSystemStorage.java:331)
> at org.apache.falcon.entity.FileSystemStorage.evict(FileSystemStorage.java:300)
> ... 23 more
> --
>
> Any help or clarification to the Entity Definition documentation on what is
> *required* of a path definition would be very helpful. Thanks!
>
> Mark Greene
> *E:* [email protected]
>
