[
https://issues.apache.org/jira/browse/FALCON-321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13913310#comment-13913310
]
Arpit Gupta edited comment on FALCON-321 at 2/26/14 6:53 PM:
-------------------------------------------------------------
We understand that we asked for a feature to remove empty dirs, but we have
noticed 2 issues.
1. Let's say you have the following dir structure:
/dir1/child1/
/dir1/child2/
/dir1/child3/
and let's assume we start by processing /dir1/child1. We find no files, so we
go ahead and delete that dir and call the same method on its parent, /dir1. The
code then does fs.getContentSummary(parent).getFileCount() on /dir1. That count
includes only files, not directories, so if /dir1/child2 and /dir1/child3 hold
no files it returns 0 and /dir1 is deleted along with them. So essentially it
can delete directories that are not empty.
2. The second issue is that, at the start, the evictor determines all the paths
it needs to clean up, which here are /dir1/child1, /dir1/child2 and
/dir1/child3. After processing /dir1/child1, /dir1 was deleted, so when the
evictor then tried to delete /dir1/child2 it got an exception and failed.
So we need to handle #2 better as well.
Raghav, please correct me if I stated anything wrong.
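A minimal local sketch of issue 1 and one possible stricter emptiness check, assuming java.nio.file on the local filesystem as a stand-in for Hadoop's FileSystem; this is not Falcon's code or the committed fix. fileCount is a hypothetical helper that mimics what ContentSummary.getFileCount() reports (files anywhere in the subtree, directories not counted), and isEmptyDirectory treats a directory as empty only when it has no children at all. Against HDFS the analogous check would presumably be fs.listStatus(parent).length == 0.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.stream.Stream;

public class EmptyParentCleaner {

    // Hypothetical stand-in for ContentSummary.getFileCount(): counts regular files
    // anywhere under dir; directories themselves are not counted.
    static long fileCount(Path dir) throws IOException {
        try (Stream<Path> walk = Files.walk(dir)) {
            return walk.filter(Files::isRegularFile).count();
        }
    }

    // True only if dir has no entries at all: no files and no subdirectories.
    static boolean isEmptyDirectory(Path dir) throws IOException {
        try (DirectoryStream<Path> entries = Files.newDirectoryStream(dir)) {
            return !entries.iterator().hasNext();
        }
    }

    // Walks up from parent toward feedBasePath, deleting each directory only while it is
    // truly empty; stops at the first non-empty (or missing) directory or at the base path.
    static void deleteParentIfEmpty(Path parent, Path feedBasePath) throws IOException {
        Path current = parent;
        while (current != null && current.startsWith(feedBasePath) && !current.equals(feedBasePath)) {
            if (!Files.isDirectory(current) || !isEmptyDirectory(current)) {
                return; // a sibling instance (file or directory) still lives here
            }
            Files.delete(current); // non-recursive delete of an empty directory
            current = current.getParent();
        }
    }

    public static void main(String[] args) throws IOException {
        Path base = Files.createTempDirectory("testFolders");
        Path day = base.resolve("2014").resolve("01").resolve("21");
        Files.createDirectories(day.resolve("00"));
        Files.createDirectories(day.resolve("03")); // sibling instance dir holding no files

        Files.delete(day.resolve("00")); // evict instance 00
        System.out.println(fileCount(day));        // file-count check sees no files under 21
        System.out.println(isEmptyDirectory(day)); // but 03 is still there
        deleteParentIfEmpty(day, base);
        System.out.println(Files.exists(day.resolve("03")));
    }
}
```

Running main prints 0, false, true: the file count alone would treat 21 as empty while 03 still exists, and the stricter check keeps it.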
> Feed evictor deleting more stuff than it should
> -----------------------------------------------
>
> Key: FALCON-321
> URL: https://issues.apache.org/jira/browse/FALCON-321
> Project: Falcon
> Issue Type: Bug
> Reporter: Raghav Kumar Gautam
> Priority: Blocker
> Labels: system-tests
>
> In FeedEvictor.java we have:
> {code:java}
> private void deleteParentIfEmpty(FileSystem fs, Path parent, Path feedBasePath) throws IOException {
>     if (feedBasePath.equals(parent)) {
>         LOG.info("Not deleting feed base path:" + parent);
>     } else {
>         if (fs.getContentSummary(parent).getFileCount() == 0) {
>             LOG.info("Parent path: " + parent + " is empty, deleting path");
>             if (fs.delete(parent, true)) {
>                 LOG.info("Deleted empty dir: " + parent);
>             } else {
>                 throw new IOException("Unable to delete parent path:" + parent);
>             }
>             deleteParentIfEmpty(fs, parent.getParent(), feedBasePath);
>         }
>     }
> }
> {code}
> The fs.getContentSummary(parent).getFileCount() check counts only files, so if
> the parent has no files but does have subdirectories, we still delete the
> parent directory, which is incorrect.
> Here is the log from falcon-regression's RetentionTest.testRetention
> (parameters: hours, 24, true, daily):
> {noformat}
> 2014-02-24 15:09:45,034 INFO [main] org.apache.falcon.retention.FeedEvictor: Applying retention on DATA=hdfs://raghav5-falcon-5.cs1cloud.internal:8020/retention/testFolders/${YEAR}/${MONTH}/${DAY}/${HOUR}#META=hdfs://raghav5-falcon-5.cs1cloud.internal:8020/projects/ivory/clicksMetaData#STATS=hdfs://raghav5-falcon-5.cs1cloud.internal:8020/projects/ivory/clicksStats#TMP=/tmp type: instance, Limit: hours(24), timezone: UTC, frequency: hours, storageFILESYSTEM
> 2014-02-24 15:09:45,051 INFO [main] org.apache.falcon.retention.FeedEvictor: Normalized path : /retention/testFolders/${YEAR}/${MONTH}/${DAY}/${HOUR}
> 2014-02-24 15:09:45,123 INFO [main] org.apache.falcon.retention.FeedEvictor: Searching for /retention/testFolders/*/*/*/*
> 2014-02-24 15:09:45,486 INFO [main] org.apache.falcon.retention.FeedEvictor: Deleted instance :/retention/testFolders/2014/01/21/00
> 2014-02-24 15:09:45,500 INFO [main] org.apache.falcon.retention.FeedEvictor: Parent path: /retention/testFolders/2014/01/21 is empty, deleting path
> 2014-02-24 15:09:45,509 INFO [main] org.apache.falcon.retention.FeedEvictor: Deleted empty dir: /retention/testFolders/2014/01/21
> 2014-02-24 15:09:45,511 INFO [main] org.apache.falcon.retention.FeedEvictor: Parent path: /retention/testFolders/2014/01 is empty, deleting path
> 2014-02-24 15:09:45,517 INFO [main] org.apache.falcon.retention.FeedEvictor: Deleted empty dir: /retention/testFolders/2014/01
> 2014-02-24 15:09:45,518 INFO [main] org.apache.falcon.retention.FeedEvictor: Parent path: /retention/testFolders/2014 is empty, deleting path
> 2014-02-24 15:09:45,525 INFO [main] org.apache.falcon.retention.FeedEvictor: Deleted empty dir: /retention/testFolders/2014
> 2014-02-24 15:09:45,526 INFO [main] org.apache.falcon.retention.FeedEvictor: Not deleting feed base path:/retention/testFolders
> {noformat}
> Stacktrace:
> {noformat}
> Failing Oozie Launcher, Main class [org.apache.falcon.retention.FeedEvictor], main() threw exception, Unable to delete instance: /retention/testFolders/2014/01/21/03
> java.io.IOException: Unable to delete instance: /retention/testFolders/2014/01/21/03
>     at org.apache.falcon.retention.FeedEvictor.deleteInstance(FeedEvictor.java:321)
>     at org.apache.falcon.retention.FeedEvictor.fileSystemEvictor(FeedEvictor.java:174)
>     at org.apache.falcon.retention.FeedEvictor.evictFS(FeedEvictor.java:149)
>     at org.apache.falcon.retention.FeedEvictor.evict(FeedEvictor.java:139)
>     at org.apache.falcon.retention.FeedEvictor.run(FeedEvictor.java:121)
>     at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
>     at org.apache.falcon.retention.FeedEvictor.main(FeedEvictor.java:93)
> {noformat}
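For issue #2, one option is to let instance deletion tolerate a path that an earlier parent cleanup has already removed, rather than aborting the whole run as in the stack trace above. This is again a local java.nio.file sketch under that assumption; deleteInstance and evictAll here are hypothetical helpers, not FeedEvictor's actual methods. On HDFS the analogous guard would presumably be an fs.exists(path) check before fs.delete(path, true).

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class InstanceEvictor {

    // Hypothetical helper: deletes one instance directory and everything under it.
    // Returns false, instead of throwing, if the path is already gone (for example
    // because an earlier parent cleanup removed it).
    static boolean deleteInstance(Path instance) throws IOException {
        if (!Files.exists(instance)) {
            return false;
        }
        List<Path> paths;
        try (Stream<Path> walk = Files.walk(instance)) {
            // Reverse lexicographic order puts children before their parents.
            paths = walk.sorted(Comparator.reverseOrder()).collect(Collectors.toList());
        }
        for (Path p : paths) {
            Files.delete(p);
        }
        return true;
    }

    // Hypothetical helper: evicts instance paths computed up front and returns how
    // many were actually deleted; paths that are already missing are logged and skipped.
    static int evictAll(List<Path> instances) throws IOException {
        int deleted = 0;
        for (Path instance : instances) {
            if (deleteInstance(instance)) {
                deleted++;
            } else {
                System.out.println("Instance already removed, skipping: " + instance);
            }
        }
        return deleted;
    }

    public static void main(String[] args) throws IOException {
        Path day = Files.createTempDirectory("testFolders").resolve("2014").resolve("01").resolve("21");
        Path h00 = Files.createDirectories(day.resolve("00"));
        Path h03 = Files.createDirectories(day.resolve("03"));
        List<Path> instances = Arrays.asList(h00, h03); // computed before any deletion

        deleteInstance(day); // simulate an over-eager parent cleanup removing 21 and its children
        System.out.println(evictAll(instances)); // 0, and no exception is thrown
    }
}
```

Logging each skipped path keeps a genuine failure from being hidden silently. With a stricter emptiness check for issue 1, parent cleanup should presumably stop removing pending sibling instances, so this case would become rare.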
--
This message was sent by Atlassian JIRA
(v6.1.5#6160)