[
https://issues.apache.org/jira/browse/HDFS-3956?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Todd Lipcon updated HDFS-3956:
------------------------------
Attachment: hdfs-3956.txt
Attached patch fixes the issue.
Testing:
- I added some new files to the existing purging test
- I fixed a bug in the random fault test, which previously wasn't purging any
files: it called {{purgeLogsOlderThan}} before {{recoverUnclosedSegments}}, so
the purge request was rejected. The test now purges them properly; I verified
the purging behavior by running {{watch
'find ./build/test/data/dfs/journalnode-2 | sort'}} during the test run.
- I ran 5000 instances of the random fault test; all passed with no
AssertionErrors.
This applies on top of HDFS-3950 and HDFS-3955.
> QJM: purge temporary files when no longer within retention period
> -----------------------------------------------------------------
>
> Key: HDFS-3956
> URL: https://issues.apache.org/jira/browse/HDFS-3956
> Project: Hadoop HDFS
> Issue Type: Sub-task
> Components: ha
> Affects Versions: QuorumJournalManager (HDFS-3077)
> Reporter: Todd Lipcon
> Assignee: Todd Lipcon
> Priority: Minor
> Attachments: hdfs-3956.txt
>
>
> After doing a lot of fault testing, I noticed that the JNs had left temporary
> files in their journal directories which were no longer within the retention
> period. For example, if a JN crashes in the middle of recovery, it can leave
> behind a file like {{edits_inprogress_123.epoch=10}}. These files are handy to
> keep around for forensics/debugging while they are still in their retention
> period, but we should not leave them forever. The normal purging policy
> should apply.
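
A minimal sketch of the purge policy described above, for illustration only; it is not the code in hdfs-3956.txt. It assumes the temporary-file naming scheme is {{edits_inprogress_<startTxId>.epoch=<epoch>}} (inferred from the single example in the description) and that the retention boundary is given as a minimum transaction ID to keep. The class name, method name, and regex are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch: pick leftover temporary edit files that fall outside retention. */
class TempFilePurgeSketch {
  // Assumed naming scheme, inferred from the example in the issue description:
  //   edits_inprogress_<startTxId>.epoch=<epoch>
  private static final Pattern TEMP_FILE =
      Pattern.compile("edits_inprogress_(\\d+)\\.epoch=(\\d+)");

  /**
   * Returns the names of temporary files whose starting transaction ID is
   * below minTxIdToKeep. Names that do not match the pattern are ignored.
   */
  static List<String> selectPurgeable(List<String> fileNames, long minTxIdToKeep) {
    List<String> purgeable = new ArrayList<>();
    for (String name : fileNames) {
      Matcher m = TEMP_FILE.matcher(name);
      if (m.matches() && Long.parseLong(m.group(1)) < minTxIdToKeep) {
        purgeable.add(name);
      }
    }
    return purgeable;
  }

  public static void main(String[] args) {
    List<String> names = List.of(
        "edits_inprogress_123.epoch=10",  // starts at txid 123: outside retention
        "edits_inprogress_900.epoch=12",  // starts at txid 900: still retained
        "VERSION");                       // not a temporary edits file
    System.out.println(selectPurgeable(names, 500));
    // prints [edits_inprogress_123.epoch=10]
  }
}
```

The sketch only selects candidates; actually deleting them, and rejecting purge requests that arrive before recovery has run, is left out.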