[
https://issues.apache.org/jira/browse/AIRAVATA-3694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17717187#comment-17717187
]
ASF subversion and git services commented on AIRAVATA-3694:
-----------------------------------------------------------
Commit a248cbd7d37730547f5827c43bf371861ca2a8e3 in airavata-django-portal's
branch refs/heads/master from Marcus Christie
[ https://gitbox.apache.org/repos/asf?p=airavata-django-portal.git;h=a248cbd7 ]
AIRAVATA-3694 Basic archive_user_data management command
> User data archive management commands
> -------------------------------------
>
> Key: AIRAVATA-3694
> URL: https://issues.apache.org/jira/browse/AIRAVATA-3694
> Project: Airavata
> Issue Type: New Feature
> Components: Django Portal
> Reporter: Marcus Christie
> Assignee: Marcus Christie
> Priority: Major
>
> Create management commands to manage archiving user data. The use case is
> that the gateway admin wants to archive older data and then delete that user
> data to free up disk space.
> The management commands will handle creating archives (as tarballs) and
> deleting the data from the user data archive directory. There will also be an
> unarchive command. There are settings for the max age of files to be archived
> and for the directory in which archives should be copied.
> How the archive files themselves are backed up is left to the gateway admin.
> It's expected that the gateway admin would periodically (perhaps by cron)
> copy the archive files from the web server to some other file server.
> h3. Description
> archive_user_data creates a tarball archive of user data for all files and
> directories that are older than a configured number of days. Alongside the
> tarball it writes a text file that lists all of the files and directories
> that are archived. The tarball and text file can be periodically pushed to
> tape backup or any other backup location.
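The selection and packaging steps described above can be sketched roughly as follows. This is a minimal illustration, not the code from the commit: the function names `find_old_entries` and `write_archive` are hypothetical, age is assumed to be measured by last modified time, only regular files are collected for brevity, and nothing is deleted.

```python
import os
import tarfile
import time
from pathlib import Path


def find_old_entries(root, max_age_days, now=None):
    """Return files under root whose mtime is older than max_age_days."""
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 24 * 60 * 60
    old = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = Path(dirpath) / name
            if path.stat().st_mtime < cutoff:
                old.append(path)
    return sorted(old)


def write_archive(root, entries, archive_dir, basename):
    """Write <basename>.tgz plus a <basename>.txt listing into archive_dir."""
    archive_dir = Path(archive_dir)
    archive_dir.mkdir(parents=True, exist_ok=True)
    tarball = archive_dir / f"{basename}.tgz"
    listing = archive_dir / f"{basename}.txt"
    with tarfile.open(tarball, "w:gz") as tar:
        for path in entries:
            # Store paths relative to the user data root.
            tar.add(path, arcname=str(path.relative_to(root)))
    # One archived path per line, mirroring the tarball contents.
    listing.write_text("".join(f"{p.relative_to(root)}\n" for p in entries))
    return tarball, listing
```

Keeping the listing as a separate text file means the contents of an archive can be checked without fetching and opening the tarball.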
> The configuration settings are
> - GATEWAY_USER_DATA_ARCHIVE_MAX_AGE_DAYS
> - GATEWAY_USER_DATA_ARCHIVE_DIRECTORY
> -- this is the directory in which to place the archive files and is also the
> place where temporary files are generated. Since the archive files can be
> large, it's important that there be enough free disk space on the partition
> where this directory lives.
> - GATEWAY_USER_DATA_ARCHIVE_MINIMUM_ARCHIVE_SIZE_GB
> -- defaults to 1 GB. This can be used to prevent creating a lot of small
> archives since tape archives often want a few large files instead of many
> small files.
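As a rough illustration, the three settings listed above might appear in a Django settings file like this. The setting names come from the description; the value types, the 90-day figure, and the directory path are assumptions made for the example, not documented defaults.

```python
# Hypothetical settings fragment; place it in your deployment's settings file.

# Archive entries older than this many days.
# 90 is an illustrative value, not a documented default.
GATEWAY_USER_DATA_ARCHIVE_MAX_AGE_DAYS = 90

# Where archive files and temporary files are written.
# Placeholder path; choose a partition with enough free disk space.
GATEWAY_USER_DATA_ARCHIVE_DIRECTORY = "/path/to/archives"

# Minimum archive size in GB; the description states this defaults to 1.
GATEWAY_USER_DATA_ARCHIVE_MINIMUM_ARCHIVE_SIZE_GB = 1
```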
> h4. Running archive_user_data
> All commands should be run as the gateway server user (pga).
> {code}
> python manage.py archive_user_data --dry-run
> {code}
> This just prints the files and directories that would be archived and exits.
> Good for checking that configuration is correct, etc.
> {code}
> python manage.py archive_user_data
> {code}
> This will actually create an archive and then delete the archived files from
> the user data directory.
> {code}
> python manage.py archive_user_data --max-age MAX_AGE
> {code}
> The --max-age flag overrides the GATEWAY_USER_DATA_ARCHIVE_MAX_AGE_DAYS
> setting. This can be useful when creating the first few archives after
> introducing the user data archive to an existing gateway.
> h4. Running unarchive_user_data
> unarchive_user_data requires an archive tarball as input. The main use case
> for this command is that the gateway administrator wants to restore some
> particular user data. First, the right archive must be found. The experiment
> details view in Experiment Statistics will display the name of the archive
> file for an experiment data directory that has been archived. Use this to
> then retrieve the tarball from backup. Then run unarchive_user_data on the
> file.
> {code}
> python manage.py unarchive_user_data /path/to/archive_seagrid_older_than_2023-04-17-22-15-34.tgz
> {code}
> The timestamps will be restored from the archive, including the last modified
> timestamps. This means that the next time archive_user_data runs, all files
> unarchived will be re-archived. Sometimes that is desired, but if you want to
> reset the last modified times, use the {{--reset-modification}} option:
> {code}
> python manage.py unarchive_user_data --reset-modification /path/to/archive_seagrid_older_than_2023-04-17-22-15-34.tgz
> {code}
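The difference between the two unarchive modes can be illustrated with a small sketch. It is not the command's actual implementation: `extract_archive` and its `reset_modification` parameter are hypothetical names, and the sketch assumes the archive holds only regular files and directories.

```python
import os
import tarfile
import time
from pathlib import Path


def extract_archive(tarball, dest, reset_modification=False):
    """Extract tarball into dest; optionally set every mtime to the current time."""
    dest = Path(dest)
    dest.mkdir(parents=True, exist_ok=True)
    with tarfile.open(tarball, "r:gz") as tar:
        members = tar.getmembers()
        # Extraction restores the stored last modified times.
        # Only extract archives that you created yourself.
        tar.extractall(dest)
    if reset_modification:
        now = time.time()
        for member in members:
            # A fresh mtime keeps the entry out of the next archive run.
            os.utime(dest / member.name, (now, now))
    return [dest / m.name for m in members]
```

Without the reset, restored entries keep their old timestamps and so still exceed the max age the next time archiving runs.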
--
This message was sent by Atlassian Jira
(v8.20.10#820010)