[ 
https://issues.apache.org/jira/browse/AIRAVATA-3694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17717215#comment-17717215
 ] 

ASF subversion and git services commented on AIRAVATA-3694:
-----------------------------------------------------------

Commit d0768e84bfd60814cf8f292afe380c1ef8bb9996 in airavata-django-portal's 
branch refs/heads/delta-topology-workshop from Marcus Christie
[ https://gitbox.apache.org/repos/asf?p=airavata-django-portal.git;h=d0768e84 ]

AIRAVATA-3694 Attempt to rollback the archive if something fails when deleting 
archived data


> User data archive management commands
> -------------------------------------
>
>                 Key: AIRAVATA-3694
>                 URL: https://issues.apache.org/jira/browse/AIRAVATA-3694
>             Project: Airavata
>          Issue Type: New Feature
>          Components: Django Portal
>            Reporter: Marcus Christie
>            Assignee: Marcus Christie
>            Priority: Major
>
> Create management commands to manage archiving user data. The use case: the 
> gateway admin wants to archive older data and then delete it to free up disk 
> space.
> The management commands will handle creating archives (as tarballs) and 
> deleting the archived data from the user data directory. There will also be 
> an unarchive command. There are settings for the maximum age of files to be 
> archived and for the directory in which archives should be placed.
> How the archive files are backed up: it's expected that the gateway admin 
> would periodically (perhaps by cron) copy the archive files from the web 
> server to some other file server.
> h3. Description
> archive_user_data creates a tarball archive of user data for all files and 
> directories that are older than a configured number of days. Alongside the 
> tarball it writes a text file that lists all of the archived files and 
> directories. The tarball and text file can be periodically pushed to tape 
> backup or any other backup location.
> The configuration settings are:
> - GATEWAY_USER_DATA_ARCHIVE_MAX_AGE_DAYS
> -- files and directories older than this many days are archived.
> - GATEWAY_USER_DATA_ARCHIVE_DIRECTORY
> -- this is the directory in which to place the archive files and is also the 
> place where temporary files are generated. Since the archive files can be 
> large, it's important that there be enough free disk space on the partition 
> where this directory lives.
> - GATEWAY_USER_DATA_ARCHIVE_MINIMUM_ARCHIVE_SIZE_GB
> -- defaults to 1 GB. This can be used to prevent creating a lot of small 
> archives, since tape backup systems often work better with a few large files 
> than with many small files.
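A sketch of how these settings might be declared and used. The values, the example directory path, the convention that 1 GB means 1024**3 bytes, and the helper functions are all assumptions for illustration. It shows a minimum-size check (skip archiving while the candidates total less than the configured size) and a free-space check on the archive directory's partition.

```python
import shutil
from pathlib import Path
from typing import Iterable

# Hypothetical example values (e.g. in the portal's Django settings file).
GATEWAY_USER_DATA_ARCHIVE_MAX_AGE_DAYS = 180
GATEWAY_USER_DATA_ARCHIVE_DIRECTORY = "/var/lib/gateway/user-data-archives"
GATEWAY_USER_DATA_ARCHIVE_MINIMUM_ARCHIVE_SIZE_GB = 1

BYTES_PER_GB = 1024 ** 3  # assumption: GB taken as GiB


def total_size_bytes(paths: Iterable[Path]) -> int:
    """Sum the sizes of the candidate files."""
    return sum(Path(p).stat().st_size for p in paths)


def meets_minimum_size(paths: Iterable[Path],
                       minimum_gb: float = GATEWAY_USER_DATA_ARCHIVE_MINIMUM_ARCHIVE_SIZE_GB) -> bool:
    """True when the candidates are large enough to be worth a tape archive."""
    return total_size_bytes(paths) >= minimum_gb * BYTES_PER_GB


def has_free_space(archive_dir: str, needed_bytes: int) -> bool:
    """True when the archive directory's partition can hold needed_bytes more."""
    return shutil.disk_usage(archive_dir).free >= needed_bytes
```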
> h4. Running archive_user_data
> All commands should be run as the gateway server user (pga).
> {code}
> python manage.py archive_user_data --dry-run
> {code}
> This just prints the files and directories that would be archived, then 
> exits. It's useful for checking that the configuration is correct.
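One way a dry-run listing could report both files and directories, sketched in Python: a directory is listed as a single entry when it and everything beneath it are older than the cutoff; otherwise the walk descends into it and lists only the old files. This grouping rule and the function names are assumptions for illustration; the issue doesn't say how the real command decides.

```python
import os
import time
from pathlib import Path
from typing import List, Optional

SECONDS_PER_DAY = 86400


def is_entirely_old(directory: Path, cutoff: float) -> bool:
    """True if the directory and everything beneath it predate cutoff."""
    if directory.stat().st_mtime >= cutoff:
        return False
    for dirpath, dirnames, filenames in os.walk(directory):
        for name in dirnames + filenames:
            if (Path(dirpath) / name).stat().st_mtime >= cutoff:
                return False
    return True


def collect_candidates(root: Path, cutoff: float) -> List[Path]:
    """List whole old directories as one entry; otherwise descend and list old files."""
    results: List[Path] = []
    for entry in sorted(root.iterdir()):
        if entry.is_dir():
            if is_entirely_old(entry, cutoff):
                results.append(entry)
            else:
                results.extend(collect_candidates(entry, cutoff))
        elif entry.stat().st_mtime < cutoff:
            results.append(entry)
    return results


def dry_run(user_data_dir: Path, max_age_days: int,
            now: Optional[float] = None) -> List[Path]:
    """Print what would be archived and return it, without touching any data."""
    now = time.time() if now is None else now
    cutoff = now - max_age_days * SECONDS_PER_DAY
    candidates = collect_candidates(Path(user_data_dir), cutoff)
    for path in candidates:
        print(path)
    return candidates
```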
> {code}
> python manage.py archive_user_data
> {code}
> This will actually create an archive and then delete the archived files from 
> user data.
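The commit message at the top of this notification says the command attempts to roll back the archive if something fails while deleting archived data, but not how. Below is a hedged sketch of one possible approach, assuming "rollback" means restoring any already-deleted entries from the tarball and then discarding the tarball. `archive_then_delete` and its injectable `remove` callback are hypothetical.

```python
import shutil
import tarfile
from pathlib import Path
from typing import Callable, List, Optional


def _remove(path: Path) -> None:
    """Delete a file, symlink, or whole directory tree."""
    if path.is_dir() and not path.is_symlink():
        shutil.rmtree(path)
    else:
        path.unlink()


def archive_then_delete(user_data_dir: Path, entries: List[Path], tarball: Path,
                        remove: Optional[Callable[[Path], None]] = None) -> None:
    """Archive entries, then delete them; on a deletion error, restore and discard the tarball."""
    remove = remove or _remove
    with tarfile.open(tarball, "w:gz") as tar:
        for path in entries:
            tar.add(path, arcname=str(path.relative_to(user_data_dir)))
    try:
        for path in entries:
            remove(path)
    except Exception:
        # Rollback: re-extract everything (overwriting entries that were not yet
        # deleted with identical copies), then drop the now-unneeded tarball.
        with tarfile.open(tarball) as tar:
            tar.extractall(user_data_dir)
        tarball.unlink()
        raise
```

Re-raising after the rollback keeps the failure visible to the admin (or to cron's error mail) instead of silently leaving a partial state.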
> {code}
> python manage.py archive_user_data --max-age MAX_AGE
> {code}
> The --max-age flag allows overriding the 
> GATEWAY_USER_DATA_ARCHIVE_MAX_AGE_DAYS setting. This is useful for creating 
> the first few archives when introducing the user data archive to an existing 
> gateway.
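A sketch of how the flags might be parsed and how `--max-age` could take precedence over the setting, using plain `argparse` (a Django management command declares the same kind of options in its `add_arguments` method). The default value, `archive_cutoff`, and `archive_name` are assumptions; the name format just mirrors the example tarball name used in this issue, assuming its timestamp is the cutoff.

```python
import argparse
import datetime
from typing import Optional

# Hypothetical value for the setting that --max-age overrides.
GATEWAY_USER_DATA_ARCHIVE_MAX_AGE_DAYS = 180


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="archive_user_data")
    parser.add_argument("--dry-run", action="store_true",
                        help="only print what would be archived")
    parser.add_argument("--max-age", type=int, default=None,
                        help="override GATEWAY_USER_DATA_ARCHIVE_MAX_AGE_DAYS")
    return parser


def archive_cutoff(max_age: Optional[int], now: datetime.datetime) -> datetime.datetime:
    """Files last modified before this moment are archived."""
    days = GATEWAY_USER_DATA_ARCHIVE_MAX_AGE_DAYS if max_age is None else max_age
    return now - datetime.timedelta(days=days)


def archive_name(gateway_id: str, cutoff: datetime.datetime) -> str:
    # Hypothetical naming scheme mirroring the example tarball name in this issue.
    return f"archive_{gateway_id}_older_than_{cutoff:%Y-%m-%d-%H-%M-%S}.tgz"
```

For example, parsing `--max-age 30` and running at 2023-05-17 22:15:34 gives a cutoff of 2023-04-17 22:15:34 and the name `archive_seagrid_older_than_2023-04-17-22-15-34.tgz` for a gateway id of `seagrid`.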
> h4. Running unarchive_user_data 
> unarchive_user_data requires an archive tarball as input. The main use case 
> for this command is when the gateway administrator wants to restore some 
> particular user data. First, find the right archive: the experiment details 
> view in Experiment Statistics displays the name of the archive file for an 
> experiment data directory that has been archived. Use that name to retrieve 
> the tarball from backup, then run unarchive_user_data on the file.
> {code}
> python manage.py unarchive_user_data /path/to/archive_seagrid_older_than_2023-04-17-22-15-34.tgz
> {code}
> The timestamps will be restored from the archive, including the last modified 
> timestamps. This means that the next time archive_user_data runs, all of the 
> unarchived files will be re-archived, since they are still older than the 
> maximum age. Sometimes that is desired, but if you want to reset the last 
> modified times, use the {{--reset-modification}} option:
> {code}
> python manage.py unarchive_user_data --reset-modification /path/to/archive_seagrid_older_than_2023-04-17-22-15-34.tgz
> {code}
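A minimal Python sketch of the unarchive step, assuming the tarball stores paths relative to the user data directory. Python's `tarfile` restores each member's stored modification time on extraction, which is why unarchived files would be picked up again by the next archive run; the `reset_modification` flag here mimics `--reset-modification` by setting the times to the current time. The `unarchive` function is hypothetical, not the command's implementation.

```python
import os
import tarfile
import time
from pathlib import Path
from typing import List, Optional


def unarchive(tarball: Path, user_data_dir: Path, reset_modification: bool = False,
              now: Optional[float] = None) -> List[Path]:
    """Extract tarball into user_data_dir; optionally reset mtimes to `now`."""
    user_data_dir = Path(user_data_dir)
    with tarfile.open(tarball) as tar:
        members = tar.getmembers()
        # extractall restores each member's stored mtime by default.
        tar.extractall(user_data_dir, members=members)
    restored = [user_data_dir / m.name for m in members]
    if reset_modification:
        stamp = time.time() if now is None else now
        for path in restored:
            # Mark as freshly modified so the next archive run won't immediately re-archive it.
            os.utime(path, (stamp, stamp))
    return restored
```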



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
