[ https://issues.apache.org/jira/browse/AIRAVATA-3694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17728863#comment-17728863 ]
ASF subversion and git services commented on AIRAVATA-3694:
-----------------------------------------------------------

Commit 09e6aaf4350cf35ed92555a81e27aeb33d952843 in airavata's branch refs/heads/master from Marcus Christie
[ https://gitbox.apache.org/repos/asf?p=airavata.git;h=09e6aaf435 ]

AIRAVATA-3694 Ansible: configure data archive max ages for scigap hosted gateways

> User data archive management commands
> -------------------------------------
>
>                 Key: AIRAVATA-3694
>                 URL: https://issues.apache.org/jira/browse/AIRAVATA-3694
>             Project: Airavata
>          Issue Type: New Feature
>          Components: Django Portal
>            Reporter: Marcus Christie
>            Assignee: Marcus Christie
>            Priority: Major
>
> Create management commands to manage archiving user data. The use case is
> that the gateway admin wants to archive older data and then delete that user
> data to free up disk space.
> The management commands will handle creating archives (as tarballs) and
> deleting the archived data from the user data directory. There will also be
> an unarchive command. There are settings for the max age of files to be
> archived and for the directory in which archives should be placed.
> How the archive files are then backed up is left to the gateway admin. It's
> expected that the gateway admin would periodically (perhaps by cron) copy the
> archive files from the web server to some other file server.
> h3. Description
> archive_user_data creates a tarball archive of user data for all files and
> directories that are older than a configured number of days. Alongside the
> tarball it writes a text file that lists all of the files and directories
> that were archived. The tarball and text file can be periodically pushed to
> tape backup or any other backup location.
> The configuration settings are
> - GATEWAY_USER_DATA_ARCHIVE_MAX_AGE_DAYS
> - GATEWAY_USER_DATA_ARCHIVE_DIRECTORY
> -- this is the directory in which to place the archive files and is also the
> place where temporary files are generated. Since the archive files can be
> large, it's important that there be enough free disk space on the partition
> where this directory lives.
> - GATEWAY_USER_DATA_ARCHIVE_MINIMUM_ARCHIVE_SIZE_GB
> -- defaults to 1 GB. This can be used to prevent creating a lot of small
> archives, since tape archives often want a few large files instead of many
> small files.
> h4. Running archive_user_data
> All commands should be run as the gateway server user (pga).
> {code}
> python manage.py archive_user_data --dry-run
> {code}
> This just prints the files and directories that would be archived and exits.
> It is useful for checking that the configuration is correct.
> {code}
> python manage.py archive_user_data
> {code}
> This will actually create an archive and then delete from user data the files
> that were archived.
> {code}
> python manage.py archive_user_data --max-age MAX_AGE
> {code}
> The --max-age flag allows overriding the
> GATEWAY_USER_DATA_ARCHIVE_MAX_AGE_DAYS setting. This can be a good option for
> creating the first few archives when introducing the user data archive to an
> existing gateway.
> h4. Running unarchive_user_data
> unarchive_user_data requires an archive tarball as input. The main use case
> for this command is that the gateway administrator wants to restore some
> particular user data. First, the right archive must be found. The experiment
> details view in Experiment Statistics will display the name of the archive
> file for an experiment data directory that has been archived. Use this to
> retrieve the tarball from backup, then run unarchive_user_data on the file.
> {code}
> python manage.py unarchive_user_data /path/to/archive_seagrid_older_than_2023-04-17-22-15-34.tgz
> {code}
> The timestamps will be restored from the archive, including the last modified
> timestamps. This means that the next time archive_user_data runs, all
> unarchived files will be re-archived. Sometimes that is desired, but if you
> want to reset the last modified times, use the {{--reset-modification}}
> option:
> {code}
> python manage.py unarchive_user_data --reset-modification /path/to/archive_seagrid_older_than_2023-04-17-22-15-34.tgz
> {code}

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
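
The interaction between the max age setting and the restored timestamps can be illustrated with a minimal sketch. This is not the code used by archive_user_data; the helper name find_stale_paths and the choice to compare only file modification times are assumptions made for illustration. It shows why a file restored with its original last modified time would be selected again on the next run, and why resetting the modification time (as --reset-modification is described as doing) would avoid that.

```python
import os
import time
from pathlib import Path


def find_stale_paths(root, max_age_days, now=None):
    """Return files under root last modified more than max_age_days ago.

    Hypothetical helper for illustration only; the real command may
    select files and directories differently.
    """
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 24 * 60 * 60
    stale = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = Path(dirpath) / name
            # st_mtime is the last modified time in seconds since the epoch
            if path.stat().st_mtime < cutoff:
                stale.append(path)
    return sorted(stale)


def reset_modification_time(path):
    """Set the access and modified times of path to the current time."""
    os.utime(path, None)
```

Under these assumptions, a file unarchived with a last modified time 100 days in the past would be returned by find_stale_paths(root, 90), and would no longer be returned after reset_modification_time is called on it.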