[ https://issues.apache.org/jira/browse/AIRAVATA-3694?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Marcus Christie updated AIRAVATA-3694:
--------------------------------------
Description:
Create management commands to manage archiving user data. The use case is that
a gateway admin wants to archive older data and then delete that user data to
free up disk space.
The management commands will handle creating archives (as tarballs) and
deleting the data from the user data archive directory. There will also be an
unarchive command. There are settings for the max age of files to be archived
and for the directory to which archives should be copied.
How the archive files are then backed up is left to the gateway admin. It's
expected that the admin would periodically (perhaps by cron) copy the archive
files from the web server to some other file server.
### Description
archive_user_data creates a tarball archive of all user data files and
directories that are older than a configured number of days. Alongside the
tarball, it writes a text file that lists all of the files and directories that
were archived. The tarball and text file can be periodically pushed to tape
backup or any other backup location.
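The selection and archiving steps described above can be sketched roughly as follows. This is a minimal illustration, not the actual implementation of archive_user_data: the function names `find_old_entries` and `write_archive`, the use of last-modified time as the age criterion, and the `.txt` listing name are all assumptions, and for brevity the sketch collects only regular files rather than whole directories.

```python
import os
import tarfile
import time
from pathlib import Path


def find_old_entries(user_data_dir, max_age_days, now=None):
    """Return files under user_data_dir last modified more than max_age_days ago."""
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 24 * 60 * 60
    old = []
    for root, _dirs, files in os.walk(user_data_dir):
        for name in files:
            path = Path(root) / name
            if path.stat().st_mtime < cutoff:
                old.append(path)
    return sorted(old)


def write_archive(entries, user_data_dir, archive_dir, basename):
    """Write <basename>.tgz and a <basename>.txt listing of the archived paths."""
    archive_dir = Path(archive_dir)
    tarball = archive_dir / f"{basename}.tgz"
    listing = archive_dir / f"{basename}.txt"
    with tarfile.open(tarball, "w:gz") as tar:
        for path in entries:
            # Store each path relative to the user data root (an assumption).
            tar.add(path, arcname=str(path.relative_to(user_data_dir)))
    # One archived path per line, mirroring the text file described above.
    listing.write_text(
        "".join(f"{path.relative_to(user_data_dir)}\n" for path in entries)
    )
    return tarball, listing
```

Keeping the listing as a plain text file next to the tarball means an admin can search it with ordinary tools without having to open a large archive.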
The configuration settings are:
- GATEWAY_USER_DATA_ARCHIVE_MAX_AGE_DAYS
  The maximum age, in days, of files and directories before they are archived.
- GATEWAY_USER_DATA_ARCHIVE_DIRECTORY
  This is the directory in which to place the archive files. It is also where
  temporary files are generated. Since the archive files can be large, it's
  important that there be enough free disk space on the partition where this
  directory lives.
- GATEWAY_USER_DATA_ARCHIVE_MINIMUM_ARCHIVE_SIZE_GB
  Defaults to 1 GB. This can be used to prevent creating a lot of small
  archives, since tape archives often want a few large files instead of many
  small files.
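As an illustration, the three settings might appear in a Django settings module along these lines. Only the setting names and the 1 GB default come from the description above. The 90-day value and the directory path are hypothetical examples, and the value types are assumed.

```python
# Illustrative values only; adjust for the actual gateway deployment.

# Archive files and directories older than this many days.
# 90 is an assumed example; no default is stated in this issue.
GATEWAY_USER_DATA_ARCHIVE_MAX_AGE_DAYS = 90

# Where archive files (and temporary files) are written.
# Hypothetical path; pick a partition with plenty of free disk space.
GATEWAY_USER_DATA_ARCHIVE_DIRECTORY = "/data/gateway-user-data-archive"

# Skip creating archives smaller than this size, in GB (documented default: 1).
GATEWAY_USER_DATA_ARCHIVE_MINIMUM_ARCHIVE_SIZE_GB = 1
```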
#### Running archive_user_data
All commands should be run as the gateway server user (pga).
```
python manage.py archive_user_data --dry-run
```
This only prints the files and directories that would be archived, then exits.
It is useful for checking that the configuration is correct.
```
python manage.py archive_user_data --max-age MAX_AGE
```
The --max-age flag allows overriding the GATEWAY_USER_DATA_ARCHIVE_MAX_AGE_DAYS
setting. This can be a good option to create the first few archives when
introducing the user data archive to an existing gateway.
#### Running unarchive_user_data
unarchive_user_data requires an archive tarball as input. The main use case for
this command is that the gateway administrator wants to restore some particular
user data. First, the right archive must be found. The experiment details view
in Experiment Statistics will display the name of the archive file for an
experiment data directory that has been archived. Use this to then retrieve the
tarball from backup. Then run unarchive_user_data on the file.
```
python manage.py unarchive_user_data /path/to/archive_seagrid_older_than_2023-04-17-22-15-34.tgz
```
The timestamps will be restored from the archive, including the last modified
timestamps. This means that the next time archive_user_data runs, all files
unarchived will be re-archived. Sometimes that is desired, but if you want to
reset the last modified times, use the `--reset-modification` option:
```
python manage.py unarchive_user_data --reset-modification /path/to/archive_seagrid_older_than_2023-04-17-22-15-34.tgz
```
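To see why restored timestamps cause re-archiving, the following sketch shows how extracting a tarball with Python's `tarfile` module restores each file's recorded modification time, and how those times could be reset afterwards. It is not the actual implementation of unarchive_user_data: the `unarchive` function and its `reset_modification` parameter are hypothetical, and treating "reset" as "set to the current time" is an assumption.

```python
import os
import tarfile
import time
from pathlib import Path


def unarchive(tarball, dest_dir, reset_modification=False):
    """Extract tarball into dest_dir; optionally set every mtime to now."""
    dest_dir = Path(dest_dir)
    with tarfile.open(tarball, "r:gz") as tar:
        names = tar.getnames()
        # extractall restores the modification times stored in the archive.
        # Only extract archives from a trusted source.
        tar.extractall(dest_dir)
    if reset_modification:
        now = time.time()
        for name in names:
            # Make the restored entries look freshly modified so that an
            # age-based archive run does not pick them up again right away.
            os.utime(dest_dir / name, (now, now))
    return [dest_dir / name for name in names]
```

Without the reset, the extracted files keep their old timestamps and remain older than the configured max age, so they qualify for the next archive run.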
was:
Create management commands to manage archiving user data. The use case is that
a gateway admin wants to archive older data and then delete that user data to
free up disk space.
The management commands will handle creating archives (as tarballs) and
deleting the data from the user data archive directory. There will also be an
unarchive command. There are settings for the max age of files to be archived
and for the directory to which archives should be copied.
How the archive files are then backed up is left to the gateway admin. It's
expected that the admin would periodically (perhaps by cron) copy the archive
files from the web server to some other file server.
> User data archive management commands
> -------------------------------------
>
> Key: AIRAVATA-3694
> URL: https://issues.apache.org/jira/browse/AIRAVATA-3694
> Project: Airavata
> Issue Type: New Feature
> Components: Django Portal
> Reporter: Marcus Christie
> Assignee: Marcus Christie
> Priority: Major
--
This message was sent by Atlassian Jira
(v8.20.10#820010)