carlalex has proposed merging lp:~carlalex/duplicity/duplicity into lp:duplicity.
Commit message:
Boto3 backend for AWS.

Requested reviews:
  duplicity-team (duplicity-team)

For more details, see:
https://code.launchpad.net/~carlalex/duplicity/duplicity/+merge/376206

Boto3 backend for AWS.
-- 
Your team duplicity-team is requested to review the proposed merge of lp:~carlalex/duplicity/duplicity into lp:duplicity.
=== modified file '.bzrignore'
--- .bzrignore	2019-11-24 17:00:02 +0000
+++ .bzrignore	2019-11-30 21:45:03 +0000
@@ -25,4 +25,5 @@
 testing/gnupg/.gpg-v21-migrated
 testing/gnupg/S.*
 testing/gnupg/private-keys-v1.d
 duplicity/backends/rclonebackend.py
+duplicity-venv
=== modified file 'bin/duplicity.1'
--- bin/duplicity.1	2019-05-05 12:16:14 +0000
+++ bin/duplicity.1	2019-11-30 21:45:03 +0000
@@ -706,7 +706,7 @@
 Sets the update rate at which duplicity will output the upload progress
 messages (requires
 .BI --progress
-option). Default is to prompt the status each 3 seconds.
+option). Default is to print the status each 3 seconds.
 .TP
 .BI "--rename " "<original path> <new path>"
@@ -730,6 +730,36 @@
 duplicity --rsync-options="--partial-dir=.rsync-partial" /home/me rsync://[email protected]/some_dir
 .TP
+.BI "--s3-use-boto3"
+When backing up to Amazon S3, use the new boto3 based backend. The boto3
+backend is a rewrite of the older Amazon S3 backend, which was based on the
+now deprecated and unsupported boto library. The new backend fixes known
+limitations of the older one, which have crept in as Amazon S3 has evolved
+while the deprecated boto library has not kept up.
+
+The boto3 backend should behave largely the same as the older S3 backend,
+but there are some differences in the handling of some of the "S3" options.
+See the documentation of each specific option for the differences that
+apply to it.
+
+The boto3 backend does not support bucket creation.
+This is a deliberate choice which simplifies the code and sidesteps
+problems related to region selection. Additionally, it is probably
+not a good practice to give your backup role bucket creation rights.
+In most cases the role used for backups should be limited to specific
+buckets.
+
+The boto3 backend only supports newer domain style buckets. Amazon is moving
+to deprecate the older bucket style, so migration is recommended.
+Use the older s3 backend for compatibility with backups stored in
+buckets using older naming conventions.
+
+The boto3 backend does not currently support initiating restores
+from the glacier storage class. When restoring a backup from
+glacier or glacier deep archive, the backup files must first be
+restored out of band.
+
+.TP
 .BI "--s3-european-buckets"
 When using the Amazon S3 backend, create buckets in Europe instead of
 the default (requires
@@ -738,6 +768,9 @@
 .B EUROPEAN S3 BUCKETS
 section.

+This option does not apply when using the newer boto3 backend, which
+does not create buckets (see above).
+
 .TP
 .BI "--s3-unencrypted-connection"
 Don't use SSL for connections to S3.
@@ -753,6 +786,8 @@
 increment files. Unless that is disabled, an observer will not be able to
 see the file names or contents.

+This option is not available when using the newer boto3 backend.
+
 .TP
 .BI "--s3-use-new-style"
 When operating on Amazon S3 buckets, use new-style subdomain bucket
@@ -760,6 +795,9 @@
 is not backwards compatible if your bucket name contains upper-case
 characters or other characters that are not valid in a hostname.

+This option has no effect when using the newer boto3 backend, which
+always uses new style subdomain bucket naming.
+
 .TP
 .BI "--s3-use-rrs"
 Store volumes using Reduced Redundancy Storage when uploading to Amazon S3.
@@ -796,6 +834,20 @@
 all other data is stored in S3 Glacier.

 .TP
+.BI "--s3-use-deep-archive"
+Store volumes using Glacier Deep Archive S3 when uploading to Amazon S3. This storage class
+has a lower cost of storage but a higher per-request cost, along with delays
+of up to 12 hours from the time of the retrieval request. The storage cost is
+calculated against a 180-day storage minimum. According to Amazon, this
+storage class is ideal for data archiving and long-term backup, offering
+99.999999999% durability. To restore a backup you will have to manually
+migrate all data stored on AWS Glacier Deep Archive back to Standard S3 and
+wait for AWS to complete the migration.
+.B Notice:
+Duplicity will store the manifest.gpg files from full and incremental backups
+on AWS S3 standard storage to allow quick retrieval for later incremental
+backups; all other data is stored in S3 Glacier Deep Archive.
+
+.TP
 .BI "--s3-use-multiprocessing"
 Allow multipart volume uploads to S3 through multiprocessing. This option
 requires Python 2.6 and can be used to make uploads to S3 more efficient.
@@ -803,6 +855,9 @@
 uploaded in parallel. Useful if you want to saturate your bandwidth
 or if large files are failing during upload.

+This has no effect when using the newer boto3 backend. Boto3 always
+attempts to use multiprocessing when it believes it will be more efficient.
+
 .TP
 .BI "--s3-use-server-side-encryption"
 Allow use of server side encryption in S3
@@ -814,6 +869,8 @@
 to maximize the use of your bandwidth. For example, a chunk size of 10MB
 with a volsize of 30MB will result in 3 chunks per volume upload.

+This has no effect when using the newer boto3 backend.
+
 .TP
 .BI "--s3-multipart-max-procs"
 Specify the maximum number of processes to spawn when performing a multipart
@@ -822,6 +879,8 @@
 required to ensure you don't overload your system while maximizing
 the use of your bandwidth.

+This has no effect when using the newer boto3 backend.
+
 .TP
 .BI "--s3-multipart-max-timeout"
 You can control the maximum time (in seconds) a multipart upload can spend on
@@ -829,6 +888,8 @@
 hanging on multipart uploads or if you'd like to control the time variance
 when uploading to S3 to ensure you kill connections to slow S3 endpoints.

+This has no effect when using the newer boto3 backend.
+
 .TP
 .BI "--azure-blob-tier"
 Standard storage tier used for backup files (Hot|Cool|Archive).
=== added file 'duplicity/backends/_boto3backend.py'
--- duplicity/backends/_boto3backend.py	1970-01-01 00:00:00 +0000
+++ duplicity/backends/_boto3backend.py	2019-11-30 21:45:03 +0000
@@ -0,0 +1,200 @@
+# -*- Mode:Python; indent-tabs-mode:nil; tab-width:4 -*-
+#
+# Copyright 2002 Ben Escoto <[email protected]>
+# Copyright 2007 Kenneth Loafman <[email protected]>
+# Copyright 2019 Carl A. Adams <[email protected]>
+#
+# This file is part of duplicity.
+#
+# Duplicity is free software; you can redistribute it and/or modify it
+# under the terms of the GNU General Public License as published by the
+# Free Software Foundation; either version 2 of the License, or (at your
+# option) any later version.
+#
+# Duplicity is distributed in the hope that it will be useful, but
+# WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+# General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with duplicity; if not, write to the Free Software Foundation,
+# Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
+
+import duplicity.backend
+from duplicity import globals
+from duplicity import log
+from duplicity.errors import FatalBackendException, BackendException
+from duplicity import util
+from duplicity import progress
+
+
+# Note: current gaps with the old boto backend include:
+# - no support for a hostname/port in the S3 URL yet.
+# - Glacier restore to S3 is not implemented. Should this be done here,
+#   or is that out of scope? It can take days, so waiting seems less
+#   than ideal. "thaw" isn't currently a generic concept that the core
+#   asks of back-ends; perhaps that is worth exploring. The older boto
+#   backend appeared to attempt this restore in the code, but the man
+#   page indicated that restores should be done out of band.
+#   If/when implemented, we should add the following new features:
+#   - when restoring from glacier or deep archive, specify a TTL.
+#   - allow the user to specify how fast to restore (impacts cost).
+
+class BotoBackend(duplicity.backend.Backend):
+    u"""
+    Backend for Amazon's Simple Storage Service (aka Amazon S3), through
+    the use of the boto3 module. (See
+    https://boto3.amazonaws.com/v1/documentation/api/latest/index.html
+    for information on boto3.)
+
+    Pursuant to Amazon's announced deprecation of path style S3 access,
+    this backend only supports virtual host style bucket URIs.
+    See the man page for full details.
+
+    To make use of this backend, you must provide AWS credentials.
+    This may be done in several ways: through the environment variables
+    AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY, by the
+    ~/.aws/credentials file, by the ~/.aws/config file,
+    or by using the boto2 style ~/.boto or /etc/boto.cfg files.
+    """
+
+    def __init__(self, parsed_url):
+        duplicity.backend.Backend.__init__(self, parsed_url)
+
+        # This folds the null prefix and all null parts, which means that:
+        # //MyBucket/ and //MyBucket are equivalent.
+        # //MyBucket//My///My/Prefix/ and //MyBucket/My/Prefix are equivalent.
+        url_path_parts = [x for x in parsed_url.path.split(u'/') if x != u'']
+        if url_path_parts:
+            self.bucket_name = url_path_parts.pop(0)
+        else:
+            raise BackendException(u'S3 requires a bucket name.')
+
+        if url_path_parts:
+            self.key_prefix = u'%s/' % u'/'.join(url_path_parts)
+        else:
+            self.key_prefix = u''
+
+        self.parsed_url = parsed_url
+        self.straight_url = duplicity.backend.strip_auth_from_url(parsed_url)
+        self.s3 = None
+        self.bucket = None
+        self.tracker = UploadProgressTracker()
+        self.reset_connection()
+
+    def reset_connection(self):
+        import boto3
+        from botocore.exceptions import ClientError
+
+        self.bucket = None
+        self.s3 = boto3.resource(u's3')
+
+        try:
+            self.s3.meta.client.head_bucket(Bucket=self.bucket_name)
+        except ClientError as bce:
+            error_code = bce.response[u'Error'][u'Code']
+            if error_code == u'404':
+                raise FatalBackendException(u'S3 bucket "%s" does not exist' % self.bucket_name,
+                                            code=log.ErrorCode.backend_not_found)
+            else:
+                raise
+
+        self.bucket = self.s3.Bucket(self.bucket_name)  # only set if bucket is thought to exist
+
+    def _put(self, local_source_path, remote_filename):
+        remote_filename = util.fsdecode(remote_filename)
+        key = self.key_prefix + remote_filename
+
+        if globals.s3_use_rrs:
+            storage_class = u'REDUCED_REDUNDANCY'
+        elif globals.s3_use_ia:
+            storage_class = u'STANDARD_IA'
+        elif globals.s3_use_onezone_ia:
+            storage_class = u'ONEZONE_IA'
+        elif globals.s3_use_glacier and u"manifest" not in remote_filename:
+            storage_class = u'GLACIER'
+        elif globals.s3_use_deep_archive and u"manifest" not in remote_filename:
+            storage_class = u'DEEP_ARCHIVE'
+        else:
+            storage_class = u'STANDARD'
+        extra_args = {u'StorageClass': storage_class}
+
+        if globals.s3_use_sse:
+            extra_args[u'ServerSideEncryption'] = u'AES256'
+        elif globals.s3_use_sse_kms:
+            if globals.s3_kms_key_id is None:
+                raise FatalBackendException(u"S3 SSE KMS was requested, but no key id was "
+                                            u"provided (--s3-kms-key-id is required)",
+                                            code=log.ErrorCode.s3_kms_no_id)
+            extra_args[u'ServerSideEncryption'] = u'aws:kms'
+            extra_args[u'SSEKMSKeyId'] = globals.s3_kms_key_id
+            if globals.s3_kms_grant:
+                extra_args[u'GrantFullControl'] = globals.s3_kms_grant
+
+        # Should the tracker be scoped to the put or the backend?
+        # The put seems right to me, but the results look a little more
+        # correct scoped to the backend. This brings up questions about
+        # knowing when it is proper for it to be reset.
+        # tracker = UploadProgressTracker()  # Scope the tracker to the put()
+        tracker = self.tracker
+
+        log.Info(u"Uploading %s/%s to %s Storage" % (self.straight_url, remote_filename, storage_class))
+        self.s3.Object(self.bucket.name, key).upload_file(local_source_path.uc_name,
+                                                          Callback=tracker.progress_cb,
+                                                          ExtraArgs=extra_args)
+
+    def _get(self, remote_filename, local_path):
+        remote_filename = util.fsdecode(remote_filename)
+        key = self.key_prefix + remote_filename
+        self.s3.Object(self.bucket.name, key).download_file(local_path.uc_name)
+
+    def _list(self):
+        filename_list = []
+        for obj in self.bucket.objects.filter(Prefix=self.key_prefix):
+            try:
+                filename = obj.key.replace(self.key_prefix, u'', 1)
+                filename_list.append(filename)
+                log.Debug(u"Listed %s/%s" % (self.straight_url, filename))
+            except AttributeError:
+                pass
+        return filename_list
+
+    def _delete(self, remote_filename):
+        remote_filename = util.fsdecode(remote_filename)
+        key = self.key_prefix + remote_filename
+        self.s3.Object(self.bucket.name, key).delete()
+
+    def _query(self, remote_filename):
+        from botocore.exceptions import ClientError
+
+        remote_filename = util.fsdecode(remote_filename)
+        key = self.key_prefix + remote_filename
+        content_length = -1
+        try:
+            s3_obj = self.s3.Object(self.bucket.name, key)
+            s3_obj.load()
+            content_length = s3_obj.content_length
+        except ClientError as bce:
+            if bce.response[u'Error'][u'Code'] == u'404':
+                pass
+            else:
+                raise
+        return {u'size': content_length}
+
+
+class UploadProgressTracker(object):
+    def __init__(self):
+        self.total_bytes = 0
+
+    def progress_cb(self, fresh_byte_count):
+        self.total_bytes += fresh_byte_count
+        progress.report_transfer(self.total_bytes, 0)  # second arg appears to be unused
+        # It would seem that summing progress should be the caller's job,
+        # and backends should just toss bytes-written numbers over the
+        # fence. But the progress bar doesn't work in a reasonable way when
+        # we do that. (That would also eliminate the need for this class to
+        # hold the scoped rolling total.)
+        # progress.report_transfer(fresh_byte_count, 0)
=== modified file 'duplicity/backends/botobackend.py'
--- duplicity/backends/botobackend.py	2018-07-23 14:55:39 +0000
+++ duplicity/backends/botobackend.py	2019-11-30 21:45:03 +0000
@@ -23,10 +23,14 @@
 import duplicity.backend
 from duplicity import globals

-if globals.s3_use_multiprocessing:
-    from ._boto_multi import BotoBackend
+if globals.s3_use_boto3:
+    from ._boto3backend import BotoBackend
 else:
-    from ._boto_single import BotoBackend
+    if globals.s3_use_multiprocessing:
+        from ._boto_multi import BotoBackend
+    else:
+        from ._boto_single import BotoBackend
+    # TODO: if globals.s3_use_boto3

 duplicity.backend.register_backend(u"gs", BotoBackend)
 duplicity.backend.register_backend(u"s3", BotoBackend)
=== modified file 'duplicity/commandline.py'
--- duplicity/commandline.py	2019-11-24 17:00:02 +0000
+++ duplicity/commandline.py	2019-11-30 21:45:03 +0000
@@ -506,7 +506,10 @@
     # support european for now).
     parser.add_option(u"--s3-european-buckets", action=u"store_true")

-    # Whether to use S3 Reduced Redundancy Storage
+    # Use the boto3 implementation for s3
+    parser.add_option(u"--s3-use-boto3", action=u"store_true")
+
+    # Whether to use S3 Reduced Redundancy Storage
     parser.add_option(u"--s3-use-rrs", action=u"store_true")

     # Whether to use S3 Infrequent Access Storage
@@ -515,6 +518,9 @@
     # Whether to use S3 Glacier Storage
     parser.add_option(u"--s3-use-glacier", action=u"store_true")

+    # Whether to use S3 Glacier Deep Archive Storage
+    parser.add_option(u"--s3-use-deep-archive", action=u"store_true")
+
     # Whether to use S3 One Zone Infrequent Access Storage
     parser.add_option(u"--s3-use-onezone-ia", action=u"store_true")
=== modified file 'duplicity/globals.py'
--- duplicity/globals.py	2019-05-17 16:41:49 +0000
+++ duplicity/globals.py	2019-11-30 21:45:03 +0000
@@ -200,12 +200,20 @@
 # Whether to use S3 Glacier Storage
 s3_use_glacier = False

+# Whether to use S3 Glacier Deep Archive Storage
+s3_use_deep_archive = False
+
 # Whether to use S3 One Zone Infrequent Access Storage
 s3_use_onezone_ia = False

 # True if we should use boto multiprocessing version
 s3_use_multiprocessing = False

+# True if we should use the new boto3 backend. This backend does not
+# support some legacy features, so the old backend is retained for
+# compatibility with old backups.
+s3_use_boto3 = False
+
 # Chunk size used for S3 multipart uploads. The number of parallel uploads to
 # S3 is given by chunk size / volume size. Use this to maximize the use of
 # your bandwidth. Defaults to 25MB
=== modified file 'requirements.txt'
--- requirements.txt	2019-11-16 17:15:49 +0000
+++ requirements.txt	2019-11-30 21:45:03 +0000
@@ -26,6 +26,7 @@
 # azure
 # b2sdk
 # boto
+# boto3
 # dropbox==6.9.0
 # gdata
 # jottalib
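
For reviewers who want to exercise the new code paths by hand, here is a
minimal standalone sketch (not part of the patch) of the access pattern the
backend relies on: credentials are resolved through boto3's normal chain
(AWS_ACCESS_KEY_ID/AWS_SECRET_ACCESS_KEY, ~/.aws/credentials, or
~/.aws/config), the bucket is probed the way reset_connection() does, and a
file is uploaded with a storage class via ExtraArgs the way _put() does. The
bucket name, key, and local path below are hypothetical placeholders; the
boto3 calls are the same ones used in _boto3backend.py above.

    import boto3
    from botocore.exceptions import ClientError

    # boto3 resolves credentials itself from the environment or the
    # ~/.aws/credentials and ~/.aws/config files.
    s3 = boto3.resource(u's3')

    bucket_name = u'my-backup-bucket'  # hypothetical; the backend never creates buckets

    try:
        # The same existence probe BotoBackend.reset_connection() performs.
        s3.meta.client.head_bucket(Bucket=bucket_name)
    except ClientError as bce:
        if bce.response[u'Error'][u'Code'] == u'404':
            raise SystemExit(u'bucket does not exist; create it out of band first')
        raise

    # Upload one volume tagged with a storage class, as _put() does.
    s3.Object(bucket_name, u'some/prefix/duplicity-full.vol1.difftar.gpg').upload_file(
        u'/tmp/duplicity-full.vol1.difftar.gpg',
        ExtraArgs={u'StorageClass': u'STANDARD_IA'})

Driving the same path through duplicity itself only requires adding
--s3-use-boto3 to an existing s3:// command line.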