Yes, I am now seeing the snapshots complete in about 2 minutes after 
switching to a new, empty bucket.
I'm not sure why the initial snapshot request to the empty repo appeared to 
hang, because according to the S3 timestamp, the snapshot did in fact 
complete in about 2 minutes.
Time to automate deletion of old snapshots. :)
Thanks for the response!
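For anyone automating the same cleanup, here is a minimal sketch. It assumes snapshot names embed the cron timestamp as in the listing quoted below, so lexicographic order matches chronological order; `HOST` and `REPO` are placeholders for your own cluster and repository, and the curl wiring is commented out and untested:

```shell
#!/bin/sh
# Sketch: prune snapshots older than a cutoff date. Assumes names of the
# form snapshot-YYYY.MM.DD-HH:MM:SS, so comparing the strings compares
# the dates. HOST and REPO are placeholders for your cluster/repository.
HOST="localhost:9200"
REPO="s3_backup_repo"

# Read snapshot names on stdin; print only those before the cutoff date.
old_snapshots() {
  awk -v cutoff="snapshot-$1" '$0 < cutoff'
}

# Example wiring (commented out; needs a responsive snapshot API):
# curl -s "http://$HOST/_snapshot/$REPO/_all" |
#   grep -o '"snapshot" *: *"[^"]*"' | cut -d'"' -f4 |
#   old_snapshots 2014.09.01 |
#   while read -r s; do
#     curl -XDELETE "http://$HOST/_snapshot/$REPO/$s"
#   done
```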

On Thursday, November 13, 2014 9:35:20 PM UTC-8, Igor Motov wrote:
>
> Having too many snapshots is problematic. Each snapshot is incremental, 
> so in order to figure out what has changed and what is available, all 
> snapshots in the repository need to be scanned, which takes longer as the 
> number of snapshots grows. I would recommend pruning old snapshots as 
> time goes by, or starting snapshots into a new bucket/directory if you 
> really need to maintain 2-hour resolution for 2-month-old snapshots. The 
> get command can sometimes hang because it's throttled by the on-going 
> snapshot. 
>
>
> On Wednesday, November 12, 2014 9:02:33 PM UTC-10, Sally Ahn wrote:
>>
>> I am also interested in this topic.
>> We were snapshotting our cluster of two nodes every 2 hours (invoked via 
>> a cron job) to an S3 repository (we were running ES 1.2.2 with 
>> cloud-aws-plugin version 2.2.0, then we upgraded to ES 1.4.0 with 
>> cloud-aws-plugin 2.4.0 but are still seeing issues described below).
>> I've been seeing an increase in the time it takes to complete a snapshot 
>> with each subsequent snapshot. 
>> I see a thread 
>> <https://groups.google.com/forum/?fromgroups#!searchin/elasticsearch/snapshot/elasticsearch/bCKenCVFf2o/TFK-Es0wxSwJ>
>> where someone else was seeing the same thing, but that thread seems to have died.
>> In my case, snapshots have gone from taking ~5 minutes to taking about an 
>> hour, even between snapshots where data does not seem to have changed. 
>>
>> For example, you can see below a list of the snapshots stored in my S3 
>> repo. Each snapshot is named with a timestamp of when my cron job invoked 
>> the snapshot process. The S3 timestamp on the left shows the completion 
>> time of that snapshot, and it's clear that it's steadily increasing:
>>
>> 2014-09-30 10:05       686   s3://<bucketname>/snapshot-2014.09.30-10:00:01
>> 2014-09-30 12:05       686   s3://<bucketname>/snapshot-2014.09.30-12:00:01
>> 2014-09-30 14:05       736   s3://<bucketname>/snapshot-2014.09.30-14:00:01
>> 2014-09-30 16:05       736   s3://<bucketname>/snapshot-2014.09.30-16:00:01
>> ...
>> 2014-11-08 00:52      1488   s3://<bucketname>/snapshot-2014.11.08-00:00:01
>> 2014-11-08 02:54      1488   s3://<bucketname>/snapshot-2014.11.08-02:00:01
>> ...
>> 2014-11-08 14:54      1488   s3://<bucketname>/snapshot-2014.11.08-14:00:01
>> 2014-11-08 16:53      1488   s3://<bucketname>/snapshot-2014.11.08-16:00:01
>> ...
>> 2014-11-11 07:00      1638   s3://<bucketname>/snapshot-2014.11.11-06:00:01
>> 2014-11-11 08:58      1638   s3://<bucketname>/snapshot-2014.11.11-08:00:01
>> 2014-11-11 10:58      1638   s3://<bucketname>/snapshot-2014.11.11-10:00:01
>> 2014-11-11 12:59      1638   s3://<bucketname>/snapshot-2014.11.11-12:00:01
>> 2014-11-11 15:00      1638   s3://<bucketname>/snapshot-2014.11.11-14:00:01
>> 2014-11-11 17:00      1638   s3://<bucketname>/snapshot-2014.11.11-16:00:01
>>
>> I suspected that this gradual increase was related to the accumulation of 
>> old snapshots after I tested the following:
>> 1. I created a brand new cluster with the same hardware specs in the same 
>> datacenter and restored a snapshot of the problematic cluster taken a few 
>> days earlier (i.e. not the latest snapshot). 
>> 2. I then backed up that restored data to a new empty bucket in the same 
>> S3 region, and that was very fast...a minute or less. 
>> 3. I then restored a later snapshot of the problematic cluster to the 
>> test cluster and tried backing it up again to the new bucket, and that also 
>> took about a minute or less.
>>
>> However, when I tried deleting the repository full of old snapshots from 
>> the problematic cluster and registering a brand new empty bucket, I found 
>> that my first snapshot to the new repository was also hanging indefinitely. 
>> I finally had to kill my snapshot curl command. There were no errors in the 
>> logs (the snapshot logger is very terse...wondering if anyone knows how to 
>> increase the verbosity for it).
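On the verbosity question: in ES 1.x the logger can likely be turned up in config/logging.yml. The exact keys below are an assumption; they mirror the org.elasticsearch.snapshots and org.elasticsearch.repositories logger names with the org.elasticsearch. prefix dropped, so verify them against your install:

```yaml
# config/logging.yml (ES 1.x) -- assumed keys, verify against your install
logger:
  # org.elasticsearch.snapshots
  snapshots: DEBUG
  # org.elasticsearch.repositories (repository/S3 operations)
  repositories: DEBUG
```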
>>
>> So my theory seems to have been debunked, and I am again at a loss. I am 
>> wondering whether the hanging snapshot is related to the slow snapshots I 
>> was seeing before I deleted that old repository. I have seen several issues 
>> in GitHub regarding hanging snapshots (#5958 
>> <https://github.com/elasticsearch/elasticsearch/issues/5958>, #7980 
>> <https://github.com/elasticsearch/elasticsearch/issues/7980>) and have 
>> tried using the elasticsearch-snapshot-cleanup 
>> <https://github.com/imotov/elasticsearch-snapshot-cleanup> utility on my 
>> cluster both before and after I upgraded from version 1.2.2 to 1.4.0 (I 
>> thought upgrading to 1.4.0 which included snapshot improvements may fix my 
>> issues, but it did not), and the script is not finding any running 
>> snapshots:
>>
>> [2014-11-13 05:37:45,451][INFO ][org.elasticsearch.node   ] [Golden Archer] started
>> [2014-11-13 05:37:45,451][INFO ][org.elasticsearch.org.motovs.elasticsearch.snapshots.AbortedSnapshotCleaner] No snapshots found
>> [2014-11-13 05:37:45,452][INFO ][org.elasticsearch.node   ] [Golden Archer] stopping ...
>>
>> Curling to _snapshot/REPO/_status also returns no ongoing snapshots:
>>
>> curl -XGET 'http://<hostname>:9200/_snapshot/s3_backup_repo/_status?pretty=true'
>> {
>>   "snapshots" : [ ]
>> }
>>
>> I may try bouncing ES on each node to see if that kills whatever process 
>> is causing my requests to the snapshot module to hang (requests to other 
>> modules like _cluster/health return fine; cluster health is green, and 
>> load is low for both nodes - 0.00, 0.06).
>>
>> I would really appreciate some help/guidance on how to debug/fix this 
>> issue and general recommendations on how to best achieve periodic 
>> snapshots. For example, cleaning up old snapshots seems rather difficult 
>> since we have to specify the snapshot name, which we would obtain by making 
>> a request to the snapshot module, which seems to hang often.
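Since the snapshot names here embed the cron timestamp, one workaround is to compute an old snapshot's name directly instead of asking the possibly-hanging snapshot API for it. A sketch, assuming GNU date and the naming scheme above (`HOST` and `REPO` are placeholders):

```shell
#!/bin/sh
# Compute the name of a snapshot taken N days ago at a fixed cron time,
# given the snapshot-YYYY.MM.DD-HH:MM:SS naming used in this thread.
# Requires GNU date for the -d "... days ago" syntax.
name_for_days_ago() {
  days="$1"
  cron_time="$2"
  echo "snapshot-$(date -d "$days days ago" +%Y.%m.%d)-$cron_time"
}

# Example: delete the snapshot from 60 days back without listing anything
# (commented out; HOST and REPO are placeholders):
# curl -XDELETE "http://$HOST/_snapshot/$REPO/$(name_for_days_ago 60 10:00:01)"
```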
>>
>> Thanks,
>> Sally
>>
>>
>> On Monday, November 10, 2014 12:27:10 AM UTC-8, Pradeep Reddy wrote:
>>>
>>> Hi Vineeth,
>>>
>>> Thanks for the reply.
>>> I am aware of how to create and delete snapshots using cloud-aws.
>>>
>>> What I wanted to know was what the workflow for periodic snapshots 
>>> should be, especially how to deal with old snapshots. Will having too 
>>> many old snapshots impact anything?
>>>
>>> On Friday, November 7, 2014 8:16:05 PM UTC+5:30, vineeth mohan wrote:
>>>>
>>>> Hi,
>>>>
>>>> There is an S3 repository plugin: 
>>>> https://github.com/elasticsearch/elasticsearch-cloud-aws#s3-repository
>>>> Use this.
>>>> The snapshots are incremental, so it should fit your purpose perfectly.
>>>>
>>>> Thanks
>>>>              Vineeth
>>>>
>>>> On Fri, Nov 7, 2014 at 3:22 PM, Pradeep Reddy <
>>>> [email protected]> wrote:
>>>>
>>>>> I want to back up the data every 15-30 min. I will be storing the 
>>>>> snapshots in S3. 
>>>>>
>>>>> DELETE old and then PUT new snapshot may not be the best practice, as 
>>>>> you may end up with nothing if something goes wrong.
>>>>>
>>>>> Using timestamps for snapshot names may be one option, but how do we 
>>>>> delete old snapshots then?
>>>>> Does S3 lifecycle management help to delete old snapshots?
>>>>>
>>>>> Looking forward to getting some opinions on this.
>>>>>
>>>>> -- 
>>>>> You received this message because you are subscribed to the Google 
>>>>> Groups "elasticsearch" group.
>>>>> To unsubscribe from this group and stop receiving emails from it, send 
>>>>> an email to [email protected].
>>>>> To view this discussion on the web visit 
>>>>> https://groups.google.com/d/msgid/elasticsearch/0dd81d83-5066-4652-9703-dfce63b46993%40googlegroups.com
>>>>>  
>>>>> <https://groups.google.com/d/msgid/elasticsearch/0dd81d83-5066-4652-9703-dfce63b46993%40googlegroups.com?utm_medium=email&utm_source=footer>
>>>>> .
>>>>> For more options, visit https://groups.google.com/d/optout.
>>>>>
>>>>
>>>>

