I will include my response to the original post:

> Snapshots are at the segment level. The more segments stored in the
> repository, the more segments will have to be compared to those in each
> successive snapshot. With merges taking place continually in an active
> index, you may end up with a considerable number of "orphaned" segments
> stored in your repository, i.e. segments "backed up" but no longer
> directly correlating to a segment in your index. Checking through these
> may be contributing to the increased amount of time between snapshots.
>
> Consider pruning older snapshots. "Orphaned" segments will be deleted,
> and any segments still referenced will be preserved.
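The pruning step recommended above can be automated. A minimal sketch, assuming snapshot names carry a timestamp of the form `snapshot-YYYY.MM.DD-HH:MM:SS` (as in the listings later in this thread); the repository name `s3_backup_repo` and the 7-day retention window are placeholder choices, not from the thread:

```python
# Select snapshots older than a retention window by parsing the
# timestamp embedded in each snapshot name. Names that don't match
# the naming convention are left alone.
from datetime import datetime, timedelta

def snapshots_to_prune(names, now, keep=timedelta(days=7)):
    """Return snapshot names whose embedded timestamp is older than `keep`."""
    cutoff = now - keep
    stale = []
    for name in names:
        try:
            ts = datetime.strptime(name, "snapshot-%Y.%m.%d-%H:%M:%S")
        except ValueError:
            continue  # not one of ours; skip it
        if ts < cutoff:
            stale.append(name)
    return stale

# Each stale snapshot would then be removed through the snapshot API, e.g.:
#   curl -XDELETE "http://<hostname>:9200/_snapshot/s3_backup_repo/<name>"
names = ["snapshot-2014.09.30-10:00:01", "snapshot-2014.11.11-16:00:01"]
print(snapshots_to_prune(names, now=datetime(2014, 11, 12)))
# -> ['snapshot-2014.09.30-10:00:01']
```

Deleting through the API (rather than deleting files in S3 directly) is what lets the repository keep segments that newer snapshots still reference.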
On Thursday, November 20, 2014 7:22:03 AM UTC-5, João Costa wrote:

> Hello,
>
> Sorry for hijacking this thread, but I'm currently also pondering the best
> way to perform periodic snapshots in AWS.
>
> My main concern is that we are using blue-green deployment with ephemeral
> storage on EC2, so if for some reason there is a problem with the cluster,
> we might lose a lot of data. I would therefore rather do frequent snapshots
> (for this reason, we are still using the deprecated S3 gateway).
>
> The thing is, you claim that "Having too many snapshots is problematic"
> and that one should "prune old snapshots". Since snapshots are incremental,
> this will imply data loss, correct?
> Also, is the problem related to the number of snapshots or to the size of
> the data? Is there any way to merge old snapshots into one? Would this
> solve the problem?
>
> Finally, if I create a cron job to make automatic snapshots, can I run into
> problems if two instances attempt to create a snapshot with the same name
> at the same time?
> Also, what's the best way to do a snapshot on shutdown? Should I put a
> script in init.d/rc.0 to run on shutdown before Elasticsearch shuts down?
> I've seen cases where EC2 instances have "not so graceful" shutdowns, so it
> would be wonderful if there were a better way to do this at the cluster
> level (i.e., if node A notices that node B is not responding, it
> automatically makes a snapshot).
>
> Sorry if some of these questions don't make much sense; I'm still quite
> new to Elasticsearch and have not completely understood the new snapshot
> feature.
>
> On Friday, November 14, 2014 8:19:42 AM UTC, Sally Ahn wrote:
>
>> Yes, I am now seeing the snapshots complete in about 2 minutes after
>> switching to a new, empty bucket.
>> I'm not sure why the initial request to snapshot to the empty repo was
>> hanging, because the snapshot did in fact complete in about 2 minutes,
>> according to the S3 timestamp.
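On João's worry about two instances creating a snapshot with the same name at the same time: one way to sidestep the race entirely is to make the name unique per machine. A sketch (the hostname suffix is an assumption to keep two cron jobs from colliding; note that Elasticsearch snapshot names must be lowercase):

```python
# Build a collision-resistant snapshot name for a cron job by combining
# a UTC timestamp with a sanitized, lowercased hostname.
import re
import socket
from datetime import datetime

def snapshot_name(now=None, host=None):
    now = now or datetime.utcnow()
    host = (host or socket.gethostname()).lower()
    host = re.sub(r"[^a-z0-9]", "-", host)  # sanitize for use in a URL path
    return "snapshot-%s-%s" % (now.strftime("%Y.%m.%d-%H:%M:%S"), host)

print(snapshot_name(datetime(2014, 11, 20, 7, 22, 3), host="node-a"))
# -> snapshot-2014.11.20-07:22:03-node-a
```

Two machines firing at the same second now PUT different snapshot names, so neither request fails on a name conflict (though the cluster still runs only one snapshot at a time).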
>> Time to automate deletion of old snapshots. :)
>> Thanks for the response!
>>
>> On Thursday, November 13, 2014 9:35:20 PM UTC-8, Igor Motov wrote:
>>
>>> Having too many snapshots is problematic. Each snapshot is done in an
>>> incremental manner, so in order to figure out what has changed and what
>>> is available, all snapshots in the repository need to be scanned, which
>>> takes longer as the number of snapshots grows. I would recommend pruning
>>> old snapshots as time goes by, or starting snapshots into a new
>>> bucket/directory if you really need to maintain 2-hour resolution for
>>> 2-month-old snapshots. The get command can sometimes hang because it's
>>> throttled by the ongoing snapshot.
>>>
>>> On Wednesday, November 12, 2014 9:02:33 PM UTC-10, Sally Ahn wrote:
>>>
>>>> I am also interested in this topic.
>>>> We were snapshotting our cluster of two nodes every 2 hours (invoked
>>>> via a cron job) to an S3 repository (we were running ES 1.2.2 with
>>>> cloud-aws plugin version 2.2.0, then we upgraded to ES 1.4.0 with
>>>> cloud-aws plugin 2.4.0 but are still seeing the issues described below).
>>>> I've been seeing an increase in the time it takes to complete a
>>>> snapshot with each subsequent snapshot.
>>>> I see a thread
>>>> <https://groups.google.com/forum/?fromgroups#!searchin/elasticsearch/snapshot/elasticsearch/bCKenCVFf2o/TFK-Es0wxSwJ>
>>>> where someone else was seeing the same thing, but that thread seems to
>>>> have died. In my case, snapshots have gone from taking ~5 minutes to
>>>> taking about an hour, even between snapshots where the data does not
>>>> seem to have changed.
>>>>
>>>> For example, you can see below a list of the snapshots stored in my S3
>>>> repo. Each snapshot is named with the timestamp of when my cron job
>>>> invoked the snapshot process.
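Igor's second suggestion, starting snapshots into a new bucket/directory, can be sketched by deriving the repository's `base_path` from the current month, so each month's snapshots live under their own S3 prefix and the per-repository scan stays small. The bucket name and path scheme here are illustrative, not from the thread:

```python
# Build the settings body for registering a month-scoped S3 repository
# (PUT _snapshot/<repo-name> with this JSON), using the elasticsearch-
# cloud-aws plugin's "s3" repository type.
import json
from datetime import date

def monthly_repo_settings(bucket, day):
    return {
        "type": "s3",
        "settings": {
            "bucket": bucket,
            "base_path": "snapshots/%s" % day.strftime("%Y-%m"),
        },
    }

print(json.dumps(monthly_repo_settings("my-es-backups", date(2014, 11, 13))))
```

A cron job would re-register (or register a fresh) repository at the start of each month; old months remain restorable by registering their prefix again.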
The S3 timestamp on the left shows the completion
>>>> time of that snapshot, and it's clear that it's steadily increasing:
>>>>
>>>> 2014-09-30 10:05   686  s3://<bucketname>/snapshot-2014.09.30-10:00:01
>>>> 2014-09-30 12:05   686  s3://<bucketname>/snapshot-2014.09.30-12:00:01
>>>> 2014-09-30 14:05   736  s3://<bucketname>/snapshot-2014.09.30-14:00:01
>>>> 2014-09-30 16:05   736  s3://<bucketname>/snapshot-2014.09.30-16:00:01
>>>> ...
>>>> 2014-11-08 00:52  1488  s3://<bucketname>/snapshot-2014.11.08-00:00:01
>>>> 2014-11-08 02:54  1488  s3://<bucketname>/snapshot-2014.11.08-02:00:01
>>>> ...
>>>> 2014-11-08 14:54  1488  s3://<bucketname>/snapshot-2014.11.08-14:00:01
>>>> 2014-11-08 16:53  1488  s3://<bucketname>/snapshot-2014.11.08-16:00:01
>>>> ...
>>>> 2014-11-11 07:00  1638  s3://<bucketname>/snapshot-2014.11.11-06:00:01
>>>> 2014-11-11 08:58  1638  s3://<bucketname>/snapshot-2014.11.11-08:00:01
>>>> 2014-11-11 10:58  1638  s3://<bucketname>/snapshot-2014.11.11-10:00:01
>>>> 2014-11-11 12:59  1638  s3://<bucketname>/snapshot-2014.11.11-12:00:01
>>>> 2014-11-11 15:00  1638  s3://<bucketname>/snapshot-2014.11.11-14:00:01
>>>> 2014-11-11 17:00  1638  s3://<bucketname>/snapshot-2014.11.11-16:00:01
>>>>
>>>> I suspected that this gradual increase was related to the accumulation
>>>> of old snapshots after I tested the following:
>>>> 1. I created a brand-new cluster with the same hardware specs in the
>>>> same datacenter and restored a snapshot of the problematic cluster taken
>>>> a few days back (i.e., not the latest snapshot).
>>>> 2. I then backed up that restored data to a new, empty bucket in the
>>>> same S3 region, and that was very fast...a minute or less.
>>>> 3. I then restored a later snapshot of the problematic cluster to the
>>>> test cluster and tried backing it up again to the new bucket, and that
>>>> also took about a minute or less.
>>>> However, when I tried deleting the repository full of old snapshots
>>>> from the problematic cluster and registering a brand-new empty bucket, I
>>>> found that my first snapshot to the new repository was also hanging
>>>> indefinitely. I finally had to kill my snapshot curl command. There were
>>>> no errors in the logs (the snapshot logger is very terse...wondering if
>>>> anyone knows how to increase the verbosity for it).
>>>>
>>>> So my theory seems to have been debunked, and I am again at a loss. I
>>>> am wondering whether the hanging snapshot is related to the slow
>>>> snapshots I was seeing before I deleted that old repository. I have seen
>>>> several issues on GitHub regarding hanging snapshots (#5958
>>>> <https://github.com/elasticsearch/elasticsearch/issues/5958>, #7980
>>>> <https://github.com/elasticsearch/elasticsearch/issues/7980>) and have
>>>> tried using the elasticsearch-snapshot-cleanup
>>>> <https://github.com/imotov/elasticsearch-snapshot-cleanup> utility on
>>>> my cluster both before and after I upgraded from version 1.2.2 to 1.4.0
>>>> (I thought upgrading to 1.4.0, which included snapshot improvements,
>>>> might fix my issues, but it did not), and the script is not finding any
>>>> running snapshots:
>>>>
>>>> [2014-11-13 05:37:45,451][INFO ][org.elasticsearch.node] [Golden Archer] started
>>>> [2014-11-13 05:37:45,451][INFO ][org.elasticsearch.org.motovs.elasticsearch.snapshots.AbortedSnapshotCleaner] No snapshots found
>>>> [2014-11-13 05:37:45,452][INFO ][org.elasticsearch.node] [Golden Archer] stopping ...
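On Sally's aside about the terse snapshot logger: in the 1.x series, verbosity could likely be raised in `config/logging.yml`, where keys under `logger:` are shorthand for packages relative to `org.elasticsearch`. A sketch; the exact logger names are an assumption worth verifying against your version:

```yaml
# config/logging.yml (sketch) -- raise snapshot-related loggers to TRACE.
# Keys under "logger:" expand to org.elasticsearch.<key>.
logger:
  snapshots: TRACE       # org.elasticsearch.snapshots
  repositories: TRACE    # assumption: repository-side logging, if present
```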
>>>> Curling _snapshot/REPO/_status also returns no ongoing snapshots:
>>>>
>>>> curl -XGET 'http://<hostname>:9200/_snapshot/s3_backup_repo/_status?pretty=true'
>>>> {
>>>>   "snapshots" : [ ]
>>>> }
>>>>
>>>> I may try bouncing ES on each node to see if that kills whatever
>>>> process is causing my requests to the snapshot module to hang (requests
>>>> to other modules like _cluster/health return fine; cluster health is
>>>> green, and load is low on both nodes: 0.00, 0.06).
>>>>
>>>> I would really appreciate some help/guidance on how to debug/fix this
>>>> issue, and general recommendations on how best to achieve periodic
>>>> snapshots. For example, cleaning up old snapshots seems rather
>>>> difficult, since we have to specify the snapshot name, which we would
>>>> obtain by making a request to the snapshot module, which seems to hang
>>>> often.
>>>>
>>>> Thanks,
>>>> Sally
>>>>
>>>> On Monday, November 10, 2014 12:27:10 AM UTC-8, Pradeep Reddy wrote:
>>>>
>>>>> Hi Vineeth,
>>>>>
>>>>> Thanks for the reply. I am aware of how to create and delete snapshots
>>>>> using cloud-aws.
>>>>>
>>>>> What I wanted to know was: what should the workflow of periodic
>>>>> snapshots be? Especially, how should one deal with old snapshots? Will
>>>>> having too many old snapshots impact something?
>>>>>
>>>>> On Friday, November 7, 2014 8:16:05 PM UTC+5:30, Vineeth Mohan wrote:
>>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> There is an S3 repository plugin:
>>>>>> https://github.com/elasticsearch/elasticsearch-cloud-aws#s3-repository
>>>>>> Use this. The snapshots are incremental, so it should fit your
>>>>>> purpose perfectly.
>>>>>>
>>>>>> Thanks,
>>>>>> Vineeth
>>>>>>
>>>>>> On Fri, Nov 7, 2014 at 3:22 PM, Pradeep Reddy <
>>>>>> [email protected]> wrote:
>>>>>>
>>>>>>> I want to back up the data every 15-30 minutes. I will be storing
>>>>>>> the snapshots in S3.
>>>>>>> DELETE old and then PUT new snapshot may not be the best practice,
>>>>>>> as you may end up with nothing if something goes wrong.
>>>>>>>
>>>>>>> Using timestamps for snapshot names may be one option, but how do I
>>>>>>> delete old snapshots then?
>>>>>>> Does S3 lifecycle management help to delete old snapshots?
>>>>>>>
>>>>>>> Looking forward to getting some opinions on this.
>>>>>>>
>>>>>>> --
>>>>>>> You received this message because you are subscribed to the Google
>>>>>>> Groups "elasticsearch" group.
>>>>>>> To unsubscribe from this group and stop receiving emails from it,
>>>>>>> send an email to [email protected].
>>>>>>> To view this discussion on the web visit
>>>>>>> https://groups.google.com/d/msgid/elasticsearch/0dd81d83-5066-4652-9703-dfce63b46993%40googlegroups.com.
>>>>>>> For more options, visit https://groups.google.com/d/optout.
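On Pradeep's S3 lifecycle question: lifecycle rules are a poor fit, because incremental snapshots share segment files, so a file's age in S3 says nothing about whether a newer snapshot still references it. Only the DELETE snapshot API knows the references. A toy illustration of that reference check (the snapshot and file names are made up):

```python
# Model two snapshots that share a segment file. Deleting the older
# snapshot may only remove files no other snapshot references -- which
# is what DELETE _snapshot/<repo>/<name> computes, and what an S3
# lifecycle rule based on object age cannot.
snapshots = {
    "snapshot-old": {"seg_1", "seg_2"},
    "snapshot-new": {"seg_2", "seg_3"},  # seg_2 is shared, though "old"
}

def files_safe_to_delete(snapshots, victim):
    """Files used only by `victim` -- what deleting it would remove."""
    still_needed = set().union(
        *(files for name, files in snapshots.items() if name != victim)
    )
    return snapshots[victim] - still_needed

print(sorted(files_safe_to_delete(snapshots, "snapshot-old")))
# -> ['seg_1']
```

An age-based lifecycle rule would have deleted `seg_2` along with everything else old, silently corrupting `snapshot-new`; pruning must go through the snapshot API.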
