[Tool Contribution] Alfred the ElasticSearch Butler

Colton Wed, 02 Apr 2014 03:58:24 -0700

Hello ElasticSearch Community,

My name is Colton McInroy and I work with DOSarrest Internet Security LTD. Over the past few months I have been working with ElasticSearch fairly closely and building an infrastructure for it. When dealing with lots of indices, we found that managing them can be somewhat difficult in most web interfaces. We wanted to be able to, for instance, have indices older than a certain age expire out of the cluster. We came across curator (https://github.com/elasticsearch/curator), which came fairly close but had some limitations. I decided to spend a couple of days building our own tool from scratch, which after discussion we have decided to release to the public as open source. We have called this tool Alfred, after Bruce Wayne's butler Alfred Pennyworth, keeping in line with the comic-book naming theme of Marvel.

Alfred can be set up in a cron job to automatically groom your indices so that you only keep a certain amount of data, optimize indexes, change settings (such as changing routing), and more. By default no changes are made unless you specify the -r or --run parameter. In its default mode, you can test this tool all you want and get output showing what would have been done, without any changes actually occurring. You can also use the -D option to get more debug output if you want to see what's going on (such as "-D debug"). Once you are ready, add the -r parameter and watch Alfred do all the work for you.
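
As a rough illustration of the dry-run behaviour described above, here is a minimal sketch of gating a destructive action behind a run flag. The class and method names (DryRunGuard, deleteIndex, deletedIndexes) and the "deleted." message are made up for this example and are not taken from Alfred's source; only the "would have been deleted" wording mirrors Alfred's output.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of a dry-run guard; not Alfred's actual code.
class DryRunGuard {
    private final boolean run;                        // true only when -r/--run was given
    private final List<String> deleted = new ArrayList<>();

    DryRunGuard(boolean run) {
        this.run = run;
    }

    // Returns the message that would be logged for this index.
    String deleteIndex(String index) {
        if (!run) {
            // Default mode: report only, change nothing.
            return "Index " + index + " would have been deleted.";
        }
        deleted.add(index);                           // stand-in for the real DELETE request
        return "Index " + index + " deleted.";
    }

    // Indexes that were actually acted on (always empty in dry-run mode).
    List<String> deletedIndexes() {
        return deleted;
    }

    public static void main(String[] args) {
        DryRunGuard dryRun = new DryRunGuard(false);
        System.out.println(dryRun.deleteIndex("cron_2014_04_01_19"));
    }
}
```

Keeping the check in one place means every action (delete, close, optimize, and so on) can share the same safe default.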

Alfred was developed in Java, but does not use the ElasticSearch Java API; rather, it uses the RESTful API through Apache HttpClient (http://hc.apache.org/httpclient-3.x/). The following libraries are included via Maven into Alfred...


joda-time 2.3
httpcore 4.3.2
gson 2.2.4
httpclient 4.3.3
commons-logging 1.1.3
commons-codec 1.6
commons-cli 1.2

A jar build is located at https://github.com/DOSarrest-Internet-Security/alfred/raw/master/builds/alfred-0.0.1.jar

Our GitHub page with source and README is located at https://github.com/DOSarrest-Internet-Security/alfred


Here is some of that README file to explain how to use Alfred...

|usage: alfred
 -b,--debloom                  Disable Bloom on Indexes
 -B,--bloom                    Enable Bloom on Indexes
 -c,--close                    Close Indexes
 -D,--debug <arg>              Display debug (debug|info|warn|error|fatal)
 -d,--delete                   Delete Indexes
 -E,--expiresize <arg>         Byte size limit  (Default 10 GB)
 -e,--expiretime <arg>         Number of time units old (Default 24)
    --examples                 Show some examples of how to use Alfred
 -f,--flush                    Flush Indexes
 -h,--help                     Help Page (Viewing Now)
    --host <arg>               ElasticSearch Host
 -i,--index <arg>              Index pattern to match (Default _all)
    --max_num_segments <arg>   Optimize max_num_segments (Default 2)
 -o,--optimize                 Optimize Indexes
 -O,--open                     Open Indexes
    --port <arg>               ElasticSearch Port
 -r,--run                      Required to execute changes on
                               ElasticSearch
 -s,--style <arg>              Clean up style (time|size) (Default time)
 -S,--settings <arg>           PUT settings
    --ssl                      ElasticSearch SSL
 -T,--time-unit <arg>          Specify time units (hour|day|none) (Default
                               hour)
 -t,--timeout <arg>            ElasticSearch Timeout (Default 30)
Alfred Version: 0.0.1|

Alfred was built as a tool to handle maintenance work on ElasticSearch. Alfred will delete, flush cache, optimize, close/open, enable/disable bloom filter, as well as put settings on indexes. Alfred can do any of these actions based on either time or size parameters.


Examples:

|java -jar alfred.jar -e48 -i"cron_*" -d
|

Delete any indexes starting with "cron_" that are older than 48 hours

|java -jar alfred.jar -e24 -i"cron_*" -S'{"index.routing.allocation.require.tag":"historical"}'
|

Set routing to require the historical tag on any indexes starting with "cron_" that are older than 24 hours


|java -jar alfred.jar -e24 -i"cron_*" -b -o
|

Disable bloom filter and optimize any indexes starting with "cron_" that are older than 24 hours


|java -jar alfred.jar -ssize -E"1 GB" -d
|

Find all indexes, group by prefix, and delete indexes over a limit of 1 GB. Using the size style with an expire size does not check space based on a single index, but rather on the indexes adding up over time, as in the following...


|java -jar alfred.jar -i"cron_*" -d -ssize -E"500 GB"
GENERAL: cron_2014_04_02_08 is 469.9 GiB bytes before the cuttoff.
GENERAL: cron_2014_04_02_07 is 436.5 GiB bytes before the cuttoff.
GENERAL: cron_2014_04_02_06 is 404.0 GiB bytes before the cuttoff.
GENERAL: cron_2014_04_02_05 is 372.1 GiB bytes before the cuttoff.
GENERAL: cron_2014_04_02_04 is 341.2 GiB bytes before the cuttoff.
GENERAL: cron_2014_04_02_03 is 310.1 GiB bytes before the cuttoff.
GENERAL: cron_2014_04_02_02 is 276.8 GiB bytes before the cuttoff.
GENERAL: cron_2014_04_02_01 is 240.7 GiB bytes before the cuttoff.
GENERAL: cron_2014_04_02_00 is 202.2 GiB bytes before the cuttoff.
GENERAL: cron_2014_04_01_23 is 158.2 GiB bytes before the cuttoff.
GENERAL: cron_2014_04_01_22 is 110.6 GiB bytes before the cuttoff.
GENERAL: cron_2014_04_01_21 is 58.6 GiB bytes before the cuttoff.
GENERAL: cron_2014_04_01_20 is 3.1 GiB bytes before the cuttoff.
GENERAL: Index cron_2014_04_01_19 would have been deleted.
GENERAL: Index cron_2014_04_01_18 would have been deleted.
GENERAL: Index cron_2014_04_01_17 would have been deleted.
GENERAL: Index cron_2014_04_01_16 would have been deleted.
GENERAL: Index cron_2014_04_01_15 would have been deleted.
GENERAL: Index cron_2014_04_01_14 would have been deleted.
GENERAL: Index cron_2014_04_01_13 would have been deleted.
GENERAL: Index cron_2014_04_01_12 would have been deleted.
GENERAL: Index cron_2014_04_01_11 would have been deleted.
GENERAL: Index cron_2014_04_01_10 would have been deleted.
GENERAL: Index cron_2014_04_01_09 would have been deleted.
GENERAL: Index cron_2014_04_01_08 would have been deleted.
GENERAL: Index cron_2014_03_29_08 would have been deleted.
|
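
The output above suggests a running total taken from the newest index backwards, with everything past the limit marked for deletion. Here is a minimal sketch of that idea, assuming the indexes of one prefix are already sorted newest first. The class name SizeExpiry, the method expired, and the sample sizes are made up for illustration and are not taken from Alfred's source.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of size-based expiry for one index prefix; not Alfred's actual code.
class SizeExpiry {
    // sizesNewestFirst: index name -> size in bytes, iterated newest to oldest.
    // Returns the names whose cumulative size (newest first) exceeds limitBytes.
    static List<String> expired(Map<String, Long> sizesNewestFirst, long limitBytes) {
        List<String> toDelete = new ArrayList<>();
        long total = 0;
        for (Map.Entry<String, Long> e : sizesNewestFirst.entrySet()) {
            total += e.getValue();
            if (total > limitBytes) {
                // Once the budget is used up, this index and every older one is expired.
                toDelete.add(e.getKey());
            }
        }
        return toDelete;
    }

    public static void main(String[] args) {
        long gib = 1024L * 1024L * 1024L;
        // Illustrative sizes only; LinkedHashMap keeps the newest-first insertion order.
        Map<String, Long> sizes = new LinkedHashMap<>();
        sizes.put("cron_2014_04_02_08", 300 * gib);
        sizes.put("cron_2014_04_02_07", 150 * gib);
        sizes.put("cron_2014_04_02_06", 100 * gib);
        for (String index : expired(sizes, 500 * gib)) {
            System.out.println("Index " + index + " would have been deleted.");
        }
    }
}
```

Because the total never shrinks, a single small old index cannot survive once the newer indexes have filled the limit, which matches the "adding up over time" description.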

If you are using daily indexes, such as the marvel indexes, you could use the following examples to manage them


|java -jar alfred.jar -i".marvel-*" -d -ssize -E"500 GB"
|

Keep the past 500 GB worth of marvel indices

|java -jar alfred.jar -i".marvel-*" -d -T"day" -e7
|

Delete marvel indices older than 7 days

|java -jar alfred.jar -i".marvel-*" -b -o -T"day" --max_num_segments=4 -e1
|

Disable bloom filter and optimize marvel indices over 1 day old with max_num_segments=4

The following regular expression is used to split index names into the appropriate variables...


|^((?<Name>[a-zA-Z0-9\\.\\-_]+)(?<PrefixSeparator>(_|-)+)(?<Year>[0-9]{4})(?<Separator>(\\.|_|-))(?<Month>[0-9]{2})(\\.|_|-)(?<Day>[0-9]{2})(\\.|_|-)?(?<Hour>[0-9]{2})?)$
|

As long as your indexes follow the pattern of this regular expression, Alfred will be glad to manage your indices.
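
To see how that pattern picks index names apart, here is a small sketch that applies it and turns the captured groups into a timestamp for an age check. Only the regular expression comes from this message. The class name IndexName, both methods, the use of java.time (rather than the joda-time library Alfred lists), and the exact age comparison are assumptions made for illustration.

```java
import java.time.Duration;
import java.time.LocalDateTime;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch; only the regular expression is taken from the message.
class IndexName {
    static final Pattern PATTERN = Pattern.compile(
        "^((?<Name>[a-zA-Z0-9\\.\\-_]+)(?<PrefixSeparator>(_|-)+)"
        + "(?<Year>[0-9]{4})(?<Separator>(\\.|_|-))(?<Month>[0-9]{2})(\\.|_|-)"
        + "(?<Day>[0-9]{2})(\\.|_|-)?(?<Hour>[0-9]{2})?)$");

    // Returns the timestamp encoded in the index name, or null if the name
    // does not match. Daily indexes (no Hour group) are treated as hour 0.
    // Out-of-range values such as month 13 make LocalDateTime.of throw.
    static LocalDateTime timestamp(String index) {
        Matcher m = PATTERN.matcher(index);
        if (!m.matches()) {
            return null;
        }
        String hour = m.group("Hour");
        return LocalDateTime.of(
            Integer.parseInt(m.group("Year")),
            Integer.parseInt(m.group("Month")),
            Integer.parseInt(m.group("Day")),
            hour == null ? 0 : Integer.parseInt(hour),
            0);
    }

    // True when the index timestamp lies more than `hours` whole hours before `now`.
    static boolean olderThanHours(String index, LocalDateTime now, long hours) {
        LocalDateTime ts = timestamp(index);
        return ts != null && Duration.between(ts, now).toHours() > hours;
    }

    public static void main(String[] args) {
        System.out.println(timestamp("cron_2014_04_02_08"));   // hourly index
        System.out.println(timestamp(".marvel-2014.04.02"));   // daily index
    }
}
```

Names that do not match simply return null here, so a caller could skip them instead of guessing at a date.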

The -i parameter is passed to the URL "http://host:port/INDEX/_stats/indices", where "INDEX" is replaced by whatever the -i parameter contains. By default it uses _all, but you can specify all kinds of wildcard options, such as -i".marvel-*", -i"logstash-*", -i"*2014_04_02*", etc. Alfred gave us a lot of power to manage our indices, so we thought that the community could use him as well.
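
For reference, here is a minimal sketch of how such a stats URL could be assembled from the --host, --port, --ssl and -i values. The class name StatsUrl and the method build are made up, and the assumption that --ssl simply switches the scheme to https is mine rather than something taken from Alfred's source.

```java
// Hypothetical sketch of assembling the stats URL; not Alfred's actual code.
class StatsUrl {
    // indexPattern is the -i value; null or empty falls back to _all, the documented default.
    static String build(String host, int port, boolean ssl, String indexPattern) {
        String scheme = ssl ? "https" : "http";   // assumption: --ssl only changes the scheme
        String index = (indexPattern == null || indexPattern.isEmpty()) ? "_all" : indexPattern;
        return scheme + "://" + host + ":" + port + "/" + index + "/_stats/indices";
    }

    public static void main(String[] args) {
        // Sample host and port, chosen only for the example.
        System.out.println(build("localhost", 9200, false, ".marvel-*"));
    }
}
```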


--
Thanks,
Colton McInroy

 * Director of Security Engineering

        
