Colton,

Interesting tool, and thanks for contributing. Will definitely check it out. One of the main index maintenance tasks that I do is to remove all replicas on older (backup) indices. I am currently doing this task manually because there needs to be human verification of certain criteria before indices are closed/deleted.

BTW, Bruce Wayne is DC Comics, not Marvel.

Cheers,

Ivan (not a comic book reader)

On Wed, Apr 2, 2014 at 3:57 AM, Colton <[email protected]> wrote:

> Hello ElasticSearch Community,
>
> My name is Colton McInroy and I work with DOSarrest Internet Security
> LTD. Over the past few months I have been working with ElasticSearch
> fairly closely and building an infrastructure for it. When dealing with
> lots of indices, managing them can be somewhat difficult in most of the
> web interfaces we found. We wanted to be able to, for instance, have
> indices older than a certain amount of time expire out of the cluster.
> We came across curator (https://github.com/elasticsearch/curator),
> which came fairly close but had some limitations. I decided to spend a
> couple of days building our own tool from scratch, which after
> discussion we have decided to release to the public as open source. We
> have called this tool Alfred, after Bruce Wayne's butler Alfred
> Pennyworth, keeping in line with the Marvel comics theme.
>
> Alfred can be set up in a cronjob to automatically groom your indices
> so that you only keep a certain amount of data, optimize indexes, change
> settings (such as changing routing), and more. By default no changes are
> made unless you specify the -r or --run parameter. In its default mode,
> you can test this tool all you want and get output to see what would
> have been done without changes actually occurring. You can also use the
> -D option to get more debug output if you want to see what's going on
> (such as "-D debug"). Once you are ready, add the -r parameter and watch
> Alfred do all the work for you.
>
> Alfred was developed in Java, but does not use the ElasticSearch Java
> API; rather, it uses the RESTful API through Apache HttpClient
> (http://hc.apache.org/httpclient-3.x/). The following libraries are
> included via maven into Alfred...
>
> joda-time 2.3
> httpcore 4.3.2
> gson 2.2.4
> httpclient 4.3.3
> commons-logging 1.1.3
> commons-codec 1.6
> commons-cli 1.2
>
> A jar build is located at
> https://github.com/DOSarrest-Internet-Security/alfred/raw/master/builds/alfred-0.0.1.jar
> Our Github page with source and README is located at
> https://github.com/DOSarrest-Internet-Security/alfred
>
> Here is some of that README file to explain how to use alfred...
>
> usage: alfred
>  -b,--debloom                Disable Bloom on Indexes
>  -B,--bloom                  Enable Bloom on Indexes
>  -c,--close                  Close Indexes
>  -D,--debug <arg>            Display debug (debug|info|warn|error|fatal)
>  -d,--delete                 Delete Indexes
>  -E,--expiresize <arg>       Byte size limit (Default 10 GB)
>  -e,--expiretime <arg>       Number of time units old (Default 24)
>     --examples               Show some examples of how to use Alfred
>  -f,--flush                  Flush Indexes
>  -h,--help                   Help Page (Viewing Now)
>     --host <arg>             ElasticSearch Host
>  -i,--index <arg>            Index pattern to match (Default _all)
>     --max_num_segments <arg> Optimize max_num_segments (Default 2)
>  -o,--optimize               Optimize Indexes
>  -O,--open                   Open Indexes
>     --port <arg>             ElasticSearch Port
>  -r,--run                    Required to execute changes on ElasticSearch
>  -s,--style <arg>            Clean up style (time|size) (Default time)
>  -S,--settings <arg>         PUT settings
>     --ssl                    ElasticSearch SSL
>  -T,--time-unit <arg>        Specify time units (hour|day|none) (Default hour)
>  -t,--timeout <arg>          ElasticSearch Timeout (Default 30)
> Alfred Version: 0.0.1
>
> Alfred was built as a tool to handle maintenance work on ElasticSearch.
> Alfred will delete, flush cache, optimize, close/open, enable/disable
> bloom filter, as well as put settings on indexes. Alfred can do any of
> these actions based on either time or size parameters.
>
> Examples:
>
> java -jar alfred.jar -e48 -i"cron_*" -d
>
> Delete any indexes starting with "cron_" that are older than 48 hours
>
> java -jar alfred.jar -e24 -i"cron_*" -S'{"index.routing.allocation.require.tag":"historical"}'
>
> Set routing to require historical tag on any indexes starting with
> "cron_" that are older than 24 hours
>
> java -jar alfred.jar -e24 -i"cron_*" -b -o
>
> Disable bloom filter and optimize any indexes starting with "cron_" that
> are older than 24 hours
>
> java -jar alfred.jar -ssize -E"1 GB" -d
>
> Find all indexes, group by prefix, and delete indexes over a limit of
> 1 GB. Using the size style with an expire size does not check space
> based on a single index, but rather on the indexes adding up over time,
> such as in the following...
>
> java -jar alfred.jar -i"cron_*" -d -ssize -E"500 GB"
> GENERAL: cron_2014_04_02_08 is 469.9 GiB bytes before the cuttoff.
> GENERAL: cron_2014_04_02_07 is 436.5 GiB bytes before the cuttoff.
> GENERAL: cron_2014_04_02_06 is 404.0 GiB bytes before the cuttoff.
> GENERAL: cron_2014_04_02_05 is 372.1 GiB bytes before the cuttoff.
> GENERAL: cron_2014_04_02_04 is 341.2 GiB bytes before the cuttoff.
> GENERAL: cron_2014_04_02_03 is 310.1 GiB bytes before the cuttoff.
> GENERAL: cron_2014_04_02_02 is 276.8 GiB bytes before the cuttoff.
> GENERAL: cron_2014_04_02_01 is 240.7 GiB bytes before the cuttoff.
> GENERAL: cron_2014_04_02_00 is 202.2 GiB bytes before the cuttoff.
> GENERAL: cron_2014_04_01_23 is 158.2 GiB bytes before the cuttoff.
> GENERAL: cron_2014_04_01_22 is 110.6 GiB bytes before the cuttoff.
> GENERAL: cron_2014_04_01_21 is 58.6 GiB bytes before the cuttoff.
> GENERAL: cron_2014_04_01_20 is 3.1 GiB bytes before the cuttoff.
> GENERAL: Index cron_2014_04_01_19 would have been deleted.
> GENERAL: Index cron_2014_04_01_18 would have been deleted.
> GENERAL: Index cron_2014_04_01_17 would have been deleted.
> GENERAL: Index cron_2014_04_01_16 would have been deleted.
> GENERAL: Index cron_2014_04_01_15 would have been deleted.
> GENERAL: Index cron_2014_04_01_14 would have been deleted.
> GENERAL: Index cron_2014_04_01_13 would have been deleted.
> GENERAL: Index cron_2014_04_01_12 would have been deleted.
> GENERAL: Index cron_2014_04_01_11 would have been deleted.
> GENERAL: Index cron_2014_04_01_10 would have been deleted.
> GENERAL: Index cron_2014_04_01_09 would have been deleted.
> GENERAL: Index cron_2014_04_01_08 would have been deleted.
> GENERAL: Index cron_2014_03_29_08 would have been deleted.
>
> If you are using daily indexes, such as the marvel indexes, you could
> use the following examples to manage them
>
> java -jar alfred.jar -i".marvel-*" -d -ssize -E"500 GB"
>
> Keep the past 500 GB worth of marvel indices
>
> java -jar alfred.jar -i".marvel-*" -d -T"day" -e7
>
> Delete marvel indices older than 7 days
>
> java -jar alfred.jar -i".marvel-*" -b -o -T"day" --max_num_segments=4 -e1
>
> Disable bloom filter and optimize marvel indices over 1 day old with
> max_num_segments=4
>
> The following regular expression is used to split indexes into
> appropriate variables...
>
> ^((?<Name>[a-zA-Z0-9\\.\\-_]+)(?<PrefixSeparator>(_|-)+)(?<Year>[0-9]{4})(?<Separator>(\\.|_|-))(?<Month>[0-9]{2})(\\.|_|-)(?<Day>[0-9]{2})(\\.|_|-)?(?<Hour>[0-9]{2})?)$
>
> As long as your indexes follow the pattern of this regular expression,
> Alfred will be glad to manage your indices.
>
> The -i parameter is passed to the URL
> "http://host:port/INDEX/_stats/indices"
> where "INDEX" is replaced by whatever the -i parameter contains. By
> default, it uses _all, but you can specify all kinds of wildcard
> options, such as -i".marvel-*", -i"logstash-*", -i"*2014_04_02*", etc.
> Alfred gave us a lot of power to manage our indices, so we thought that
> the community could use him as well.
>
> --
> Thanks,
> Colton McInroy
>
> - Director of Security Engineering
>
> Phone (Toll Free): US (888)-818-1344 Press 2, UK 0-800-635-0551 Press 2
> My Extension: 101
> 24/7 Support: [email protected]
> Email: [email protected]
> Website: http://www.dosarrest.com
>
> --
> You received this message because you are subscribed to the Google
> Groups "elasticsearch" group.
> To unsubscribe from this group and stop receiving emails from it, send
> an email to [email protected].
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/elasticsearch/533BED19.4000608%40dosarrest.com.
> For more options, visit https://groups.google.com/d/optout.
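
For anyone who wants to check whether their index names fit the regular expression quoted in the announcement, here is a minimal Java sketch that compiles that pattern and pulls out the named groups. The pattern string is copied from the announcement (its doubled backslashes are Java string escapes). The class name IndexNameParser and the parse method are hypothetical names chosen for illustration; they are not taken from Alfred's source, and Alfred itself may well handle the match differently.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration only; not part of Alfred.
public class IndexNameParser {

    // Pattern as quoted in the announcement, split across lines for readability.
    private static final Pattern INDEX_PATTERN = Pattern.compile(
        "^((?<Name>[a-zA-Z0-9\\.\\-_]+)(?<PrefixSeparator>(_|-)+)(?<Year>[0-9]{4})"
        + "(?<Separator>(\\.|_|-))(?<Month>[0-9]{2})(\\.|_|-)(?<Day>[0-9]{2})"
        + "(\\.|_|-)?(?<Hour>[0-9]{2})?)$");

    /**
     * Returns the Name, Year, Month, Day and Hour parts of an index name,
     * or null if the name does not match the pattern.
     * Hour maps to null when the index name has no hour component.
     */
    public static Map<String, String> parse(String indexName) {
        Matcher m = INDEX_PATTERN.matcher(indexName);
        if (!m.matches()) {
            return null;
        }
        Map<String, String> parts = new LinkedHashMap<>();
        for (String group : new String[] {"Name", "Year", "Month", "Day", "Hour"}) {
            parts.put(group, m.group(group));
        }
        return parts;
    }

    public static void main(String[] args) {
        // Index names taken from the examples in the announcement, plus one non-match.
        String[] names = {"cron_2014_04_02_08", ".marvel-2014.04.02", "notes"};
        for (String name : names) {
            System.out.println(name + " -> " + parse(name));
        }
    }
}
```

Running the main method prints the parsed parts for the hourly "cron_" and daily ".marvel-" names and null for a name with no date suffix, which is a quick way to see which of your own indices the pattern would pick up before pointing a tool at them.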
