Colton,

Interesting tool and thanks for contributing. Will definitely check it out.
One of the main index maintenance tasks that I do is to remove all replicas
on older (backup) indices. I am currently doing this task manually because
there needs to be human verification of certain criteria before indices are
closed/deleted.

BTW, Bruce Wayne is DC Comics, not Marvel.

Cheers,

Ivan (not a comic book reader)


On Wed, Apr 2, 2014 at 3:57 AM, Colton <[email protected]> wrote:

>  Hello ElasticSearch Community,
>
>     My name is Colton McInroy and I work with DOSarrest Internet Security
> LTD. Over the past few months I have been working with ElasticSearch fairly
> closely and building an infrastructure for it. When dealing with lots of
> indices, we found that managing them can be somewhat difficult in most web
> interfaces. We wanted to be able to, for instance, have indices older than
> a certain age expire out of the cluster. We came across
> curator (https://github.com/elasticsearch/curator) which came fairly
> close, but had some limitations. I decided to spend a couple of days
> building our own tool from scratch, which after discussion we have decided
> to release to the public as open source. We have called this tool Alfred,
> after Bruce Wayne's butler Alfred Pennyworth, keeping in line with the
> Marvel comics theme.
>
>     Alfred can be set up in a cronjob to automatically groom your indices
> so that you only keep a certain amount of data, optimize indexes, change
> settings (such as changing routing), and more. By default no changes are
> made unless you specify the -r or --run parameter. In its default mode, you
> can test this tool all you want and get output to see what would have been
> done without changes actually occurring. You can also use the -D option
> (for example, "-D debug") to get more debug output if you want to see
> what is going on. Once you are ready, add the -r parameter and watch Alfred
> do all the work for you.
>
>     Alfred was developed in Java but does not use the ElasticSearch Java
> API; instead it uses the RESTful API through Apache HttpClient (
> http://hc.apache.org/httpclient-3.x/). The following libraries are
> included via Maven in Alfred:
>
> joda-time 2.3
> httpcore 4.3.2
> gson 2.2.4
> httpclient 4.3.3
> commons-logging 1.1.3
> commons-codec 1.6
> commons-cli 1.2
>
>     A jar build is located at
> https://github.com/DOSarrest-Internet-Security/alfred/raw/master/builds/alfred-0.0.1.jar
>     Our Github page with source and README is located at
> https://github.com/DOSarrest-Internet-Security/alfred
>
>     Here is some of that README file to explain how to use alfred...
>
> usage: alfred
>  -b,--debloom                  Disable Bloom on Indexes
>  -B,--bloom                    Enable Bloom on Indexes
>  -c,--close                    Close Indexes
>  -D,--debug <arg>              Display debug (debug|info|warn|error|fatal)
>  -d,--delete                   Delete Indexes
>  -E,--expiresize <arg>         Byte size limit  (Default 10 GB)
>  -e,--expiretime <arg>         Number of time units old (Default 24)
>     --examples                 Show some examples of how to use Alfred
>  -f,--flush                    Flush Indexes
>  -h,--help                     Help Page (Viewing Now)
>     --host <arg>               ElasticSearch Host
>  -i,--index <arg>              Index pattern to match (Default _all)
>     --max_num_segments <arg>   Optimize max_num_segments (Default 2)
>  -o,--optimize                 Optimize Indexes
>  -O,--open                     Open Indexes
>     --port <arg>               ElasticSearch Port
>  -r,--run                      Required to execute changes on
>                                ElasticSearch
>  -s,--style <arg>              Clean up style (time|size) (Default time)
>  -S,--settings <arg>           PUT settings
>     --ssl                      ElasticSearch SSL
>  -T,--time-unit <arg>          Specify time units (hour|day|none) (Default
>                                hour)
>  -t,--timeout <arg>            ElasticSearch Timeout (Default 30)
> Alfred Version: 0.0.1
>
>
> Alfred was built as a tool to handle maintenance work on ElasticSearch.
> Alfred will delete, flush cache, optimize, close/open, enable/disable bloom
> filter, as well as put settings on indexes. Alfred can do any of these
> actions based on either time or size parameters.
>
> Examples:
>
> java -jar alfred.jar -e48 -i"cron_*" -d
>
> Delete any indexes starting with "cron_" that are older than 48 hours
>
> java -jar alfred.jar -e24 -i"cron_*" -S'{"index.routing.allocation.require.tag":"historical"}'
>
> Set routing to require historical tag on any indexes starting with "cron_"
> that are older than 24 hours
>
> java -jar alfred.jar -e24 -i"cron_*" -b -o
>
> Disable bloom filter and optimize any indexes starting with "cron_" that
> are older than 24 hours
>
> java -jar alfred.jar -ssize -E"1 GB" -d
>
> Find all indexes, group them by prefix, and delete indexes beyond a limit of
> 1 GB. The size style with an expire size does not check the size of any
> single index; it adds up index sizes from newest to oldest and expires
> whatever falls beyond the limit, as in the following dry run:
>
> java -jar alfred.jar -i"cron_*" -d -ssize -E"500 GB"
> GENERAL: cron_2014_04_02_08 is 469.9 GiB bytes before the cuttoff.
> GENERAL: cron_2014_04_02_07 is 436.5 GiB bytes before the cuttoff.
> GENERAL: cron_2014_04_02_06 is 404.0 GiB bytes before the cuttoff.
> GENERAL: cron_2014_04_02_05 is 372.1 GiB bytes before the cuttoff.
> GENERAL: cron_2014_04_02_04 is 341.2 GiB bytes before the cuttoff.
> GENERAL: cron_2014_04_02_03 is 310.1 GiB bytes before the cuttoff.
> GENERAL: cron_2014_04_02_02 is 276.8 GiB bytes before the cuttoff.
> GENERAL: cron_2014_04_02_01 is 240.7 GiB bytes before the cuttoff.
> GENERAL: cron_2014_04_02_00 is 202.2 GiB bytes before the cuttoff.
> GENERAL: cron_2014_04_01_23 is 158.2 GiB bytes before the cuttoff.
> GENERAL: cron_2014_04_01_22 is 110.6 GiB bytes before the cuttoff.
> GENERAL: cron_2014_04_01_21 is 58.6 GiB bytes before the cuttoff.
> GENERAL: cron_2014_04_01_20 is 3.1 GiB bytes before the cuttoff.
> GENERAL: Index cron_2014_04_01_19 would have been deleted.
> GENERAL: Index cron_2014_04_01_18 would have been deleted.
> GENERAL: Index cron_2014_04_01_17 would have been deleted.
> GENERAL: Index cron_2014_04_01_16 would have been deleted.
> GENERAL: Index cron_2014_04_01_15 would have been deleted.
> GENERAL: Index cron_2014_04_01_14 would have been deleted.
> GENERAL: Index cron_2014_04_01_13 would have been deleted.
> GENERAL: Index cron_2014_04_01_12 would have been deleted.
> GENERAL: Index cron_2014_04_01_11 would have been deleted.
> GENERAL: Index cron_2014_04_01_10 would have been deleted.
> GENERAL: Index cron_2014_04_01_09 would have been deleted.
> GENERAL: Index cron_2014_04_01_08 would have been deleted.
> GENERAL: Index cron_2014_03_29_08 would have been deleted.
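
A minimal sketch of the cumulative size check that the output above suggests. This is not Alfred's source. It assumes indices are walked newest first, that a running total is compared against the limit, and that once an index would push the total past the limit, that index and everything older is expired. The class name, method name, and sizes are made up for illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of Alfred.
class SizeExpirySketch {

    /**
     * Returns the names of indices that fall beyond the size limit.
     * The map must be ordered newest first (LinkedHashMap keeps insertion order).
     */
    static List<String> expired(Map<String, Long> sizesNewestFirst, long limit) {
        List<String> toDelete = new ArrayList<>();
        long total = 0;
        boolean overLimit = false;
        for (Map.Entry<String, Long> entry : sizesNewestFirst.entrySet()) {
            if (!overLimit && total + entry.getValue() <= limit) {
                // Still within the budget: keep this index and count its size.
                total += entry.getValue();
            } else {
                // Assumption: once the cutoff is crossed, all older indices go too.
                overLimit = true;
                toDelete.add(entry.getKey());
            }
        }
        return toDelete;
    }

    public static void main(String[] args) {
        // Made-up sizes in arbitrary units, newest index first.
        Map<String, Long> sizes = new LinkedHashMap<>();
        sizes.put("cron_2014_04_02_08", 30L);
        sizes.put("cron_2014_04_02_07", 40L);
        sizes.put("cron_2014_04_02_06", 50L);
        sizes.put("cron_2014_04_02_05", 10L);
        // With a limit of 100, the two oldest indices are reported.
        System.out.println(expired(sizes, 100L));
    }
}
```

Note that the oldest index is reported even though its own size would still fit; under the stated assumption, the cutoff is a point in time, not a per-index test.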
>
> If you are using daily indexes, such as the marvel indexes, you could use
> the following examples to manage them
>
> java -jar alfred.jar -i".marvel-*" -d -ssize -E"500 GB"
>
> Keep the past 500 GB worth of marvel indices
>
> java -jar alfred.jar -i".marvel-*" -d -T"day" -e7
>
> Delete marvel indices older than 7 days
>
> java -jar alfred.jar -i".marvel-*" -b -o -T"day" --max_num_segments=4 -e1
>
> Disable bloom filter and optimize marvel indices with max_num_segments=4
> over 1 day old
>
> The following regular expression is used to split indexes into appropriate
> variables...
>
> ^((?<Name>[a-zA-Z0-9\\.\\-_]+)(?<PrefixSeparator>(_|-)+)(?<Year>[0-9]{4})(?<Separator>(\\.|_|-))(?<Month>[0-9]{2})(\\.|_|-)(?<Day>[0-9]{2})(\\.|_|-)?(?<Hour>[0-9]{2})?)$
>
> As long as your indexes follow the pattern of this regular expression,
> Alfred will be glad to manage your indices.
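
To see what the expression captures, here is a small sketch that applies it with java.util.regex and reads the named groups. The pattern string is copied from the message above; the class name, the describe method, and the sample index names are assumptions for illustration, not Alfred's code.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Alfred.
class IndexNameSketch {

    // The regular expression quoted above, written as a Java string literal.
    static final Pattern INDEX_NAME = Pattern.compile(
        "^((?<Name>[a-zA-Z0-9\\.\\-_]+)(?<PrefixSeparator>(_|-)+)"
        + "(?<Year>[0-9]{4})(?<Separator>(\\.|_|-))(?<Month>[0-9]{2})"
        + "(\\.|_|-)(?<Day>[0-9]{2})(\\.|_|-)?(?<Hour>[0-9]{2})?)$");

    /** Summarizes the parts captured from an index name, or reports no match. */
    static String describe(String index) {
        Matcher m = INDEX_NAME.matcher(index);
        if (!m.matches()) {
            return index + " -> no match";
        }
        // The Hour group is optional; group() returns null when it did not match.
        String hour = m.group("Hour");
        return index + " -> name=" + m.group("Name")
            + " date=" + m.group("Year") + "-" + m.group("Month") + "-" + m.group("Day")
            + (hour == null ? "" : " hour=" + hour);
    }

    public static void main(String[] args) {
        System.out.println(describe("cron_2014_04_02_08")); // hourly index
        System.out.println(describe(".marvel-2014.04.02")); // daily index, no hour
        System.out.println(describe("my_index"));           // no date suffix
    }
}
```

An index without a date suffix, such as the hypothetical "my_index", does not match at all, so it would presumably fall outside time-based handling.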
>
>     The -i parameter is passed to the URL
> "http://host:port/INDEX/_stats/indices",
> where "INDEX" is replaced by whatever the -i parameter contains. By
> default, it uses _all, but you can specify all kinds of wildcard patterns,
> such as -i".marvel-*", -i"logstash-*", -i"*2014_04_02*", etc. Alfred gave
> us a lot of power to manage our indices, so we thought that the community
> could use him as well.
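
A short sketch of how the stats URL described above could be assembled from the --host, --port, --ssl and -i values. The method and class names are hypothetical, and it is an assumption that --ssl simply switches the scheme to https; the host and port in the example are placeholders.

```java
// Hypothetical helper, not part of Alfred.
class StatsUrlSketch {

    /** Builds the stats URL, falling back to _all when no pattern is given. */
    static String statsUrl(String host, int port, boolean ssl, String indexPattern) {
        String scheme = ssl ? "https" : "http"; // assumption about what --ssl does
        String index = (indexPattern == null || indexPattern.isEmpty())
            ? "_all"
            : indexPattern;
        return scheme + "://" + host + ":" + port + "/" + index + "/_stats/indices";
    }

    public static void main(String[] args) {
        // Placeholder host and port.
        System.out.println(statsUrl("localhost", 9200, false, ".marvel-*"));
        System.out.println(statsUrl("localhost", 9200, false, null));
    }
}
```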
>
> --
>  Thanks,
> Colton McInroy
>
>    - Director of Security Engineering
>
>      Phone (Toll Free): US (888)-818-1344 Press 2, UK 0-800-635-0551 Press 2
>      My Extension: 101
>      24/7 Support: [email protected]
>      Email: [email protected]
>      Website: http://www.dosarrest.com
>
> --
> You received this message because you are subscribed to the Google Groups
> "elasticsearch" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/elasticsearch/533BED19.4000608%40dosarrest.com.
> For more options, visit https://groups.google.com/d/optout.
>

