I switched to using aliases about a year ago and I love it.  I am able to 
rebuild in the background and make a clean cutover once the process 
completes.
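The clean cutover works because the `_aliases` endpoint applies a list of `remove`/`add` actions in one atomic step, so searches against the alias always have an index behind them. Below is a minimal Python sketch that only builds that request body, using the index and alias names from your example. `alias_swap_body` is a hypothetical helper, and the sketch does not send the body (a POST to `/_aliases` with curl or a client library).

```python
import json


def alias_swap_body(alias, old_index, new_index):
    """Body for POST /_aliases: move `alias` from old_index to new_index.
    Elasticsearch applies all actions in one request atomically."""
    return {
        "actions": [
            {"remove": {"index": old_index, "alias": alias}},
            {"add": {"index": new_index, "alias": alias}},
        ]
    }


body = alias_swap_body("workshop", "workshop_index_v1", "workshop_index_v2")
print(json.dumps(body, indent=2))
```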

Here are a couple of thoughts for your situation.  

First, create a second index (workshop_index_v1_new) that has the same 
mapping as your original.  When you are ready to start building your final 
index, stop indexing into your original and start indexing into this new 
index.  You can query both indexes through a new alias, or by modifying your 
search requests to name both.  Now you can transfer the bulk of your data 
from workshop_index_v1 to workshop_index_v2 while workshop_index_v1_new 
continues to collect the new documents.  Once the initial scan and scroll 
completes, you can cut over to workshop_index_v2 and run a scan and scroll 
against workshop_index_v1_new, which should be relatively small, so you can 
quickly transfer those documents into your v2 schema.
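The plan above can be sketched in Python as three request bodies plus one transformation. This is a minimal sketch that sends nothing. It assumes the 1.x-era `string` field type and the `guitar` mapping type from your example. The alias `workshop_search` and the sample hits are hypothetical. The transformation is where the unwanted `identifier` field gets dropped on the way into v2.

```python
def create_index_body(doc_type, fields):
    """Body for PUT /<index>: one mapping type whose properties are all
    strings ("string" is the 1.x-era type used in this thread)."""
    return {"mappings": {doc_type: {"properties": {
        f: {"type": "string"} for f in fields}}}}


def read_alias_body(alias, indexes):
    """Body for POST /_aliases: point one alias at several indexes so a
    single search covers both old and newly written documents."""
    return {"actions": [{"add": {"index": i, "alias": alias}} for i in indexes]}


def to_v2_actions(hits, target_index, doc_type="guitar",
                  dropped_fields=("identifier",)):
    """Turn scan/scroll hits from a v1-shaped index into bulk index actions
    for the v2 index, leaving out fields the v2 mapping no longer has."""
    for hit in hits:
        source = {k: v for k, v in hit["_source"].items()
                  if k not in dropped_fields}
        yield {"_index": target_index, "_type": doc_type,
               "_id": hit["_id"], "_source": source}


# Same mapping as v1 for the write-side index; v2 drops "identifier".
v1_new_body = create_index_body("guitar", ["identifier", "make", "model"])
v2_body = create_index_body("guitar", ["make", "model"])

# Hypothetical alias for searching across v1 and v1_new during the copy.
search_alias = read_alias_body("workshop_search",
                               ["workshop_index_v1", "workshop_index_v1_new"])

# Hits shaped like those a scan/scroll returns (sample data, not real).
hits = [
    {"_id": "1", "_source": {"identifier": "g-001", "make": "Gibson",
                             "model": "Les Paul"}},
    {"_id": "2", "_source": {"identifier": "g-002", "make": "Fender",
                             "model": "Telecaster"}},
]
actions = list(to_v2_actions(hits, "workshop_index_v2"))
```

With the official Python client, `elasticsearch.helpers.scan` yields hits of this shape and `elasticsearch.helpers.bulk` accepts actions like these. The same generator could therefore serve both the bulk pass over v1 and the small follow-up pass over v1_new.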

The alternative is to run the scan and scroll twice against the v1 index: 
once to build the v2 index, at which point you cut over to v2, and a second 
time to pick up any documents that were added after you started the initial 
scan and scroll.  This is less than ideal.  It will take longer, and unless 
you add steps to check whether a document already exists in v2, the second 
pass rewrites documents you already copied and leaves the index with many 
deleted documents.  If you have a timestamp in your documents, you might be 
able to make this reasonable by limiting the second pass to documents newer 
than the start of the first pass.  You will certainly want to optimize the 
index after you complete this process.
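If your documents do carry a timestamp, the second pass could scan only what changed since the first pass began. This is a minimal Python sketch of that search body. It assumes a hypothetical `updated_at` date field, since the guitar mapping in your example has no timestamp, and it assumes you recorded the start time in UTC.

```python
from datetime import datetime, timezone


def catch_up_query(timestamp_field, since):
    """Search body for the second scan/scroll pass over v1: only documents
    whose timestamp is at or after `since` (the start of the first pass)."""
    return {
        "query": {
            "range": {
                timestamp_field: {"gte": since.strftime("%Y-%m-%dT%H:%M:%SZ")}
            }
        }
    }


# Record this just before the first scan/scroll starts (UTC).
first_pass_started = datetime(2015, 3, 11, 17, 0, tzinfo=timezone.utc)

# "updated_at" is a hypothetical field; the guitar mapping has no timestamp.
body = catch_up_query("updated_at", first_pass_started)
```

This shrinks the second pass, but any document changed during the first pass is still written to v2 twice, so you will still see some deletes and still want to optimize afterwards.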

The only downside to writing to the new one is deciding which index to 
query during the transition.  If you write to the v2 index, queries to v1 
will not show new data, while queries to v2 will show only the new data 
until the migration catches up.  Queries that span both indexes may be 
complicated because the mappings are different; if that is not the case, 
then yes, this is the easy way.  If you are OK with one of these caveats, 
then by all means take this route, as it is the simplest.
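If you do query both indexes while their mappings differ, one way to keep results comparable is to search only the fields the two mappings share. This is a minimal Python sketch that builds the multi-index search path (Elasticsearch accepts comma-separated index names in the URL) and a body restricted to shared fields. The field lists come from your example, and the search term is made up for illustration. Nothing is sent.

```python
def search_path(indexes):
    """Multi-index search endpoint: comma-separated names before /_search."""
    return "/" + ",".join(indexes) + "/_search"


def shared_fields_query(text, v1_fields, v2_fields):
    """Match `text` only against fields present in both mappings, so hits
    from either index are matched on the same fields."""
    shared = [f for f in v2_fields if f in v1_fields]
    return {"query": {"multi_match": {"query": text, "fields": shared}}}


path = search_path(["workshop_index_v1", "workshop_index_v2"])
body = shared_fields_query("telecaster",
                           v1_fields=["identifier", "make", "model"],
                           v2_fields=["make", "model"])
```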

Aaron

On Wednesday, March 11, 2015 at 10:47:59 AM UTC-6, mzrth_7810 wrote:
>
> Hey everyone,
>
> I have a question about rebuilding an index. After reading the 
> elasticsearch guide and various topics here I've found that the best 
> practice for rebuilding an index without any downtime is by using aliases. 
> However, there are certain steps and processes around that, which I seek 
> advice for. First I'm going to take you through an example scenario, and 
> then I'll have some questions.
>
> For example, you have "workshop_index_v1", with an alias "workshop". The 
> "workshop_index_v1" has a type called "guitar" which has three properties 
> with the following mapping:
>
> "identifier" : "string"
> "make" : "string"
> "model" : "string"
>
> Let's assume there is a lot of data in workshop_index_v1/guitar at the 
> moment, which has been populated from a separate database.
>
> Now, I need to modify the mapping because I've changed the source data: I 
> would like to get rid of the "identifier" property, so my mapping becomes:
>
> "make" : "string"
> "model" : "string"
>
> As we all know, elasticsearch does not allow you to remove a property from 
> the mapping directly, so you inevitably have to rebuild the index, which is 
> fine in my case.
>
> So now a few things came to mind when I thought about how to do this:
>
>    - Create another index "workshop_index_v2", populate it with the data 
>    in "workshop_index_v1" using scroll and scan with the bulk API and later 
>    remove "workshop_index_v1" and add "workshop_index_v2" to the alias.
>       - This will not work because the incorrect mapping (or a field value 
>         in the incorrect mapping) is already present in 
>         "workshop_index_v1", and I do not want to copy everything as is.
>    - Create another index "workshop_index_v2", populate it with the data 
>    from the original source
>       - This works
>    
> One of the big issues here is, what happens to write requests while the 
> new index is being rebuilt.
>
> As you can only write to one index, which one do you write to, the old one 
> or the new one, or both?
>
> I feel that writing to the new one would work. I am a beginner when it 
> comes to elasticsearch, so any advice regarding any of this would be greatly 
> appreciated.
>
> Best regards
>

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/c1a1f011-4d4f-4dba-b7f5-6899d4fe671e%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
