On Thu, Mar 23, 2017 at 6:10 AM, Derek Poh <d...@globalsources.com> wrote:
> Hi
>
> I have collections of products. I am doing indexing 3-4 times daily.
> Every day there are products that expire and I need to remove them from
> these collections daily.
>
> I can think of 2 ways to do this.
> 1. using a collection alias to switch between a main and temp collection.
> - clear and index the temp collection
> - create alias to temp collection.
> - clear and index the main collection.
> - create alias to main collection.
>
> this way requires additional collections.
>

Another way of doing this is to have a moving alias (rather than constantly
clearing the "temp" collection). If you reindex daily, each day's index
would be called "products_YYYYmmdd", with the alias "products" pointed
at the latest one. The advantage of this is that you can roll back to a
previous version of the index if there are problems, and each index is
guaranteed to be freshly created with no artifacts.

The biggest consideration for me would be how long a full reindex of
your corpus takes. If you can do it in a short period of time, then
full reindexes would be preferable. If it takes a very long time,
deleting the expired products is preferable.
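If you go the deleting route, a delete-by-query against the update
handler removes all expired products in one request. This is a sketch
only: the base URL and the date field name "expiry_date" are
hypothetical, and NOW is Solr date math. It builds the request but does
not send it.

```python
import json
import urllib.request
from urllib.parse import quote

# Hypothetical base URL of a Solr node.
SOLR = "http://localhost:8983/solr"


def delete_expired_request(collection: str, field: str = "expiry_date") -> urllib.request.Request:
    """Build a JSON delete-by-query that removes documents whose `field` is in the past.

    `expiry_date` is a hypothetical date field; use whatever your schema defines.
    """
    url = f"{SOLR}/{quote(collection)}/update?commit=true"
    body = json.dumps({"delete": {"query": f"{field}:[* TO NOW]"}}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = delete_expired_request("products")
# urllib.request.urlopen(req) would actually send it.
```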

If you are using a cloud setup, full indexes are even more appealing.
You can create the new collection on a single node (even if sharded;
just place each shard on the same node). This places the indexing cost
on that one node only, whilst the other nodes are unaffected, so
indexing does not degrade regular query response time. You also don't
have to distribute the documents around the cluster. There is no
distributed indexing in Solr; each replica has to index each document
again, even if it is not the leader.
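Creating the collection on one node can be sketched with the Collections
API CREATE action, restricting placement via createNodeSet. The node
name "indexer1:8983_solr", the config set name "products_conf" and the
base URL are hypothetical. maxShardsPerNode is included because Solr 6.x
(current at the time of this thread) refuses to put several shards on
one node without it; newer Solr versions dropped that parameter.

```python
from urllib.parse import urlencode

# Hypothetical base URL of any node in the cluster.
SOLR = "http://localhost:8983/solr"


def create_on_one_node_url(name: str, config: str, num_shards: int, indexing_node: str) -> str:
    """CREATE call that places every shard of `name` on `indexing_node` only."""
    params = {
        "action": "CREATE",
        "name": name,
        "collection.configName": config,
        "numShards": num_shards,
        "replicationFactor": 1,
        # Allow all shards on the single indexing node (Solr 6.x-era parameter).
        "maxShardsPerNode": num_shards,
        # Restrict placement to the dedicated indexing node.
        "createNodeSet": indexing_node,
    }
    return f"{SOLR}/admin/collections?{urlencode(params)}"


url = create_on_one_node_url("products_20170323", "products_conf", 4, "indexer1:8983_solr")
print(url)
```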

Once indexing is complete, you can expand the collection by adding
replicas of each shard on other nodes - perhaps even removing it from
the node that did the indexing. We have a node that does nothing but
indexing; before a collection is queried for anything, it is added to
the querying nodes.

You can do this manually, or you can automate it using the collections API.
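The expansion step could be automated along these lines: one ADDREPLICA
per shard per query node, then optionally a DELETEREPLICA for the copy
on the indexing node. Node names, shard names and the base URL are
hypothetical; the replica name (e.g. core_node1) would come from
CLUSTERSTATUS in a real script. Only the URLs are built here.

```python
from urllib.parse import urlencode

# Hypothetical base URL of any node in the cluster.
SOLR = "http://localhost:8983/solr"


def collections_url(**params: str) -> str:
    """Build a Collections API URL from keyword parameters (order preserved)."""
    return f"{SOLR}/admin/collections?{urlencode(params)}"


def expand_to_query_nodes(collection: str, shards: list, query_nodes: list) -> list:
    """ADDREPLICA calls placing a copy of every shard on every query node."""
    return [
        collections_url(action="ADDREPLICA", collection=collection, shard=shard, node=node)
        for shard in shards
        for node in query_nodes
    ]


def drop_indexing_replica(collection: str, shard: str, replica: str) -> str:
    """DELETEREPLICA call removing the replica hosted on the indexing node."""
    return collections_url(action="DELETEREPLICA", collection=collection, shard=shard, replica=replica)


adds = expand_to_query_nodes("products_20170323", ["shard1", "shard2"], ["q1:8983_solr", "q2:8983_solr"])
drop = drop_indexing_replica("products_20170323", "shard1", "core_node1")
```

A reasonable order is: add the replicas, wait until they are active,
switch the alias with CREATEALIAS, and only then drop the indexing-node
replicas, so queries never hit a collection that is still being built.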

Cheers

Tom
