[ 
https://issues.apache.org/jira/browse/SOLR-5795?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hoss Man updated SOLR-5795:
---------------------------

    Attachment: SOLR-5795.patch

Ok - overseer is dead, long live the overseer!

After investigating some different options for preventing too many nodes from 
triggering redundant deletes, what I came up with is this...

bq. In simple standalone installations this method always returns true, but in 
cloud mode it will be true if and only if we are currently the leader of the 
(active) slice with the first name (lexicographically).

I outlined the reasoning why I think this is the most straightforward solution 
in the code...

{noformat}
    // This is a lot simpler than doing our own "leader" election across all replicas
    // of all shards since:
    //   a) we already have a per shard leader
    //   b) shard names must be unique
    //   c) ClusterState is already being "watched" by ZkController, no additional zk hits
    //   d) there might be multiple instances of this factory (in multiple chains) per
    //      collection, so picking an ephemeral node name for our election would be tricky
{noformat}
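
As a rough illustration of the rule described above (this is not the code in the patch), the decision can be sketched in plain Java. The {{Slice}} class, its fields, and the {{shouldTriggerDeletes}} method are hypothetical stand-ins for whatever the cluster state actually exposes; the sketch only assumes that each slice has a unique name, an active flag, and at most one leader.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

final class DeleteTriggerSketch {

    // Hypothetical stand-in for one slice (shard) as seen in the cluster state.
    static final class Slice {
        final String name;        // unique per collection
        final boolean active;     // inactive slices are ignored
        final String leaderNode;  // may be null if no leader is currently elected

        Slice(String name, boolean active, String leaderNode) {
            this.name = name;
            this.active = active;
            this.leaderNode = leaderNode;
        }
    }

    // True if this node is the one that should trigger the periodic deletes.
    static boolean shouldTriggerDeletes(boolean cloudMode, List<Slice> slices, String myNode) {
        if (!cloudMode) {
            return true; // standalone: there is only one node, so it always triggers
        }
        // Pick the active slice whose name sorts first lexicographically.
        Optional<Slice> first = slices.stream()
                .filter(s -> s.active)
                .min(Comparator.comparing((Slice s) -> s.name));
        // Only the leader of that slice triggers; nobody does if there is no such leader.
        return first.isPresent() && myNode.equals(first.get().leaderNode);
    }
}
```

Because every node evaluates the same rule against the same cluster state, at most one node answers true at a time without any extra election.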

Watching the logs when running the tests, things look pretty good, and seem to 
be operating as designed.  That said: I'd still like to try and come up with 
some additional black box tests to verify only one node is triggering these 
deletes .. I've got some rough ideas, but nothing concrete -- I'll keep thinking 
about it.

Anybody see any problems with this approach?

----

bq. IMO there should be a default field name for ttl say \_ttl even if no field 
name is specified

I'd deliberately avoided doing that because I'm not a fan of "magic" field 
names and I wanted to ensure we supported the ability to use this processor 
_without_ any sort of TTL calculation -- for people who just want to specify 
their own expiration field values explicitly.

That said: having a sensible default probably would make the common case more 
useful -- and we could always document (and test) using {{<null 
name="ttlFieldName"/>}} for people who want to disable it.

I'll look into adding that tomorrow.
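
To make the two modes concrete, here is a minimal standalone sketch in plain Java (again, not the patch code). It models a document as a {{Map}} and a TTL as a {{java.time.Duration}}; the {{applyTtl}} method and the {{expire_at}} field name used in the usage lines are hypothetical, and the choice to let an explicit expiration value win over a TTL is an assumption made only for this illustration.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.HashMap;
import java.util.Map;

final class TtlSketch {

    // Fills in the expiration field from a per-document TTL, if TTL handling is enabled.
    // A null ttlFieldName models the "disabled" configuration: no TTL calculation at all.
    static void applyTtl(Map<String, Object> doc, String ttlFieldName,
                         String expirationFieldName, Instant now) {
        if (ttlFieldName == null) {
            return; // TTL calculation disabled; explicit expiration values are left alone
        }
        Object ttl = doc.get(ttlFieldName);
        if (!(ttl instanceof Duration)) {
            return; // this document carries no TTL
        }
        if (doc.containsKey(expirationFieldName)) {
            return; // assumption for this sketch: an explicit expiration value wins
        }
        doc.put(expirationFieldName, now.plus((Duration) ttl));
    }

    public static void main(String[] args) {
        Map<String, Object> doc = new HashMap<>();
        doc.put("_ttl", Duration.ofMinutes(5));
        applyTtl(doc, "_ttl", "expire_at", Instant.now());
        System.out.println(doc.get("expire_at")); // five minutes after the call time
    }
}
```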


> Option to periodically delete docs based on an expiration field -- or ttl 
> specified when indexed.
> -------------------------------------------------------------------------------------------------
>
>                 Key: SOLR-5795
>                 URL: https://issues.apache.org/jira/browse/SOLR-5795
>             Project: Solr
>          Issue Type: New Feature
>            Reporter: Hoss Man
>            Assignee: Hoss Man
>         Attachments: SOLR-5795.patch, SOLR-5795.patch, SOLR-5795.patch, 
> SOLR-5795.patch, SOLR-5795.patch
>
>
> A question I get periodically from people is how to automatically remove 
> documents from a collection at a certain time (or after a certain amount of 
> time).  
> Excluding from search results using a filter query on a date field is 
> trivial, but you still have to periodically send a deleteByQuery to clean up 
> those older "expired" documents.  And in the case where you want all 
> documents to auto-expire some fixed amount of time after they were indexed, 
> you still have to set up a simple UpdateProcessor to set that expiration date.  
> So I've been thinking it would be nice if there was a simple way to configure 
> Solr to do it all for you.



--
This message was sent by Atlassian JIRA
(v6.2#6252)
