[ https://issues.apache.org/jira/browse/SOLR-5795?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13916125#comment-13916125 ]
Hoss Man commented on SOLR-5795:
--------------------------------

Here's the basic design I've been fleshing out in my head...

* A new "{{ExpireDocsUpdateProcessorFactory}}"
** can compute an {{expiration}} field to add to indexed docs based on a "{{ttl}}" field in the input doc
*** perhaps could also fall back to a {{ttl}} update request param when bulk adding, similar to {{\_version\_}}?
*** {{IgnoreFieldUpdateProcessorFactory}} could be used to remove the {{ttl}} if they don't want a record in the index of when/why the {{expiration}} value was computed
** can trigger a periodic {{deleteByQuery}} on the {{expiration}} time field
* rough idea for configuration...
{code}
<processor class="solr.ExpireDocsUpdateProcessorFactory">
  <!-- mandatory, must be a date based field in schema.xml -->
  <str name="expiration.fieldName">expire_at</str>
  <!-- optional, default is not to auto-expire docs -->
  <int name="deleteIntervalInSeconds">300</int>
  <!-- optional, default is not to compute expiration automatically.
       If this field doesn't exist in schema, then
       IgnoreFieldUpdateProcessorFactory can be configured to remove it. -->
  <str name="ttl.fieldName">ttl</str>
</processor>
{code}
* {{ExpireDocsUpdateProcessorFactory.init()}} logic:
** if {{ttl.fieldName}} is specified, make a note of it
** validate that {{expiration.fieldName}} is set & exists in schema
*** perhaps in managed schema mode create it automatically if it doesn't?
** if {{deleteIntervalInSeconds}} is set:
*** spin up a {{ScheduledThreadPoolExecutor}} with a recurring {{AutoExpireDocsCallable}}
*** add a core shutdown hook to shut down the executor when the core shuts down
* {{ExpireDocsUpdateProcessor.processAdd()}} logic:
** if {{ttl.fieldName}} is configured & the doc contains that field name:
*** treat the value as datemath from NOW and put the computed value in {{expiration.fieldName}}
** else: No-Op
* {{AutoExpireDocsCallable}} logic:
** if in cloud mode, return No-Op unless we are running on the overseer
** create a {{DeleteUpdateCommand}} using a {{deleteByQuery}} of {{\[* TO NOW\]}} on the {{expiration.fieldName}}
*** this can be fired directly against the {{UpdateRequestProcessor}} returned by the {{ExpireDocsUpdateProcessorFactory}} itself using a {{LocalSolrQueryRequest}}
**** Or perhaps we make an optional configuration so you can specify any chain name and we fetch it from the SolrCore?
*** the existing distributed delete logic should ensure it gets distributed cleanly in cloud mode
*** NOTE: the executor should run on every node, and only do the overseer check when the executor fires, so even when the overseer changes periodically, whichever node is the overseer each time {{deleteIntervalInSeconds}} elapses will fire the delete.
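A rough, Solr-free Java sketch of the {{processAdd()}} and {{AutoExpireDocsCallable}} logic above, under loudly stated assumptions: a {{Map}} stands in for the input document, a tiny parser that only understands {{+<n><UNIT>}} offsets stands in for Solr's real datemath, and a {{BooleanSupplier}} / {{Runnable}} stand in for the overseer check and the {{deleteByQuery}}. {{ExpireDocsSketch}}, {{applyTtl}}, {{processAdd}}, {{expiredDocsQuery}} and {{startAutoExpire}} are all hypothetical names, not existing Solr APIs.

```java
import java.time.Instant;
import java.time.temporal.ChronoUnit;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.concurrent.ScheduledThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.function.BooleanSupplier;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical, Solr-free sketch; none of these names exist in Solr.
public class ExpireDocsSketch {

    // Assumption: only "+<n><UNIT>" offsets are supported (e.g. "+30DAYS").
    // Solr's real datemath also handles rounding ("/DAY"), multiple terms, etc.
    private static final Pattern SIMPLE_OFFSET =
            Pattern.compile("\\+(\\d+)(SECONDS?|MINUTES?|HOURS?|DAYS?)");

    // Treats the ttl value as datemath relative to NOW.
    static Instant applyTtl(Instant now, String ttl) {
        Matcher m = SIMPLE_OFFSET.matcher(ttl.trim().toUpperCase(Locale.ROOT));
        if (!m.matches()) {
            throw new IllegalArgumentException("unsupported ttl: " + ttl);
        }
        long amount = Long.parseLong(m.group(1));
        String unit = m.group(2);
        ChronoUnit chronoUnit =
                unit.startsWith("SECOND") ? ChronoUnit.SECONDS
                : unit.startsWith("MINUTE") ? ChronoUnit.MINUTES
                : unit.startsWith("HOUR") ? ChronoUnit.HOURS
                : ChronoUnit.DAYS;
        return now.plus(amount, chronoUnit);
    }

    // processAdd(): if ttl.fieldName is configured and present in the doc,
    // write the computed expiration into expiration.fieldName; else no-op.
    static void processAdd(Map<String, Object> doc, String ttlField,
                           String expirationField, Instant now) {
        if (ttlField == null || !doc.containsKey(ttlField)) {
            return;
        }
        doc.put(expirationField, applyTtl(now, String.valueOf(doc.get(ttlField))));
    }

    // The query the periodic delete would send, e.g. "expire_at:[* TO NOW]".
    static String expiredDocsQuery(String expirationField) {
        return expirationField + ":[* TO NOW]";
    }

    // Every node schedules the task; the overseer check happens each time it
    // fires, so whichever node is overseer at that moment does the delete.
    static ScheduledThreadPoolExecutor startAutoExpire(long interval, TimeUnit unit,
                                                       BooleanSupplier isOverseer,
                                                       Runnable deleteExpiredDocs) {
        ScheduledThreadPoolExecutor executor = new ScheduledThreadPoolExecutor(1);
        executor.scheduleWithFixedDelay(() -> {
            try {
                if (isOverseer.getAsBoolean()) {
                    deleteExpiredDocs.run();
                }
            } catch (RuntimeException e) {
                // A periodic task that throws is never run again, so swallow
                // (and, in real code, log) failures to keep the schedule alive.
            }
        }, interval, interval, unit);
        return executor; // caller calls shutdown() from a core close hook
    }

    public static void main(String[] args) {
        Instant now = Instant.parse("2014-03-01T00:00:00Z");
        Map<String, Object> doc = new HashMap<>();
        doc.put("id", "doc1");
        doc.put("ttl", "+30DAYS");
        processAdd(doc, "ttl", "expire_at", now);
        System.out.println(doc.get("expire_at"));          // 2014-03-31T00:00:00Z
        System.out.println(expiredDocsQuery("expire_at")); // expire_at:[* TO NOW]
    }
}
```

The overseer check lives inside the scheduled task rather than around the scheduling call, matching the NOTE above: every node keeps its own timer, so a change of overseer needs no rescheduling. The task catches {{RuntimeException}} because a {{ScheduledThreadPoolExecutor}} suppresses all later runs of a periodic task once one run throws.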
This, combined with things like {{DefaultValueUpdateProcessorFactory}}, {{IgnoreFieldUpdateProcessorFactory}} and {{FirstFieldValueUpdateProcessorFactory}} on the {{ttl.fieldName}} and/or {{expiration.fieldName}}, should allow all sorts of use cases:
* every doc expires after X amount of time no matter what the client says
* every doc defaults to a ttl of X unless it has an explicit per-doc ttl
* every doc defaults to a ttl of X unless it has an explicit per-doc expiration date
* docs can optionally expire after a ttl specified when they were indexed
* docs can optionally expire at an explicit time specified when they were indexed

> Option to periodically delete docs based on an expiration field -- or ttl specified when indexed.
> -------------------------------------------------------------------------------------------------
>
>                 Key: SOLR-5795
>                 URL: https://issues.apache.org/jira/browse/SOLR-5795
>             Project: Solr
>          Issue Type: New Feature
>            Reporter: Hoss Man
>            Assignee: Hoss Man
>
> A question I get periodically from people is how to automatically remove
> documents from a collection at a certain time (or after a certain amount of
> time).
> Excluding from search results using a filter query on a date field is
> trivial, but you still have to periodically send a deleteByQuery to clean up
> those older "expired" documents. And in the case where you want all
> documents to auto-expire some fixed amount of time after they were indexed,
> you still have to set up a simple UpdateProcessor to set that expiration date.
> So I've been thinking it would be nice if there was a simple way to configure
> Solr to do it all for you.

--
This message was sent by Atlassian JIRA
(v6.1.5#6160)