[ 
https://issues.apache.org/jira/browse/SOLR-11653?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Smiley updated SOLR-11653:
--------------------------------
    Attachment: SOLR-11653.patch

Here's a first-draft patch.  It is incomplete: it has no tests, and I haven't 
actually run this code at all yet.  But it shows the approach.  There 
are two main parts:

(1) New *RoutedAliasCreateCollectionCmd*, an Overseer Cmd registered as 
"ROUTEDALIAS_CREATECOLL".  It adds the next collection to a time routed alias.  
It assumes that alias metadata with a certain prefix is collection 
creation metadata, and it requires collection.configName to be present (we want 
all the collections to have the same configset, by default anyway).  The 
collection creation is invoked in two steps: first it calls 
CollectionsHandler.CollectionOperation.CREATE_OP.execute to get the overseer 
message, and then it delivers that message to CreateCollectionCmd indirectly via 
the OverseerCollectionMessageHandler.  The alias is updated to have the new 
collection at the first position (thus reverse chronological order).  Note that 
this Cmd has a parameter ifHeadCollName that is the head (latest) collection 
name that the caller sees when it calls the command.  If the head collection is 
something else, the Cmd returns without error, as it's assumed there may have 
been a race of multiple attempts to create the next collection at the same time.
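
To illustrate the ifHeadCollName check described above, here is a minimal 
in-memory sketch.  It is not code from the patch: the class RoutedAliasSketch, 
its method names, and the collection names in the usage note are all 
hypothetical, and a plain synchronized list stands in for the alias state that 
the Overseer would actually manage.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical in-memory stand-in for a time routed alias; not Solr's API. */
final class RoutedAliasSketch {
    // Collections in reverse chronological order: index 0 is the head (latest).
    private final List<String> collections = new ArrayList<>();

    RoutedAliasSketch(String initialCollection) {
        collections.add(initialCollection);
    }

    synchronized String head() {
        return collections.get(0);
    }

    synchronized List<String> collections() {
        return new ArrayList<>(collections);
    }

    /**
     * Adds nextCollName at the first position only if the head is still
     * ifHeadCollName. Returns false, without raising an error, when the head
     * has already changed (assumed to mean another caller won the race).
     */
    synchronized boolean addNextIfHeadIs(String ifHeadCollName, String nextCollName) {
        if (!collections.get(0).equals(ifHeadCollName)) {
            return false; // someone else already created the next collection
        }
        collections.add(0, nextCollName);
        return true;
    }
}
```

With this sketch, two callers that both saw "logs_2017-12-01" as the head and 
both ask for "logs_2017-12-02" end up with a single new collection: the first 
call returns true and the second returns false without failing.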

(2) Changes to TimeRoutedAliasUpdateProcessor.  There's now a loop such that if 
we think we need to create the collection, we do so and then we retry from the 
start, more or less.  This is mostly because we may need to create a series of 
collections if the current collection head is very out of date.  I also added a 
check to throw an exception if the timestamp of the document is far into the 
future (currently 10 minutes).
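
The retry loop can be sketched roughly as follows.  This is an illustration 
under assumptions, not the code in the patch: TimeRoutingSketch and its methods 
are hypothetical names, each collection is represented only by its start time, 
and adding to an in-memory deque stands in for invoking the Overseer command.  
Where exactly the future-timestamp check sits relative to the loop is also an 
assumption.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical sketch of routing a document by timestamp; not Solr's API. */
final class TimeRoutingSketch {
    // Limit mentioned in the message: reject documents this far in the future.
    static final Duration MAX_FUTURE = Duration.ofMinutes(10);

    private final Duration gap;
    // Collection start times, head (latest) first.
    private final Deque<Instant> starts = new ArrayDeque<>();

    TimeRoutingSketch(Instant firstStart, Duration gap) {
        if (gap.isZero() || gap.isNegative()) {
            throw new IllegalArgumentException("gap must be positive");
        }
        this.gap = gap;
        starts.addFirst(firstStart);
    }

    int collectionCount() {
        return starts.size();
    }

    /** Returns the start time of the collection that should receive the document. */
    Instant route(Instant docTime, Instant now) {
        if (docTime.isAfter(now.plus(MAX_FUTURE))) {
            throw new IllegalArgumentException("timestamp too far in the future: " + docTime);
        }
        // Create collections one gap at a time until the head covers docTime.
        // A very stale head may need several iterations.
        while (!docTime.isBefore(starts.peekFirst().plus(gap))) {
            starts.addFirst(starts.peekFirst().plus(gap)); // stand-in for the create command
        }
        // Newest collection whose start is not after docTime.
        for (Instant start : starts) {
            if (!docTime.isBefore(start)) {
                return start;
            }
        }
        throw new IllegalArgumentException("timestamp older than the oldest collection: " + docTime);
    }
}
```

For example, with a first collection starting at 2017-12-01T00:00:00Z and a 
one-day gap, routing a document stamped 2017-12-04T05:00:00Z creates three more 
collections in sequence and routes the document to the one starting at 
2017-12-04T00:00:00Z.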

So I still need to actually run it and work on tests.  There is also some code 
rearrangement that should be done, I think.  The Cmd calls into the 
URP to share some code but it ought to be the other way around.  Or maybe a new 
"TimeRoutedAliasInfo" class could exist that is used by both the URP and Cmd?  
There will probably be some code sharing with SOLR-11722 like formatting the 
collection name from a timestamp -- CC [~gus_heck]
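
As a sketch of the kind of shared helper this could be, the code below formats 
a collection name from a timestamp and parses it back.  The naming scheme 
(alias name, an underscore, then a UTC date) is purely an assumption for 
illustration, as are the class and method names; neither this issue nor 
SOLR-11722 is claimed to use this format.

```java
import java.time.Instant;
import java.time.LocalDate;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

/** Hypothetical helper for collection names; the format is an assumption. */
final class CollectionNameSketch {
    // Assumed day-granularity suffix, always rendered in UTC.
    private static final DateTimeFormatter DAY =
        DateTimeFormatter.ofPattern("yyyy-MM-dd").withZone(ZoneOffset.UTC);

    private CollectionNameSketch() {}

    /** Builds a name such as alias + "_" + UTC date of the collection's start. */
    static String format(String aliasName, Instant start) {
        return aliasName + "_" + DAY.format(start);
    }

    /** Recovers the start instant (midnight UTC) from a name built by format(). */
    static Instant parse(String aliasName, String collectionName) {
        String prefix = aliasName + "_";
        if (!collectionName.startsWith(prefix)) {
            throw new IllegalArgumentException("not a collection of alias " + aliasName);
        }
        LocalDate day = LocalDate.parse(collectionName.substring(prefix.length()));
        return day.atStartOfDay(ZoneOffset.UTC).toInstant();
    }
}
```

Keeping both directions in one place would let the URP and the Cmd agree on 
names without one calling into the other.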

> create next time collection based on a fixed time gap
> -----------------------------------------------------
>
>                 Key: SOLR-11653
>                 URL: https://issues.apache.org/jira/browse/SOLR-11653
>             Project: Solr
>          Issue Type: Sub-task
>      Security Level: Public(Default Security Level. Issues are Public) 
>          Components: SolrCloud
>            Reporter: David Smiley
>            Assignee: David Smiley
>         Attachments: SOLR-11653.patch
>
>
> For time series collections (as part of a collection Alias with certain 
> metadata), we want to automatically add new collections. This issue is 
> about creating the next collection based on a configurable fixed time gap. 
> We will add this collection synchronously once a document flowing 
> through the URP chain exceeds the gap, as opposed to asynchronously in 
> advance.  There will be some Alias metadata to define in this issue.  The 
> preponderance of the implementation will be in TimePartitionedUpdateProcessor 
> or perhaps a helper to this URP.
> note: other issues will implement pre-emptive creation and capping 
> collections by size.



