[jira] [Commented] (SOLR-11653) create next time collection based on a fixed time gap

David Smiley (JIRA) Tue, 02 Jan 2018 11:56:32 -0800

    [ 
https://issues.apache.org/jira/browse/SOLR-11653?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16308618#comment-16308618
 ]


David Smiley commented on SOLR-11653:
-------------------------------------

* I don't believe this patch exposes ROUTEDALIAS_CREATECOLL through v1 or v2; 
it takes internal code to invoke it.  Notice there is no reference to it in 
CollectionsHandler.  Eventually I do think it will be a useful command, but I 
don't want to lengthen this issue by documenting it, ensuring v1 & v2 support, 
and thinking about its API, which might need work.  The first patch iteration 
exposed it, but the 2nd patch removed it from CollectionsHandler for the above 
reasons.
* RE Why the "extra layer":  Very good question; I should add some explanatory 
docs. I think you are wondering why RoutedAliasCreateCollectionCmd exists as 
such when our URP could do the same actions. In my work for the Harvard BOP 
project, I approached it that way in fact.  The reason for the command is that 
by adding an Overseer command, I can get code to operate in a mutex/lock by the 
alias name, thus ensuring that the choice of the next collection name, its 
creation, and its addition to the alias happen atomically.  This isn't critical 
at the moment 
because the next collection name is deterministic, and thus could be handled at 
the URP with retries.  But eventually we'd like to have it be more dynamic like 
when a size threshold is reached, or simply because the user wants to (calls an 
API to make it happen on-demand).  Without a lock, I think it's impossible to 
support that.
** It does seem to be a shame that I need to create an Overseer command just to 
get a cluster lock on the alias name... not that it's *that* big a deal. I 
suppose I could use ZooKeeper directly (or, probably better, Curator), but 
unless other parts of Solr are doing this (I don't think so?), I don't want 
time routed aliases to be the first to break the mold.
** BTW I think it's silly that all the alias operations are Overseer commands 
since they merely do atomic operations against ZooKeeper (that compare the 
version) so what's the point?
* RE "+1SECOND": sure, that's perhaps not realistic, but I'm not sure we want to 
insist you can't do it.  We already round away unnecessary _00 suffixes of 
seconds, minutes, and hours.
* RE create collection loop: What is not clear in the patch is that 
parsedCollectionAliases is going to be updated with every new collection (since 
it gets prepended to the alias).  I want to improve the clarity of the logic to 
instead have it examine the head collection name to see that it's different.  
And maybe we don't need 5 retries; maybe none or make it configurable?
* Yes in SOLR-11722 please add maxFutureMS.  But I don't think that issue 
should create more than the initial collection.
* In a couple of cases you've mentioned creating the next collection in advance 
of it being needed.  Yes absolutely; LucidWorks' Fusion appropriately calls this 
"preemptive" creation BTW. But I want to make that a separate feature we can 
work on later; the issues open now have enough to do without worrying about 
that :-)
* Ah, I really like your suggestion of "most recent" naming... thus I'll do 
some renames even if it's more wordy.
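
To illustrate the locking point above, here is a minimal in-JVM sketch; it is 
not code from the patch.  AliasLocks and withAliasLock are hypothetical names, 
and a ReentrantLock per alias name only stands in for the cluster-wide lock 
that the Overseer command provides.  The idea it shows is that choosing the 
next collection name and prepending it to the alias run as one critical 
section keyed by the alias name.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.ReentrantLock;
import java.util.function.Supplier;

/** Hypothetical helper: serializes work per alias name within a single JVM. */
public class AliasLocks {
    // One lock per alias name, created lazily on first use.
    private final ConcurrentHashMap<String, ReentrantLock> locks =
        new ConcurrentHashMap<>();

    /** Runs the action while holding the lock for the given alias name. */
    public <T> T withAliasLock(String aliasName, Supplier<T> action) {
        ReentrantLock lock = locks.computeIfAbsent(aliasName, k -> new ReentrantLock());
        lock.lock();
        try {
            return action.get();
        } finally {
            // Always release, even if the action throws.
            lock.unlock();
        }
    }
}
```

A caller would wrap "pick the next name, create the collection, prepend it to 
the alias" in a single withAliasLock call, so two concurrent requests for the 
same alias cannot both pick the same next name.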

> create next time collection based on a fixed time gap
> -----------------------------------------------------
>
>                 Key: SOLR-11653
>                 URL: https://issues.apache.org/jira/browse/SOLR-11653
>             Project: Solr
>          Issue Type: Sub-task
>      Security Level: Public(Default Security Level. Issues are Public) 
>          Components: SolrCloud
>            Reporter: David Smiley
>            Assignee: David Smiley
>         Attachments: SOLR-11653.patch, SOLR-11653.patch
>
>
> For time series collections (as part of a collection Alias with certain 
> metadata), we want to automatically add new collections. In this issue, this 
> is about creating the next collection based on a configurable fixed time gap. 
>  And we will also add this collection synchronously once a document flowing 
> through the URP chain exceeds the gap, as opposed to asynchronously in 
> advance.  There will be some Alias metadata to define in this issue.  The 
> preponderance of the implementation will be in TimePartitionedUpdateProcessor 
> or perhaps a helper to this URP.
> note: other issues will implement pre-emptive creation and capping 
> collections by size.
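
A rough sketch of the fixed-gap logic described above, under stated 
assumptions: the class TimeGapNames, its method names, the use of 
java.time.Duration for the gap, and the yyyy-MM-dd_HH_mm_ss name pattern are 
all illustrative choices and are not taken from the patch.  It shows deciding 
that a document's timestamp has passed the head collection's window, and 
dropping unnecessary _00 suffixes from the generated name.

```java
import java.time.Duration;
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

/** Hypothetical helper for naming time-partitioned collections. */
public class TimeGapNames {
    // Assumed name pattern; the real format is whatever the patch defines.
    private static final DateTimeFormatter FMT =
        DateTimeFormatter.ofPattern("yyyy-MM-dd_HH_mm_ss").withZone(ZoneOffset.UTC);

    /** Builds alias_timestamp, trimming trailing _00 hour/minute/second fields. */
    public static String format(String alias, Instant start) {
        String ts = FMT.format(start);
        while (ts.endsWith("_00")) {
            ts = ts.substring(0, ts.length() - 3);
        }
        return alias + "_" + ts;
    }

    /** True when the document time is at or past the end of the head window. */
    public static boolean needsNext(Instant headStart, Duration gap, Instant docTime) {
        return !docTime.isBefore(headStart.plus(gap));
    }

    /** Start of the next collection: the head start plus the fixed gap. */
    public static Instant nextStart(Instant headStart, Duration gap) {
        return headStart.plus(gap);
    }
}
```

With a head collection starting at 2018-01-02T00:00:00Z and a one-day gap, a 
document stamped 2018-01-03T05:00:00Z makes needsNext return true, and 
format("logs", nextStart(...)) yields logs_2018-01-03.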



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
