[ 
https://issues.apache.org/jira/browse/SOLR-11653?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16307584#comment-16307584
 ] 

Gus Heck commented on SOLR-11653:
---------------------------------

Some thoughts: 

It seems that you've added ROUTEDALIAS_CREATECOLL as a user accessible command 
(the v1 API will respond to it I think) but not reflected in v2 api json files. 
I think this is because this command is  probably not meant for API invocation 
in the first place, so it kinda looks out of place as an undocumented API. I 
kind of wonder why it was done as an admin command. I'm probably missing 
something, but it seems like this means we have:
# doc arrives for which it is appropriate to create a new collection.
# issue admin command ROUTEDALIAS_CREATECOLL and wait for it
# inside ROUTEDALIAS_CREATECOLL issue 
CollectionsHandler.CollectionOperation.CREATE_OP.execute(... and wait for that
# finally process the update

I'm not sure why we want the extra layer? is there actually a use case for 
manual creation of the next partition? To me it seems as if this operation is 
internal to TimeRoutedAliasUpdateProcessor and should be there. I can possibly 
imagine that if we were proactively keeping ahead of things by one collection 
this structure could allow fast processing of the update by giving it an async 
id, but it looks like the code is set up to add a collection the first time it 
gets a document that requires that collection (up to maxFutureMs), and delay 
the update until that succeeds.

This makes me wonder if we should even be supporting +1SECOND? If this command 
is not sub-second we fall behind... That probably also is a really bad idea for 
sheer number of collections too. As a side benefit we could also shorten our 
collection names too...

I think your loop for creating collections should be dependent on the value of 
maxFutureMs? Let's say now() is 01:22 the head is presently ending in _01_00 
and accepting 01:00 hrs to 02:00 hrs, maxFutureMs=72*60*60*1000 and we get a 
document for 60 hours in the future... looks like we loop up to 5 times and 
successfully create the _02_00, 0_03_00, _04_00,_05_00,_06_00 partitions and 
then error out unless we get an update to our parsedCollectionAliases in the 
mean time... if any 5 creations are faster than the updates to 
parsedCollectionAliases we wrongly error out. The error message will be 
misleading too since the attempts to create the collections all succeeded but 
we quit due to some sort of communications lag. IF the parsedCollectionsAliases 
gets updated in time we will reset our counter and keep going, but why not 
directly calculate the appropriate minimum number of attempts from 
DocTimestampMs - HeadTimestampMs divided by HeadTimestampMs+(one interval) - 
HeadTimestampMs?  A constant can be added if we want some slack for retrying 
failed calls.

Perhaps my work in SOLR-11722 should be filling out all prospective next 
collections up to maxFutureMs (which I need to add to metadata)? Then in this 
code rather than testing the current document's time stamp to see if we create 
a new collection, test whether or not the next increment falls within 
MaxFutureMS. In the case that it's time to create a new collection we can then 
check the document value and create the next one asynchronously unless the 
current document would fall in the yet to be created collection (which should 
normally be a rare case). That would go from one guaranteed slow update every 
hour (for +1HOUR routed aliases) to only having a slow update if a document at 
the very beginning of an hour happens to fall *just* inside maxFutureMs. We 
would need to track that the collection was in progress for creation of course 
to avoid spamming the overseer each hour... 

It seems easier and more robust to never need to create more than one 
collection during an update. In that case your present loop is just fine.

In code, you have variable names and comments talking about "head" but I had to 
dig a little to confirm that this was actually the "most recent" not "the 
first".... be nice if comments made this clear.

I agree with your desire to rename handleResponse() ... confused me at first.

> create next time collection based on a fixed time gap
> -----------------------------------------------------
>
>                 Key: SOLR-11653
>                 URL: https://issues.apache.org/jira/browse/SOLR-11653
>             Project: Solr
>          Issue Type: Sub-task
>      Security Level: Public(Default Security Level. Issues are Public) 
>          Components: SolrCloud
>            Reporter: David Smiley
>            Assignee: David Smiley
>         Attachments: SOLR-11653.patch, SOLR-11653.patch
>
>
> For time series collections (as part of a collection Alias with certain 
> metadata), we want to automatically add new collections. In this issue, this 
> is about creating the next collection based on a configurable fixed time gap. 
>  And we will also add this collection synchronously once a document flowing 
> through the URP chain exceeds the gap, as opposed to asynchronously in 
> advance.  There will be some Alias metadata to define in this issue.  The 
> preponderance of the implementation will be in TimePartitionedUpdateProcessor 
> or perhaps a helper to this URP.
> note: other issues will implement pre-emptive creation and capping 
> collections by size.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

Reply via email to