[
https://issues.apache.org/jira/browse/SOLR-12357?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16572603#comment-16572603
]
Gus Heck commented on SOLR-12357:
---------------------------------
Latest changes remove the static fields holding structures that were attempting
to guard against the case where multiple clients are sending documents that
trigger (redundant) requests to the overseer for extension of the TRA with new
collections. The rationale is:
# The overseer command being invoked is idempotent: it locks on the alias and
ignores excess invocations.
# The frequency of occurrence for this command is normally once per time slice
in the TRA (i.e. hourly/daily/monthly) which is very infrequent.
# The window for contention is the time it takes to create a collection (a
small number of seconds).
# Creating *many* clients (more than the processors on the receiving machine)
and sending many batches simultaneously is an anti-pattern already. So in
"normal" usage even on very large machines the order of magnitude of excess
overseer tasks is in the "dozens" and not the hundreds or thousands that are
likely to clog the overseer.
Conceivably, someone feeding many independent (update) requests to many
machines with large numbers of CPUs per machine could cause a serious flood,
but code to handle that can be added and subsequently maintained if someone
demonstrates a need for it. Until then, this simplifies things.
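
To illustrate the first point of the rationale, here is a minimal sketch of an
idempotent "create if absent" operation that locks per alias. It is not the
actual overseer code: the class name, method names, and in-memory data
structures are all hypothetical stand-ins chosen for illustration.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentSkipListSet;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical sketch; none of these names come from Solr.
class IdempotentCreateSketch {
    // One lock per alias, so requests for different aliases do not contend.
    private final ConcurrentHashMap<String, ReentrantLock> aliasLocks = new ConcurrentHashMap<>();
    // Stand-in for the cluster's set of existing collections.
    private final Set<String> collections = new ConcurrentSkipListSet<>();

    /** Returns true only for the invocation that actually created the collection. */
    boolean createIfAbsent(String alias, String collectionName) {
        ReentrantLock lock = aliasLocks.computeIfAbsent(alias, a -> new ReentrantLock());
        lock.lock();
        try {
            // Set.add returns false when the element already exists,
            // so a redundant request becomes a cheap no-op.
            return collections.add(collectionName);
        } finally {
            lock.unlock();
        }
    }

    int collectionCount() {
        return collections.size();
    }
}
```

With this shape, a dozen clients racing to request the same next collection
cost a dozen short lock acquisitions, but only one of them does real work.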
I unlinked the first PR, abandoned the royally confused second PR, created a
clean 3rd PR and this time the bot picked it up quickly :). So now the PR
linked in this issue is the right one and has no spurious commits or merge
problems.
> TRA: Pre-emptively create next collection
> ------------------------------------------
>
> Key: SOLR-12357
> URL: https://issues.apache.org/jira/browse/SOLR-12357
> Project: Solr
> Issue Type: Sub-task
> Security Level: Public(Default Security Level. Issues are Public)
> Components: SolrCloud
> Reporter: David Smiley
> Priority: Major
> Time Spent: 4h 20m
> Remaining Estimate: 0h
>
> When adding data to a Time Routed Alias (TRA), we sometimes need to create
> new collections. Today we only do this synchronously – on-demand when a
> document is coming in. But this can add delays as the documents inbound are
> held up for a collection to be created. And, there may be a problem like a
> lack of resources (e.g. ample SolrCloud nodes with space) that the policy
> framework defines. Such problems could be rectified sooner rather than later
> assuming there is log alerting in place (definitely out of scope here).
> Pre-emptive TRA collection needs a time window configuration parameter,
> perhaps named something like "preemptiveCreateWindowMs". If a document's
> timestamp is within this time window _from the end time of the head/lead
> collection_ then the collection can be created pre-emptively. If no data is
> being sent to the TRA, no collections will be auto created, nor will it
> happen if older data is being added. It may be convenient to effectively
> limit this time setting to the _smaller_ of this value and the TRA interval
> window, which I think is a fine limitation.
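
The window check described in the issue text above can be sketched roughly as
follows. This is only an illustration under stated assumptions, not the
committed implementation: the class and method names are hypothetical, the
window is passed as a Duration rather than read from a
"preemptiveCreateWindowMs" setting, and the head collection's end time is
treated as exclusive.

```java
import java.time.Duration;
import java.time.Instant;

// Hypothetical sketch; names and signature are assumptions for illustration.
class PreemptiveWindowSketch {
    /**
     * @param docTime  timestamp of the incoming document
     * @param headEnd  end time (assumed exclusive) of the head/lead collection
     * @param window   configured pre-emptive create window
     * @param interval length of one TRA time slice
     */
    static boolean shouldPreemptivelyCreate(Instant docTime, Instant headEnd,
                                            Duration window, Duration interval) {
        // Effective window is the smaller of the configured value and the interval.
        Duration effective = window.compareTo(interval) < 0 ? window : interval;
        Instant windowStart = headEnd.minus(effective);
        // Older data (before the window) never triggers creation. Documents at or
        // past headEnd are outside this check; they need the collection right away.
        return !docTime.isBefore(windowStart) && docTime.isBefore(headEnd);
    }
}
```

For example, with a daily TRA whose head collection ends at
2018-08-09T00:00:00Z and a one-hour window, a document stamped 23:30 on
2018-08-08 would trigger pre-emptive creation, while one stamped 12:00 that
day would not.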