[
https://issues.apache.org/jira/browse/SOLR-12357?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16867248#comment-16867248
]
mosh commented on SOLR-12357:
-----------------------------
Lately we have encountered time-series data that is sometimes broken and lacks
a date.
We had been planning to index the broken data into a separate collection using
our indexing pipeline,
but this got us wondering whether we could propose an improvement to TRA
and add this feature to its core logic.
Perhaps a new configuration option that routes un-routable documents to a
specified collection could solve this?
Is this a broad issue that others have encountered or are likely to encounter?
WDYT?
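To make the idea concrete, here is a rough sketch of the routing decision, not
actual Solr code. The class name, the method, and the "fallbackCollection"
parameter are all hypothetical, and it assumes the router field holds an
ISO-8601 instant:

```java
import java.time.Instant;
import java.time.format.DateTimeParseException;
import java.util.Map;

// Hypothetical sketch only: none of these names exist in Solr.
class UnroutableFallbackSketch {

    /**
     * Returns the alias name when the document carries a parseable timestamp
     * in routerField, otherwise the configured fallback collection.
     */
    static String targetFor(Map<String, Object> doc, String routerField,
                            String aliasName, String fallbackCollection) {
        Object raw = doc.get(routerField);
        if (raw == null) {
            return fallbackCollection; // no date at all: un-routable
        }
        try {
            Instant.parse(raw.toString()); // assumes ISO-8601, e.g. 2018-06-19T23:30:00Z
            return aliasName;              // routable: let the TRA pick the collection
        } catch (DateTimeParseException e) {
            return fallbackCollection;     // broken date: un-routable
        }
    }
}
```

With something like this, a document without a date would go to the fallback
collection instead of being rejected, and everything else would be routed by
the TRA as it is today.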
> TRA: Pre-emptively create next collection
> ------------------------------------------
>
> Key: SOLR-12357
> URL: https://issues.apache.org/jira/browse/SOLR-12357
> Project: Solr
> Issue Type: Sub-task
> Components: SolrCloud
> Reporter: David Smiley
> Assignee: David Smiley
> Priority: Major
> Fix For: 7.5
>
> Attachments: SOLR-12357.patch
>
> Time Spent: 9.5h
> Remaining Estimate: 0h
>
> When adding data to a Time Routed Alias (TRA), we sometimes need to create
> new collections. Today we only do this synchronously, on demand, when a
> document is coming in. But this can add delays, as inbound documents are
> held up while a collection is created. There may also be a problem, like a
> lack of resources (e.g. ample SolrCloud nodes with space) that the policy
> framework defines. Such problems could be rectified sooner rather than later,
> assuming there is log alerting in place (definitely out of scope here).
> Pre-emptive TRA collection creation needs a time window configuration
> parameter, perhaps named something like "preemptiveCreateWindowMs". If a
> document's timestamp is within this time window _from the end time of the
> head/lead collection_ then the next collection can be created pre-emptively.
> If no data is being sent to the TRA, no collections will be auto-created, nor
> will it happen if older data is being added. It may be convenient to
> effectively limit this time setting to the _smaller_ of this value and the
> TRA interval window, which I think is a fine limitation.