[ 
https://issues.apache.org/jira/browse/SOLR-12357?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16867248#comment-16867248
 ] 

mosh commented on SOLR-12357:
-----------------------------

Lately we have run into time series data that is sometimes broken and has no 
date.
We have been planning to index the broken data into a separate collection 
using our indexing pipeline, but this got us wondering whether we could 
propose an improvement to TRA and add this feature to its core logic.

Perhaps a new configuration option that routes un-routable documents to a 
specified collection could solve this?
Is this a broad issue others have encountered or are likely to encounter?
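
To make the idea concrete, here is a rough sketch of the routing decision. It 
is only an illustration: the fallback collection name, the field name, and 
both functions are hypothetical, and none of them are existing Solr APIs or 
TRA parameters. The one-collection-per-day naming is a stand-in for whatever 
the alias would normally do.

```python
from datetime import datetime
from typing import Optional

# Hypothetical configuration value; not an existing TRA parameter.
FALLBACK_COLLECTION = "timeseries_unroutable"


def parse_timestamp(value) -> Optional[datetime]:
    """Return a datetime for an ISO-8601 string, or None if missing or broken."""
    if not isinstance(value, str):
        return None
    try:
        # fromisoformat() does not accept a trailing "Z" on older Pythons.
        return datetime.fromisoformat(value.replace("Z", "+00:00"))
    except ValueError:
        return None


def choose_collection(doc: dict, route_field: str = "timestamp_dt") -> str:
    """Pick a target collection, falling back when the document has no usable date."""
    ts = parse_timestamp(doc.get(route_field))
    if ts is None:
        return FALLBACK_COLLECTION
    # Stand-in for normal time-based routing: one collection per day.
    return "timeseries_" + ts.strftime("%Y-%m-%d")
```

With this, a document without a parsable date ends up in the fallback 
collection instead of being rejected.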

WDYT?

> TRA: Pre-emptively create next collection 
> ------------------------------------------
>
>                 Key: SOLR-12357
>                 URL: https://issues.apache.org/jira/browse/SOLR-12357
>             Project: Solr
>          Issue Type: Sub-task
>          Components: SolrCloud
>            Reporter: David Smiley
>            Assignee: David Smiley
>            Priority: Major
>             Fix For: 7.5
>
>         Attachments: SOLR-12357.patch
>
>          Time Spent: 9.5h
>  Remaining Estimate: 0h
>
> When adding data to a Time Routed Alias (TRA), we sometimes need to create 
> new collections.  Today we only do this synchronously – on-demand when a 
> document is coming in.  But this can add delays as the documents inbound are 
> held up for a collection to be created.  And, there may be a problem like a 
> lack of resources (e.g. ample SolrCloud nodes with space) that the policy 
> framework defines.  Such problems could be rectified sooner rather than later 
> assuming there is log alerting in place (definitely out of scope here).
>
> Pre-emptive TRA collection needs a time window configuration parameter, 
> perhaps named something like "preemptiveCreateWindowMs".  If a document's 
> timestamp is within this time window _from the end time of the head/lead 
> collection_ then the collection can be created pre-emptively.  If no data is 
> being sent to the TRA, no collections will be auto created, nor will it 
> happen if older data is being added.  It may be convenient to effectively 
> limit this time setting to the _smaller_ of this value and the TRA interval 
> window, which I think is a fine limitation.
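
The window check described in the issue above could be sketched roughly as 
follows. This is a minimal illustration under assumptions, not the patch 
itself: the function and its parameters are hypothetical names, and it assumes 
the head collection covers [head_start, head_start + interval).

```python
from datetime import datetime, timedelta, timezone


def should_preemptively_create(doc_time: datetime,
                               head_start: datetime,
                               interval: timedelta,
                               preemptive_window: timedelta) -> bool:
    """True if doc_time falls inside the window before the head collection's end."""
    # Cap the window at the TRA interval, as the description suggests.
    window = min(preemptive_window, interval)
    head_end = head_start + interval
    # Older documents fall before the window and never trigger creation.
    # Documents at or past head_end are left to the existing synchronous path.
    return head_end - window <= doc_time < head_end


# Example: daily collections with a one-hour pre-emptive window.
head = datetime(2018, 6, 19, tzinfo=timezone.utc)
late_doc = datetime(2018, 6, 19, 23, 30, tzinfo=timezone.utc)
print(should_preemptively_create(late_doc, head, timedelta(days=1), timedelta(hours=1)))
```

Because the check is driven by incoming documents, nothing is created when no 
data is being sent to the alias.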


