[ 
https://issues.apache.org/jira/browse/SOLR-13149?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16761902#comment-16761902
 ] 

Gus Heck commented on SOLR-13149:
---------------------------------

Pushed changes to the 
[branch|https://gitbox.apache.org/repos/asf?p=lucene-solr.git;a=shortlog;h=refs/heads/solr-13131].
 CRAs are now working internally and have a unit test. This commit should be 
solid enough for SOLR-13152 to be started.

I found that any solution that allowed an alias to resolve to zero collections 
created many problems in CloudSolrClient, and these problems will almost 
certainly crop up anywhere else that tries to handle both aliases and 
collections. Thus I was forced to create a temporary initial collection and 
then remove it once data-driven logic creates at least one collection. I'm not 
that fond of this, however, because it adds extra logic that gets run for 
every document. One alternative is to add another initialization parameter for 
CRAs that accepts known categories and to require that at least one category 
be declared in advance. This is less data driven but simpler for the code. 
Another way to handle it might be to cache in ZooKeeper the information that 
the initial collection has already been removed, thus reducing the 
per-document logic to a simple check of a boolean. I think I favor the latter 
solution because data-driven behavior is core to this feature. The former 
solution might come back as an enhancement anyway, since some edge-case usages 
(primarily those focused on easing indexing complexity) might want to 
pre-declare the categories to avoid pauses during indexing due to collection 
creation (the analog of preemptive creation in TRAs), but that seems like a 
minor corner case that maybe we should wait for someone to request.
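
To make the boolean-check option concrete, here is a minimal sketch of what the per-document path could look like. It is not code from the branch: the class name PlaceholderCleanup, the removedInStore supplier (standing in for a read of a flag kept with the alias in ZooKeeper), and the removePlaceholder callback are all hypothetical, and the local in-memory flag is an extra assumption on top of the idea described above.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.function.BooleanSupplier;

// Hypothetical helper, not part of Solr: tracks whether the temporary
// initial collection has already been removed.
final class PlaceholderCleanup {
  // Local memo so the common case is a single boolean read per document.
  private final AtomicBoolean removed = new AtomicBoolean(false);
  // Assumption: reads a "placeholder removed" flag from shared state
  // (for example alias metadata stored in ZooKeeper).
  private final BooleanSupplier removedInStore;
  // Assumption: deletes the temporary collection and records the flag.
  private final Runnable removePlaceholder;

  PlaceholderCleanup(BooleanSupplier removedInStore, Runnable removePlaceholder) {
    this.removedInStore = removedInStore;
    this.removePlaceholder = removePlaceholder;
  }

  // Called after a document has been routed to a data-driven collection.
  void onDocumentRouted() {
    if (removed.get()) {
      return; // fast path: nothing left to do
    }
    synchronized (this) {
      if (removed.get()) {
        return; // another thread finished the cleanup first
      }
      if (!removedInStore.getAsBoolean()) {
        removePlaceholder.run();
      }
      removed.set(true);
    }
  }
}
```

With something like this, only the first document after startup pays for the shared-state lookup; every later call returns after one boolean check.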

Another lurking issue not yet dealt with is what to do in the case of 
non-English categories. Our restrictions on collection naming are going to 
create problems for this use case. I expect that the only solution to that is 
to use RFC-4648 URL-safe base64 encoding of the data value when naming the 
collection. This would make the collection names opaque :( but legal :). 
Encoding would need to be applied after the cardinality/pattern checks 
(SOLR-13150 and SOLR-13151). Support for non-English categories can also be 
added as an enhancement in another ticket.
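
As a rough illustration of the encoding idea, the sketch below uses the JDK's java.util.Base64 URL-safe encoder. The class name CategoryNameCodec, the "__" separator, and the choice to omit "=" padding are assumptions made for this example, not decisions taken on this ticket; padding is dropped here on the assumption that "=" would not be accepted in a collection name.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical helper, not part of Solr.
final class CategoryNameCodec {

  // Encodes a category value with the RFC 4648 URL-safe alphabet
  // (A-Z, a-z, 0-9, '-', '_'), without '=' padding.
  static String encode(String category) {
    return Base64.getUrlEncoder()
        .withoutPadding()
        .encodeToString(category.getBytes(StandardCharsets.UTF_8));
  }

  // Recovers the original category value from an encoded name fragment.
  static String decode(String encoded) {
    return new String(Base64.getUrlDecoder().decode(encoded), StandardCharsets.UTF_8);
  }

  // Assumption: collection name is the alias name, a separator, and the
  // encoded category. The real naming scheme may differ.
  static String collectionName(String alias, String category) {
    return alias + "__" + encode(category);
  }
}
```

The resulting names are opaque but reversible, so tooling could still show the original category by decoding the suffix.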

I'll probably want to add some more unit tests and address the per-document 
logic before considering this ticket finished.

> Implement a basic CategoryRoutedAlias class
> -------------------------------------------
>
>                 Key: SOLR-13149
>                 URL: https://issues.apache.org/jira/browse/SOLR-13149
>             Project: Solr
>          Issue Type: Sub-task
>      Security Level: Public(Default Security Level. Issues are Public) 
>          Components: UpdateRequestProcessors
>            Reporter: Gus Heck
>            Priority: Major
>
> This ticket will add the core functionality for data driven routing to 
> collections in an alias based on the value of a particular field by fleshing 
> out the methods required by the RoutedAlias interface. This ticket will also 
> look for any synergies with the existing TimeRoutedAlias class and reuse code 
> if possible. 



