[ https://issues.apache.org/jira/browse/SOLR-10229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17145918#comment-17145918 ]

Erick Erickson commented on SOLR-10229:
---------------------------------------

I don't think so. I'm looking at just the mechanics of editing and maintaining 
"schemas in code", not any failures from, say, problems with the managed schema 
API like SOLR-11034.

Let's assume there are absolutely no functional issues with using the 
managed schema API. Even so, I've become skeptical that having a lot of canned 
schemas is worse than a bunch of code like this:

 
{code:java}
    List<JettySolrRunner> allJettys = new ArrayList<>();
    allJettys.add(controlJetty);
    allJettys.addAll(jettys);

    fac.addFieldTypesFromUber("collection1", allJettys, "tint", "text_mock");

    fac.addFields("collection1", allJettys, new String[][]{
        {"name", "int_i", "type", "tint", "indexed", "true", "multiValued", 
"false", "stored", "true"},
        {"name", "text", "type", "text_mock", "indexed", "true", "stored", 
"true"},
        {"name", "field_t", "type", "text_mock", "indexed", "true", "stored", 
"true"},
        {"name", "plow_t", "type", "text_mock", "indexed", "true", "stored", 
"true"}
    });

 {code}
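For reference, each String[] row above is just a flat list of attribute/value pairs for one field definition. A small helper (hypothetical, not part of any patch here) makes the mapping to the managed-schema "add-field" shape explicit:

{code:java}
import java.util.LinkedHashMap;
import java.util.Map;

public class FieldDefs {
    // Convert a flat {key, value, key, value, ...} row into an ordered map,
    // i.e. the attribute map a managed-schema "add-field" command would carry.
    public static Map<String, String> rowToMap(String[] row) {
        if (row.length % 2 != 0) {
            throw new IllegalArgumentException("row must contain key/value pairs");
        }
        Map<String, String> def = new LinkedHashMap<>();
        for (int i = 0; i < row.length; i += 2) {
            def.put(row[i], row[i + 1]);
        }
        return def;
    }

    public static void main(String[] args) {
        String[] row = {"name", "int_i", "type", "tint", "indexed", "true",
                        "multiValued", "false", "stored", "true"};
        System.out.println(rowToMap(row));
        // {name=int_i, type=tint, indexed=true, multiValued=false, stored=true}
    }
}
{code}

That is all the String[][] boilerplate encodes; the verbosity is in repeating it per test, not in any one row.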
We'd replace one line of code where we specify loading up a particular 
config(set) with N lines of code. Every test file that used this technique 
would then have a bunch of boilerplate like the above. And it's going to be a 
very large undertaking. For instance, just creating a catalog of _where_ all 
the config(set)s/schemas are used isn't trivial from what I can tell, much less 
actually transferring the information in the various schemas to the appropriate 
test files and coding it up.

This is a relatively simple case. What I like about the approach is that it 
keeps the fields you're testing in the code that tests them. _Except_ the 
field _types_ are still somewhere else (the Uber collection). If I want to add a 
different type I need to add that too, and the code to do that is much more 
complicated than adding a field based on an existing type.
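To make that asymmetry concrete, here is roughly what the two managed schema API commands look like (illustrative payloads only; the analyzer chain shown for text_mock is a made-up example, not what any test config actually uses). Adding a field that reuses an existing type is one flat command:

{code:json}
{"add-field": {"name": "plow_t", "type": "text_mock", "indexed": true, "stored": true}}
{code}

while defining a new solr.TextField-based type drags in a nested analyzer definition:

{code:json}
{"add-field-type": {
  "name": "text_mock",
  "class": "solr.TextField",
  "analyzer": {
    "tokenizer": {"class": "solr.WhitespaceTokenizerFactory"},
    "filters": [{"class": "solr.LowerCaseFilterFactory"}]
  }
}}
{code}

Every new type a test needs means writing out something like the second form in code.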

I don't think it's worth the effort. If we're going to change how we do this, 
I'd like to see a SIP, and I'd like us to do some serious thinking about how this 
could be a simpler process. Who knows? It might be the right thing to do to 
eliminate schemas somehow. I think effort spent on that kind of thinking is a 
better use of everyone's time than piecemeal changes like this, for all that I 
was enthused at one point.

> See what it would take to shift many of our one-off schemas used for testing 
> to managed schema and construct them as part of the tests
> --------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: SOLR-10229
>                 URL: https://issues.apache.org/jira/browse/SOLR-10229
>             Project: Solr
>          Issue Type: Improvement
>            Reporter: Erick Erickson
>            Assignee: Erick Erickson
>            Priority: Minor
>         Attachments: SOLR-10229-straw-man.patch, SOLR-10229.patch, 
> SOLR-10229.patch, SOLR-10229.patch, SOLR-10229.patch, SOLR-10229.patch, 
> SOLR-10229.patch, SOLR-10229.patch, SOLR-10229.patch
>
>
> The test schema files are intimidating. There are about a zillion of them, 
> and making a change in any of them risks breaking some _other_ test. That 
> leaves people three choices:
> 1> add what they need to some existing schema. Which makes schemas bigger and 
> bigger and bigger.
> 2> create a new schema file, adding to the proliferation thereof.
> 3> Look through all the existing tests to see if they have something that 
> works.
> The recent work on LUCENE-7705 is a case in point. We're adding a maxLen 
> parameter to some tokenizers. Putting those parameters into any of the 
> existing schemas, especially to test < 255 char tokens is virtually 
> guaranteed to break other tests, so the only safe thing to do is make another 
> schema file. Adding to the multiplication of files.
> As part of SOLR-5260 I tried creating the schema on the fly rather than 
> creating a new static schema file and it's not hard. WDYT about making this 
> into some better thought-out utility? 
> At present, this is pretty fuzzy, I wanted to get some reactions before 
> putting much effort into it. I expect that the utility methods would 
> eventually get a bunch of canned types. It's reasonably straightforward for 
> primitive types, if lengthy. But when you get into solr.TextField-based types 
> it gets less straightforward.
> We could manage to just move the "intimidation" from the plethora of schema 
> files to a zillion fieldTypes in the utility to choose from...
> Also, forcing every test to define the fields up-front is arguably less 
> convenient than just having _some_ canned schemas we can use. And erroneous 
> schemas to test failure modes are probably not very good fits for any such 
> framework.
> [~steve_rowe] and [[email protected]] in particular might have 
> something to say.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
