[jira] [Updated] (SOLR-10229) See what it would take to shift many of our one-off schemas used for testing to managed schema and construct them as part of the tests

Erick Erickson (JIRA) Sun, 23 Jul 2017 10:59:38 -0700

     [ https://issues.apache.org/jira/browse/SOLR-10229?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Erick Erickson updated SOLR-10229:
----------------------------------
    Attachment: SOLR-10229.patch

I think this is getting near final form, pending resolution of the blockers. 
It's time to weigh in on whether this approach has legs, because I'll check it 
in once the blockers are resolved.

There are three cases to deal with (so far):
1> stand-alone, single cores
2> non-cloud, distributed
3> Cloud mode

This patch has an example of each converted to use the schema framework.

WARNING: When people start using this there'll be, I predict, a lot of Jenkins 
errors. Every test I've converted has had some assumptions that, due to 
randomization, may succeed for a while. Example:

DistributedQueryElevationTest:

This test sorted on int_i and examined the responses for the text fields. 
Unless and until one explicitly defined int_i as multiValued="false" and the 
text fields as stored="true", it would intermittently fail. It may be wise to 
beast tests as they are converted, or at least run them multiple times; the 
test-nocompile target is very helpful for that. This may be particularly 
interesting with docValues and stored vs. not-stored fields.

I think this is a +1, though, since coupling the assumptions to the tests makes 
them much easier to track.
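To illustrate what pinning those assumptions down inside a test could look like, here is a minimal sketch that only builds the JSON body for Solr's Schema API (an `add-field` command POSTed to a collection's `/schema` endpoint), so it runs without a Solr instance. It assumes the Schema API accepts an array of field definitions under a single `add-field` command and that the base configset defines `pint` and `text_general`; the helper names and the `title` field are hypothetical and not taken from the patch.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class SchemaCommands {

    // Hypothetical helper: one field definition for an "add-field" command.
    // LinkedHashMap keeps the attributes in insertion order in the JSON output.
    static Map<String, Object> field(String name, String type,
                                     boolean multiValued, boolean stored, boolean docValues) {
        Map<String, Object> f = new LinkedHashMap<>();
        f.put("name", name);
        f.put("type", type);
        f.put("multiValued", multiValued);
        f.put("stored", stored);
        f.put("docValues", docValues);
        return f;
    }

    // Serialize a flat map of strings/booleans as a JSON object.
    // No escaping is done, which is fine for these simple names only.
    static String toJson(Map<String, Object> f) {
        StringBuilder sb = new StringBuilder("{");
        boolean first = true;
        for (Map.Entry<String, Object> e : f.entrySet()) {
            if (!first) sb.append(',');
            first = false;
            sb.append('"').append(e.getKey()).append("\":");
            Object v = e.getValue();
            if (v instanceof String) sb.append('"').append(v).append('"');
            else sb.append(v); // booleans print as true/false
        }
        return sb.append('}').toString();
    }

    // Build one request body that adds all the fields in a single command.
    static String addFieldsBody(List<Map<String, Object>> fields) {
        StringBuilder sb = new StringBuilder("{\"add-field\":[");
        for (int i = 0; i < fields.size(); i++) {
            if (i > 0) sb.append(',');
            sb.append(toJson(fields.get(i)));
        }
        return sb.append("]}").toString();
    }

    public static void main(String[] args) {
        List<Map<String, Object>> fields = new ArrayList<>();
        // The test sorts on int_i, so it must be single-valued; point fields need docValues to sort.
        fields.add(field("int_i", "pint", false, true, true));
        // The test inspects text fields in the response, so they must be stored.
        fields.add(field("title", "text_general", false, true, false));
        System.out.println(addFieldsBody(fields));
    }
}
```

The point of the design is that the attributes the test depends on sit next to the test code rather than in a shared schema file someone else might edit.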

I'm not quite sure what to do with points vs. numerics yet. We ought to be 
able to randomize the */t*/p* primitives, although we may need to override some 
of that when testing specific cases. A sysprop? A prefix on addTypes (e.g. ^tint 
means "must be a tint")?
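One possible shape for the prefix idea, sketched under assumptions: a hypothetical `resolve` helper that treats `^tint` as "use exactly tint" and a bare primitive name such as `int` as "pick `int`, `tint` or `pint` at random". The method name, the set of primitives and the type names are illustrative only and do not come from the patch. In a real test the `Random` would presumably come from the test framework's seeded `random()` so a failing seed can be reproduced.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Random;
import java.util.Set;

public class TypeResolver {

    // Hypothetical: base primitives with interchangeable plain, "t" (trie) and "p" (point) variants.
    static final Set<String> PRIMITIVES =
        new HashSet<>(Arrays.asList("int", "long", "float", "double", "date"));

    // "^tint" pins the exact type; a bare primitive such as "int" picks one of
    // "int", "tint" or "pint" at random; anything else is returned unchanged.
    static String resolve(String requested, Random random) {
        if (requested.startsWith("^")) {
            return requested.substring(1);
        }
        if (!PRIMITIVES.contains(requested)) {
            return requested;
        }
        List<String> variants = Arrays.asList(requested, "t" + requested, "p" + requested);
        return variants.get(random.nextInt(variants.size()));
    }

    public static void main(String[] args) {
        Random random = new Random(42L); // fixed seed, so a run can be repeated exactly
        System.out.println(resolve("^tint", random));        // always tint
        System.out.println(resolve("int", random));          // int, tint or pint
        System.out.println(resolve("text_general", random)); // not a primitive: unchanged
    }
}
```

Passing the `Random` in explicitly, rather than creating one inside the helper, keeps the choice tied to the test's seed.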

> See what it would take to shift many of our one-off schemas used for testing 
> to managed schema and construct them as part of the tests
> --------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: SOLR-10229
>                 URL: https://issues.apache.org/jira/browse/SOLR-10229
>             Project: Solr
>          Issue Type: Improvement
>      Security Level: Public(Default Security Level. Issues are Public) 
>            Reporter: Erick Erickson
>            Assignee: Erick Erickson
>            Priority: Minor
>         Attachments: SOLR-10229.patch, SOLR-10229.patch, SOLR-10229.patch, 
> SOLR-10229.patch, SOLR-10229.patch, SOLR-10229.patch, SOLR-10229.patch, 
> SOLR-10229-straw-man.patch
>
>
> The test schema files are intimidating. There are about a zillion of them, 
> and making a change in any of them risks breaking some _other_ test. That 
> leaves people three choices:
> 1> add what they need to some existing schema, which makes schemas bigger 
> and bigger and bigger.
> 2> create a new schema file, adding to the proliferation thereof.
> 3> look through all the existing tests to see if they have something that 
> works.
> The recent work on LUCENE-7705 is a case in point. We're adding a maxLen 
> parameter to some tokenizers. Putting those parameters into any of the 
> existing schemas, especially to test < 255 char tokens, is virtually 
> guaranteed to break other tests, so the only safe thing to do is make another 
> schema file, adding to the multiplication of files.
> As part of SOLR-5260 I tried creating the schema on the fly rather than 
> creating a new static schema file, and it's not hard. WDYT about making this 
> into some better thought-out utility? 
> At present this is pretty fuzzy; I wanted to get some reactions before 
> putting much effort into it. I expect that the utility methods would 
> eventually get a bunch of canned types. It's reasonably straightforward for 
> primitive types, if lengthy, but when you get into solr.TextField-based types 
> it gets less straightforward.
> We could manage to just move the "intimidation" from the plethora of schema 
> files to a zillion fieldTypes in the utility to choose from...
> Also, forcing every test to define the fields up-front is arguably less 
> convenient than just having _some_ canned schemas we can use. And erroneous 
> schemas to test failure modes are probably not very good fits for any such 
> framework.
> [~steve_rowe] and [[email protected]] in particular might have 
> something to say.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
